A pure-quality leaderboard of speech-to-text systems on the BIGOS benchmark. The headline score reflects accuracy and robustness only — cost and speed are shown as separate trade-off axes, mirroring the Artificial Analysis Intelligence Index. Free public research showcase; not a commercial offer.
Weighting profile: Balanced
Pure quality composite. Cost and speed are shown as separate trade-off axes, not folded into the score.
52 systems · BIGOS v3 · generated 2026-06-16
| # | System | Quality | Cost/min |
|---|---|---|---|
| 1 | Elevenlabs Scribe V2 | 94.2 | $0.0067 |
| 2 | Lindat Uwebasr | 92.4 | — * |
| 3 | Amazon Transcribe | 90.1 | $0.0240 |
| 4 | Google STT V2 Chirp 2 | 89.4 | $0.0160 |
| 5 | Soniox STT-Async-V4 | 86.0 | $0.0017 |
| 6 | Elevenlabs Scribe V1 | 84.9 | $0.0067 |
| 7 | Google STT V2 Chirp 3 | 83.2 | $0.0160 |
| 8 | Soniox STT-Async-V3 | 83.1 | $0.0017 |
| 9 | Whisper Cloud Whisper-1 | 82.4 | $0.0060 |
| 10 | Azure Gpt 4o Mini Transcribe | 81.0 | $0.0030 |
| 11 | Google STT V2 Chirp | 79.4 | $0.0160 |
| 12 | Whisper Local Large-V3 | 77.7 | — |
| 13 | Whisper Local Large | 76.5 | — |
| 14 | Azure STT Standard | 70.3 | $0.0170 |
| 15 | Nemo Parakeet Tdt 0.6b V3 | 69.7 | — |
| 16 | Speechmatics Standard | 69.2 | $0.0040 |
| 17 | Azure Whisper | 68.3 | $0.0060 |
| 18 | Assemblyai Best | 68.2 | $0.0045 |
| 19 | Assemblyai Nano | 67.7 | $0.0045 |
| 20 | Openai Gpt4o Transcribe | 66.5 | $0.0060 |
| 21 | Nemo Canary 1b V2 | 63.9 | — |
| 22 | Whisper Local Large-V2 | 63.3 | — |
| 23 | Speechmatics Enhanced | 57.2 | $0.0040 |
| 24 | Distil Whisper Distil-Large-V3-Pl | 51.6 | — |
| 25 | Whisper Local Turbo | 50.3 | — |
| 26 | Omnilingual ASR Llm-7b | 49.1 | — |
| 27 | Azure Gpt 4o Transcribe Diarize | 48.6 | $0.0060 |
| 28 | Omnilingual ASR Llm-1b | 43.7 | — |
| 29 | Omnilingual ASR Llm-3b | 43.4 | — |
| 30 | Whisper Local Medium | 40.6 | — |
| 31 | Omnilingual ASR Llm-300m | 34.7 | — |
| 32 | Mms 1b-All | 32.1 | — |
| 33 | Whisper Finetuned Bardsai-Large-V2-Pl-V2 | 30.4 | — |
| 34 | Mms 1b-L1107 | 30.4 | — |
| 35 | Omnilingual ASR Ctc-7b | 29.0 | — |
| 36 | Whisper Local Small | 28.6 | — |
| 37 | Nemo STT Pl Fastconformer Hybrid Large Pc | 27.1 | — |
| 38 | Deepgram Nova-2 | 25.3 | $0.0043 |
| 39 | Whisper Finetuned Thestageai-Large-V3-Turbo | 25.3 | — |
| 40 | Omnilingual ASR Ctc-3b | 25.0 | — |
| 41 | Omnilingual ASR Ctc-1b | 23.9 | — |
| 42 | Deepgram Nova-3 | 22.7 | $0.0043 |
| 43 | Nemo STT Multilingual Fastconformer Hybrid Large Pc | 19.5 | — |
| 44 | Google STT V1 Command And Search | 18.3 | $0.0240 |
| 45 | Google STT V1 | 17.9 | $0.0240 |
| 46 | Whisper Local Base | 16.8 | — |
| 47 | Nemo STT Pl Quartznet15x5 | 15.4 | — |
| 48 | Mms 1b-Fl102 | 14.2 | — |
| 49 | Omnilingual ASR Ctc-300m | 13.3 | — |
| 50 | Whisper Local Tiny | 10.6 | — |
| 51 | Nemo STT En Fastconformer Hybrid Large Pc | 3.9 | — |
| 52 | Distil Whisper Distil-Large-V3 | 0.0 | — |
* Cost is GPU-imputed for local providers, which have no public per-minute price.
Quality vs. cost: the upper-left is the sweet spot (high quality, low cost per minute).
Quality vs. speed: the upper-right is the sweet spot (high quality, fast throughput).
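One way to make the quality vs. cost sweet spot concrete is a Pareto frontier: the systems for which no other system is both cheaper and higher quality. The sketch below is illustrative only. It uses a handful of priced rows copied from the table, and the names `System` and `pareto_frontier` are hypothetical helpers, not part of the leaderboard's tooling.

```python
from typing import NamedTuple


class System(NamedTuple):
    name: str
    quality: float       # composite quality score from the table
    cost_per_min: float  # USD per audio minute from the table


# A few priced rows copied from the table (illustrative subset only).
SYSTEMS = [
    System("Elevenlabs Scribe V2", 94.2, 0.0067),
    System("Amazon Transcribe", 90.1, 0.0240),
    System("Google STT V2 Chirp 2", 89.4, 0.0160),
    System("Soniox STT-Async-V4", 86.0, 0.0017),
    System("Whisper Cloud Whisper-1", 82.4, 0.0060),
    System("Azure Gpt 4o Mini Transcribe", 81.0, 0.0030),
]


def pareto_frontier(systems):
    """Return systems not dominated on (higher quality, lower cost)."""
    frontier = []
    best_quality = float("-inf")
    # Walk from cheapest to most expensive; at equal cost, best quality first.
    for s in sorted(systems, key=lambda s: (s.cost_per_min, -s.quality)):
        # Keep a system only if it beats every cheaper system on quality.
        if s.quality > best_quality:
            frontier.append(s)
            best_quality = s.quality
    return frontier


frontier_names = [s.name for s in pareto_frontier(SYSTEMS)]
print(frontier_names)
# ['Soniox STT-Async-V4', 'Elevenlabs Scribe V2']
```

Within this subset, only two systems sit on the frontier. Every other row is matched or beaten on both axes by one of them. Systems without a listed price are left out because they cannot be placed on the cost axis.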
Hardware caveat
Per-task efficiency metrics (ADR-061). cost_per_minute_usd may be imputed for local providers (see has_imputed_cost). time_per_minute_sec = 60 / RTFx is hardware-dependent and indicative only; it is not cross-comparable as a latency SLA. RTFx was measured on Darwin-x86_64.
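The conversion time_per_minute_sec = 60 / RTFx can be sketched as a small function. This assumes RTFx is the inverse real-time factor (audio duration divided by processing time), which is consistent with the formula but is not spelled out on this page. The RTFx values in the loop are hypothetical, not measurements from the leaderboard.

```python
def time_per_minute_sec(rtfx: float) -> float:
    """Seconds of processing needed per minute of audio: 60 / RTFx."""
    if rtfx <= 0:
        raise ValueError("RTFx must be positive")
    return 60.0 / rtfx


# Hypothetical RTFx values for illustration (not taken from the leaderboard).
for rtfx in (0.5, 1.0, 20.0):
    print(f"RTFx={rtfx:>4}: {time_per_minute_sec(rtfx):.1f} s per audio minute")
```

An RTFx below 1 means the system runs slower than real time, so one minute of audio takes more than 60 seconds to process. Because RTFx depends on the machine it was measured on, the resulting figure is only indicative.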