Voice AI Intelligence Index

A pure-quality leaderboard of speech-to-text systems on the BIGOS benchmark. The headline score reflects accuracy and robustness only — cost and speed are shown as separate trade-off axes, mirroring the Artificial Analysis Intelligence Index. Free public research showcase; not a commercial offer.

Weighting profile: Balanced

Pure quality composite. Cost and speed are shown as separate trade-off axes, not folded into the score.

52 systems · BIGOS v3 · generated 2026-06-16

| # | System | Quality | Cost/min |
|---|--------|---------|----------|
| 1 | Elevenlabs Scribe V2 | 94.2 | $0.0067 |
| 2 | Lindat Uwebasr | 92.4 | * |
| 3 | Amazon Transcribe | 90.1 | $0.0240 |
| 4 | Google STT V2 Chirp 2 | 89.4 | $0.0160 |
| 5 | Soniox STT-Async-V4 | 86.0 | $0.0017 |
| 6 | Elevenlabs Scribe V1 | 84.9 | $0.0067 |
| 7 | Google STT V2 Chirp 3 | 83.2 | $0.0160 |
| 8 | Soniox STT-Async-V3 | 83.1 | $0.0017 |
| 9 | Whisper Cloud Whisper-1 | 82.4 | $0.0060 |
| 10 | Azure Gpt 4o Mini Transcribe | 81.0 | $0.0030 |
| 11 | Google STT V2 Chirp | 79.4 | $0.0160 |
| 12 | Whisper Local Large-V3 | 77.7 | |
| 13 | Whisper Local Large | 76.5 | |
| 14 | Azure STT Standard | 70.3 | $0.0170 |
| 15 | Nemo Parakeet Tdt 0.6b V3 | 69.7 | |
| 16 | Speechmatics Standard | 69.2 | $0.0040 |
| 17 | Azure Whisper | 68.3 | $0.0060 |
| 18 | Assemblyai Best | 68.2 | $0.0045 |
| 19 | Assemblyai Nano | 67.7 | $0.0045 |
| 20 | Openai Gpt4o Transcribe | 66.5 | $0.0060 |
| 21 | Nemo Canary 1b V2 | 63.9 | |
| 22 | Whisper Local Large-V2 | 63.3 | |
| 23 | Speechmatics Enhanced | 57.2 | $0.0040 |
| 24 | Distil Whisper Distil-Large-V3-Pl | 51.6 | |
| 25 | Whisper Local Turbo | 50.3 | |
| 26 | Omnilingual ASR Llm-7b | 49.1 | |
| 27 | Azure Gpt 4o Transcribe Diarize | 48.6 | $0.0060 |
| 28 | Omnilingual ASR Llm-1b | 43.7 | |
| 29 | Omnilingual ASR Llm-3b | 43.4 | |
| 30 | Whisper Local Medium | 40.6 | |
| 31 | Omnilingual ASR Llm-300m | 34.7 | |
| 32 | Mms 1b-All | 32.1 | |
| 33 | Whisper Finetuned Bardsai-Large-V2-Pl-V2 | 30.4 | |
| 34 | Mms 1b-L1107 | 30.4 | |
| 35 | Omnilingual ASR Ctc-7b | 29.0 | |
| 36 | Whisper Local Small | 28.6 | |
| 37 | Nemo STT Pl Fastconformer Hybrid Large Pc | 27.1 | |
| 38 | Deepgram Nova-2 | 25.3 | $0.0043 |
| 39 | Whisper Finetuned Thestageai-Large-V3-Turbo | 25.3 | |
| 40 | Omnilingual ASR Ctc-3b | 25.0 | |
| 41 | Omnilingual ASR Ctc-1b | 23.9 | |
| 42 | Deepgram Nova-3 | 22.7 | $0.0043 |
| 43 | Nemo STT Multilingual Fastconformer Hybrid Large Pc | 19.5 | |
| 44 | Google STT V1 Command And Search | 18.3 | $0.0240 |
| 45 | Google STT V1 | 17.9 | $0.0240 |
| 46 | Whisper Local Base | 16.8 | |
| 47 | Nemo STT Pl Quartznet15x5 | 15.4 | |
| 48 | Mms 1b-Fl102 | 14.2 | |
| 49 | Omnilingual ASR Ctc-300m | 13.3 | |
| 50 | Whisper Local Tiny | 10.6 | |
| 51 | Nemo STT En Fastconformer Hybrid Large Pc | 3.9 | |
| 52 | Distil Whisper Distil-Large-V3 | 0.0 | |

* Cost is GPU-imputed for local providers, which have no public per-minute price.

Quality × Cost

Upper-left is the sweet spot: high quality, low cost per minute.
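
One way to make the "upper-left" idea concrete is a Pareto frontier: the systems for which no other system is both cheaper and higher quality. The sketch below is illustrative only and is not how the index itself is computed. It uses six priced rows copied from the leaderboard; the Entry type and the pareto_frontier helper are hypothetical names introduced for this example.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    system: str
    quality: float       # leaderboard quality score (higher is better)
    cost_per_min: float  # USD per audio minute (lower is better)


# A subset of priced rows from the leaderboard above.
ROWS = [
    Entry("Elevenlabs Scribe V2", 94.2, 0.0067),
    Entry("Amazon Transcribe", 90.1, 0.0240),
    Entry("Google STT V2 Chirp 2", 89.4, 0.0160),
    Entry("Soniox STT-Async-V4", 86.0, 0.0017),
    Entry("Whisper Cloud Whisper-1", 82.4, 0.0060),
    Entry("Azure Gpt 4o Mini Transcribe", 81.0, 0.0030),
]


def pareto_frontier(rows: list[Entry]) -> list[Entry]:
    """Return rows not dominated on (lower cost, higher quality)."""
    frontier: list[Entry] = []
    best_quality = float("-inf")
    # Walk from cheapest to most expensive; at equal cost, best quality first.
    for row in sorted(rows, key=lambda r: (r.cost_per_min, -r.quality)):
        # A row earns its place only if it beats every cheaper row on quality.
        if row.quality > best_quality:
            frontier.append(row)
            best_quality = row.quality
    return frontier


if __name__ == "__main__":
    for entry in pareto_frontier(ROWS):
        print(f"{entry.system}: quality {entry.quality}, ${entry.cost_per_min:.4f}/min")
```

On this subset the frontier contains Soniox STT-Async-V4 (cheapest) and Elevenlabs Scribe V2 (highest quality); every other row is matched or beaten on both axes by one of these two.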

Quality × Speed

Upper-right is the sweet spot: high quality, fast throughput.

Hardware caveat

Per-task efficiency metrics (ADR-061). cost_per_minute_usd may be imputed for local providers (see has_imputed_cost). time_per_minute_sec = 60 / RTFx is hardware-dependent and indicative only; it is not cross-comparable as a latency SLA.

RTFx was measured on Darwin-x86_64.
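
The conversion time_per_minute_sec = 60 / RTFx can be sketched as below. This assumes RTFx is the inverse real-time factor (seconds of audio processed per second of wall-clock time), which is consistent with the formula but is not spelled out on this page. The function name mirrors the metric name, and the RTFx values in the usage lines are made-up inputs, not measurements from this benchmark.

```python
def time_per_minute_sec(rtfx: float) -> float:
    """Seconds of processing needed per 60 seconds of audio.

    Assumes rtfx = audio_duration / processing_time, so a larger
    RTFx means faster transcription.
    """
    if rtfx <= 0:
        raise ValueError("RTFx must be positive")
    return 60.0 / rtfx


if __name__ == "__main__":
    # Hypothetical RTFx values, for illustration only.
    for rtfx in (0.5, 1.0, 30.0):
        print(f"RTFx {rtfx}: {time_per_minute_sec(rtfx)} s per audio minute")
```

Because RTFx depends on the machine the run was made on, the resulting figure compares systems only within a single hardware setup.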

Voice AI Intelligence Index — TrustQuest.AI