Voice AI Intelligence Index

A pure-quality leaderboard of speech-to-text systems on the BIGOS benchmark. The headline score reflects accuracy and robustness only — cost and speed are shown as separate trade-off axes, mirroring the Artificial Analysis Intelligence Index. Free public research showcase; not a commercial offer.

Weighting profile

Balanced

Pure quality composite. Cost and speed are shown as separate trade-off axes, not folded into the score.

Voice AI Intelligence Index

52 systems · BIGOS v3 · generated 2026-06-16

#	System	Quality	95% CI	Cost/min	Time/min	Datasets
1	Elevenlabs Scribe V2	94.2	93.6–94.7	$0.0067	8.3s	2/3
2	Lindat Uwebasr	92.4	92.0–92.8	— *	9.0s	2/3
3	Amazon Transcribe	90.1	89.7–90.5	$0.0240	58.8s	2/3
4	Google STT V2 Chirp 2	89.4	89.0–89.8	$0.0160	33.0s	2/3
5	Soniox STT-Async-V4	86.0	85.5–86.5	$0.0017	36.8s	2/3
6	Elevenlabs Scribe V1	84.9	84.4–85.4	$0.0067	9.7s	2/3
7	Google STT V2 Chirp 3	83.2	82.8–83.6	$0.0160	32.8s	2/3
8	Soniox STT-Async-V3	83.1	82.6–83.6	$0.0017	37.0s	2/3
9	Whisper Cloud Whisper-1	82.4	82.2–82.7	$0.0060	1.5s	2/3
10	Azure Gpt 4o Mini Transcribe	81.0	80.5–81.4	$0.0030	6.4s	2/3
11	Google STT V2 Chirp	79.4	79.0–79.9	$0.0160	51.3s	2/3
12	Whisper Local Large-V3	77.7	77.3–78.2	—	—	2/3
13	Whisper Local Large	76.5	76.1–76.9	—	—	2/3
14	Azure STT Standard	70.3	70.1–70.6	$0.0170	16.3s	2/3
15	Nemo Parakeet Tdt 0.6b V3	69.7	69.2–70.2	—	—	2/3
16	Speechmatics Standard	69.2	68.7–69.6	$0.0040	127.7s	2/3
17	Azure Whisper	68.3	67.8–68.8	$0.0060	214.3s	2/3
18	Assemblyai Best	68.2	67.7–68.8	$0.0045	22.7s	2/3
19	Assemblyai Nano	67.7	67.1–68.2	$0.0045	15.2s	2/3
20	Openai Gpt4o Transcribe	66.5	66.3–66.8	$0.0060	11.7s	2/3
21	Nemo Canary 1b V2	63.9	63.3–64.4	—	—	2/3
22	Whisper Local Large-V2	63.3	62.9–63.8	—	—	2/3
23	Speechmatics Enhanced	57.2	56.8–57.7	$0.0040	120.0s	2/3
24	Distil Whisper Distil-Large-V3-Pl	51.6	51.1–52.2	—	—	2/3
25	Whisper Local Turbo	50.3	49.8–50.7	—	—	3/3
26	Omnilingual ASR Llm-7b	49.1	48.7–49.4	—	—	2/3
27	Azure Gpt 4o Transcribe Diarize	48.6	47.9–49.4	$0.0060	20.6s	2/3
28	Omnilingual ASR Llm-1b	43.7	43.4–44.0	—	—	2/3
29	Omnilingual ASR Llm-3b	43.4	43.1–43.8	—	—	2/3
30	Whisper Local Medium	40.6	40.1–41.0	—	—	3/3
31	Omnilingual ASR Llm-300m	34.7	34.3–35.0	—	—	2/3
32	Mms 1b-All	32.1	31.6–32.7	—	—	2/3
33	Whisper Finetuned Bardsai-Large-V2-Pl-V2	30.4	17.1–43.7	—	—	2/3
34	Mms 1b-L1107	30.4	29.8–30.9	—	—	2/3
35	Omnilingual ASR Ctc-7b	29.0	28.7–29.3	—	—	2/3
36	Whisper Local Small	28.6	28.1–29.0	—	—	3/3
37	Nemo STT Pl Fastconformer Hybrid Large Pc	27.1	26.5–27.7	—	—	2/3
38	Deepgram Nova-2	25.3	24.7–25.9	$0.0043	4.4s	2/3
39	Whisper Finetuned Thestageai-Large-V3-Turbo	25.3	15.2–35.4	—	—	2/3
40	Omnilingual ASR Ctc-3b	25.0	24.7–25.3	—	—	2/3
41	Omnilingual ASR Ctc-1b	23.9	23.6–24.3	—	—	2/3
42	Deepgram Nova-3	22.7	22.1–23.3	$0.0043	5.3s	2/3
43	Nemo STT Multilingual Fastconformer Hybrid Large Pc	19.5	18.9–20.1	—	—	2/3
44	Google STT V1 Command And Search	18.3	17.8–18.8	$0.0240	11.7s	2/3
45	Google STT V1	17.9	17.4–18.4	$0.0240	12.6s	2/3
46	Whisper Local Base	16.8	16.1–17.5	—	—	3/3
47	Nemo STT Pl Quartznet15x5	15.4	14.8–15.9	—	—	2/3
48	Mms 1b-Fl102	14.2	13.5–14.8	—	—	2/3
49	Omnilingual ASR Ctc-300m	13.3	13.0–13.6	—	—	2/3
50	Whisper Local Tiny	10.6	9.9–11.3	—	—	3/3
51	Nemo STT En Fastconformer Hybrid Large Pc	3.9	3.6–4.2	—	—	2/3
52	Distil Whisper Distil-Large-V3	0.0	0.0–0.9	—	—	2/3

* cost GPU-imputed for local providers (no public per-minute price).

Quality × Cost

Upper-left is the sweet spot: high quality, low cost per minute.

Quality × Speed

Upper-right is the sweet spot: high quality, fast throughput.

Hardware caveat

Per-task efficiency metrics (ADR-061). cost_per_minute_usd may be imputed for local providers (see has_imputed_cost). time_per_minute_sec = 60 / RTFx is HARDWARE-DEPENDENT and indicative only — not cross-comparable as a latency SLA.

RTFx measured on Darwin-x86_64. Time per minute is hardware-dependent and not a cross-comparable latency SLA.