Meta’s Omnilingual ASR Transcribes Speech in 1,600+ Languages

On November 10, 2025, Meta’s FAIR team released Omnilingual ASR, a set of open-weight models that handle automatic speech recognition for more than 1,600 languages. That count includes 500 low-resource languages never transcribed by AI before, like Hwana in Nigeria, Rotokas in Papua New Guinea, and Güilá Zapotec in Mexico, as PCMag reports.
How It Works
The models come in sizes from 300 million to 7 billion parameters, MarkTechPost reports, so they can run on phones or on beefy servers. The base is Omnilingual wav2vec 2.0, a self-supervised encoder trained on raw speech that doesn’t need tons of labeled data per language. Meta pairs it with two decoders, one using CTC and the other an LLM-style transformer, to turn audio into text.
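Conceptually, the CTC route works like this toy greedy decode: the encoder scores characters for every audio frame, then repeated frames get collapsed and blank tokens dropped. The sketch below is a generic PyTorch illustration of that idea with a made-up vocabulary and random logits, not Meta’s actual code.

```python
import torch

# Toy character vocabulary; index 0 is the CTC blank token (illustrative only).
VOCAB = ["<blank>", "h", "e", "l", "o", " "]

def ctc_greedy_decode(logits: torch.Tensor) -> str:
    """Collapse repeated frames and drop blanks: the standard CTC greedy decode."""
    frame_ids = logits.argmax(dim=-1).tolist()   # best character index per audio frame
    out, prev = [], None
    for idx in frame_ids:
        if idx != prev and idx != 0:             # skip repeats and the blank id
            out.append(VOCAB[idx])
        prev = idx
    return "".join(out)

# Fake per-frame logits standing in for encoder output (frames x vocab size).
fake_logits = torch.randn(40, len(VOCAB))
print(ctc_greedy_decode(fake_logits))
```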
Key trick: you can add a new language with just a few audio clips and matching text. No big datasets or expert tuning required. Meta says this gets you decent transcription fast. Performance hits a character error rate under 10% for 78% of the 1,600-plus languages, beating OpenAI’s Whisper on low-resource ones, Slator explains. It’s still experimental, though, so always check outputs.
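Character error rate, the metric behind that sub-10% figure, is just edit distance over the reference length. Here’s a quick, generic Python check for scoring any transcript against a reference; it isn’t tied to Meta’s tooling.

```python
def char_error_rate(reference: str, hypothesis: str) -> float:
    """Levenshtein distance between character sequences, divided by reference length."""
    ref, hyp = list(reference), list(hypothesis)
    # Dynamic-programming edit distance table.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i
    for j in range(len(hyp) + 1):
        dist[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution
    return dist[-1][-1] / max(len(ref), 1)

print(char_error_rate("omnilingual", "omnilingal"))  # one dropped character -> ~0.09
```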
Indian languages covered include Hindi, Marathi, Malayalam, Tulu, Telugu, Odia, Punjabi, Marwari, Urdu, and rarer ones like Kui, Chhattisgarhi, Maithili, Bagheli, Mahasu Pahari, Awadhi, and Rajbanshi, per The Indian Express and Business Standard.
Meta also open-sourced the Omnilingual ASR Corpus: transcribed speech in 350 underserved languages, collected with local groups and native speakers and released under CC-BY. All models are Apache 2.0-licensed and hosted on Hugging Face, VentureBeat notes.
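Grabbing the open weights works with the standard Hugging Face hub client. The repo id below is a placeholder, not a confirmed name, so check the actual Omnilingual ASR model cards on Hugging Face before running it.

```python
from huggingface_hub import snapshot_download

# Placeholder repo id: substitute the real Omnilingual ASR model card name from
# Hugging Face. The snapshot_download call itself is the standard hub API.
local_dir = snapshot_download(repo_id="facebook/omnilingual-asr")
print(f"Weights and configs downloaded to: {local_dir}")
```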
Real-World Uses
In Nigeria, health workers transcribe Hausa in clinics for better records and care. The models also help preserve endangered languages by making audio archives searchable. Other use cases: education tools, dubbing media into local tongues, global voice calls, and content creation for speakers who currently lack tools.
Meta built this with partners like Mozilla’s Common Voice, paying speakers in remote areas to contribute recordings. The aim is to close the gap that leaves low-resource languages ignored online.