# Global Language Understanding for AIs

![](https://3205006002-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FleYcqBx5FRZcVr3wI4f4%2Fuploads%2F31JNK140styRUKjd83ZJ%2F0.png?alt=media)

### The Leaderboard <a href="#ww0mw4h50ppi" id="ww0mw4h50ppi"></a>

For the languages and audio samples below, here are our recommended speech recognition + machine translation models:

<table><thead><tr><th width="117">Language</th><th width="122">Partner Data Set Link</th><th>Top Performing Model</th><th width="110">Runner Up</th><th>Link to Evaluation</th></tr></thead><tbody><tr><td>Hindi</td><td><a href="https://docs.google.com/spreadsheets/d/13f-K31MWsZh2NI9M6tQsmx2CvQEz4GxfJLXVNAhO-tI/edit?usp=sharing">ARTPARK (IISc)</a></td><td><a href="https://gooey.ai/speech/gpt-4o-hindi-5aoagu18mz47/">GPT-4o</a></td><td>ElevenLabs Scribe v1</td><td><a href="https://gooey.ai/bulk/compare-hindi-speech-recognition-e04tnvdscvzo/">Hindi Bulk Run</a></td></tr><tr><td>Swahili</td><td><a href="https://docs.google.com/spreadsheets/d/1mfMLRKgpNoAJdjpOt9lDwVM72zKbTXkfUXdaCzPn5m4/edit?gid=1943030623#gid=1943030623">Gates Foundation</a> </td><td><a href="https://gooey.ai/copilot/swahili-jacaranda-gpt-5-a2t-pkj3bnha20zu/">Jacaranda + GPT5 </a></td><td><a href="https://gooey.ai/copilot/swahili-jacaranda-gpt-5-google-mt-a2t-8ps1p54ep74x/">Jacaranda + GPT5 + Google MT</a></td><td><a href="https://gooey.ai/bulk/top4-swahili-audio2text-comparison-7qs-4qk762cbmepp/">Swahili Bulk Run</a></td></tr><tr><td>Kikuyu</td><td><a href="https://docs.google.com/spreadsheets/d/1WPwqoAlDwGS5mX9_G0Lqv8Agb9kd-q-nl5FEyDCWruQ/edit?gid=0#gid=0">Gates Foundation</a></td><td><a href="https://gooey.ai/copilot/kikuyu-akera-gpt-5-a2t-dyfyu53k3viv/">Akera+GPT5</a> &#x26; <a href="https://gooey.ai/copilot/kikuyu-akeramtgemini25pro-a2t-e9ciqiark5sf/">Akera+Gemini2.5pro+GoogleMT</a></td><td><a href="https://gooey.ai/copilot/kikuyu-akera-gemini25pro-a2t-fi682gqx6z0o/">Akera+Gemini2.5pro</a></td><td><a href="https://gooey.ai/bulk/top5-kikuyu-audio2text-comparison-3qs-t2bvvh7zgsp2/">Kikuyu Bulk Run</a></td></tr><tr><td>Chichewa</td><td><a href="https://docs.google.com/spreadsheets/d/1_-ZhbOys9UY6gARwSyjRxtn4wTRzN9aIyuIn9qGqYdQ/edit?usp=drive_link">Opportunity</a></td><td><a href="https://gooey.ai/speech/chichewa-asr-via-mms-large-google-translate-afsj26nrak0f/">Seamless M4T v1 + Google 
Translate</a></td><td>MMS</td><td><a href="https://gooey.ai/bulk/compare-chichewa-speech-recognition-45j0h174/">Chichewa Bulk Run</a></td></tr><tr><td>Kinyarwanda (*new)</td><td><a href="https://docs.google.com/spreadsheets/d/1G8qEIcS9NWQtRtKa5Nkj_3E4X0hc1xGk3uHeSfVWL80/edit?gid=308318854#gid=308318854">Gates Foundation</a></td><td><a href="https://gooey.ai/copilot/kinyarwanda-mbaza-gpt-5-a2t-s6bqnt86h8h3/">Kinyarwanda (Mbaza+GPT5)</a> + <a href="https://gooey.ai/copilot/kinyarwanda-mbaza-gemini25pro-a2t-mr3yy6ovs3of/">Kinyarwanda (Mbaza+Gemini2.5pro)</a> + <a href="https://gooey.ai/copilot/kinyarwanda-mbazagpt5g-mt-a2t-d3mesu7yhrtr/">Kinyarwanda (Mbaza+GPT5+GoogleMT)</a></td><td><a href="https://gooey.ai/copilot/kinyarwanda-sunbird-gpt-5-a2t-ga7uck3gko9o/">Sunbird+GPT5</a></td><td><a href="https://gooey.ai/bulk/top5-kinyarwanda-audio2text-compare-30qs-n9n2xl4ttbxo/">Kinyarwanda Bulk Run</a></td></tr><tr><td>Magahi</td><td>Looking for Collaborator/Partner</td><td></td><td></td><td></td></tr><tr><td>Luo/Dhuluo</td><td>Looking for Collaborator/Partner</td><td></td><td></td><td></td></tr><tr><td>Maithili</td><td>Looking for Collaborator/Partner</td><td></td><td></td><td></td></tr></tbody></table>

### Summary <a href="#yrx2t8oj7q7o" id="yrx2t8oj7q7o"></a>

Any organization working with low-resource users and AI must first tackle a fundamental question: *can the AI actually understand the text and audio clips from my particular set of users?* Too often, the answer is “not very well,” meaning the incredible knowledge and reasoning capabilities of AI are unavailable to these populations.

Fortunately, the field is moving incredibly fast and better models are released every day. This effort aims to make it easy for any organization to provide their own audio and text samples, along with their “golden” expert-created transcriptions and translations, and then evaluate the best available AI models to determine which actually understands their users best.

This effort is in collaboration with the [Glocal Eval of Models](https://peopleplus.ai/leaderboard) Initiative by PeoplePlus.AI and could not succeed without the collaborative support of the EkStep Foundation, ARTPARK, Opportunity.org, DigitalGreen, AI4Bharat, Karya.in, Microsoft Research India, GIZ and the Gates Foundation.

### Why do we need such a system? <a href="#b245vcbcy67v" id="b245vcbcy67v"></a>

Low-resource languages are not uniform; local dialects abound. Unfortunately, capturing this diversity of language is hard for tech companies building speech recognition and translation models, and the resulting paucity of high-quality, diverse training sets leads to poor AI model performance. In practice, this means that incredible tools like GPT-4 and Gemini, which are primarily trained on English, don’t work particularly well for speakers of low-resource languages.

Partners like ARTPARK have existing efforts to collect data sets, but this initiative focuses on a different part of the problem: enabling organizations to bring their own collections of audio samples and discover which combination of state-of-the-art speech recognition and translation models actually understands their users’ speech best. As private, public, and open-source technology makers publish ever-improving AI models each month, we also want organizations to be able to quickly benchmark new models against their own test data, so they can make appropriate price, speed, and performance decisions.

As a bonus, given that these models run “hot” on Gooey.AI and are available via APIs and our high-level workflows such as <https://gooey.ai/CoPilot>, organizations can then immediately deploy their chosen model in AI agents like [Farmer.CHAT](https://help.gooey.ai/farmerchat).

Through this and the [Glocal Eval of Models](https://peopleplus.ai/leaderboard) effort, we also hope to prod private, government, and public technology makers to create ever-better AI models for low-resource language understanding via the proven power of open, transparent competition. We take inspiration from other AI leaderboards such as Hugging Face’s [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).

### Goals: <a href="#jvhaa3pvf2jo" id="jvhaa3pvf2jo"></a>

1. Provide a place where organizations can determine which AI Speech recognition and translation models work best for their particular use case, especially with low-resource languages.
2. Catalyze the industry to create better low-resource language AI models by creating a popular, highly referenced destination for researchers (and the press) to compare models.
3. Build an open-source dataset of audio files and golden human transcriptions and translations from scores of organizations, representative of the phrases likely to come up when aiding low-resource users. For example, we’ll create a dataset of rural Tamil women asking on Android phones' WhatsApp how to know if their crops are rotting, rather than transcriptions of popular Tamil songs.

### Where are we right now: <a href="#id-90uzynodtfbm" id="id-90uzynodtfbm"></a>

#### ASR models currently supported on Gooey.AI <a href="#xy5rv74hjecn" id="xy5rv74hjecn"></a>

Gooey supports the top public and open models. Here is the list of all the core models available via Gooey.AI:

**Recently added (Q2 and Q3 2025)**

* [ElevenLabs Scribe v1](https://gooey.ai/speech/11labs-hindi-speech-recognition-c0olu3ozkrjj/)
* [Vulavula AI](https://gooey.ai/speech/bambara-speech-recognition-and-translation-vulavula-ovqohrnj79i5/)
* [Jacaranda](https://gooey.ai/copilot/swahili-jacaranda-gpt-5-a2t-pkj3bnha20zu/)
* [GPT-4o](https://gooey.ai/speech/gpt-4o-hindi-5aoagu18mz47/)
* [Akera](https://gooey.ai/speech/kikuyu-asr-via-akerawhisper-kik-full_v2-fine-tuned-us5dwt521r2l/)
* [Mbaza](https://gooey.ai/speech/mbaza-asr-google-translate-swahili-en-x06smbljck5e/)
* [Sunbird](https://gooey.ai/speech/sunbird-asr-google-translate-swahili-en-zgvp8byabt2m/)

**Added (2024)**

* [Whisper Large v2 & v3 (OpenAI)](https://gooey.ai/speech/whisper-large-v3-kannada-kgutjq2sux61/) - [model link](https://huggingface.co/openai/whisper-large-v3)
* [Whisper Hindi Large v2 (Bhashini)](https://gooey.ai/speech/whisper-hindi-large-v2-bhashini-bmo38059wc7t/) - [model link](https://huggingface.co/vasista22/whisper-hindi-large-v2)
* [Whisper Telugu Large v2 (Bhashini)](https://gooey.ai/speech/whisper-telugu-large-v2-bhashini-y6f5gq0t4ksl/) - [model link](https://huggingface.co/vasista22/whisper-telugu-large-v2)
* [Conformer English (ai4bharat.org)](https://gooey.ai/speech/conformer-english-ai4bharatorg-24r8h5dcay8m/) - [model link](https://github.com/Open-Speech-EkStep/vakyansh-models)
* [Conformer Hindi (ai4bharat.org)](https://gooey.ai/speech/conformer-hindi-ai4bharatorg-2w2zh91rcqd4/) - [model link](https://github.com/Open-Speech-EkStep/vakyansh-models)
* [Vakyansh Bhojpuri (Open-Speech-EkStep)](https://gooey.ai/speech/bhojpuri-speech-recognition-using-gatesekstep-y1w6l21s/) - [model link](https://github.com/Open-Speech-EkStep/vakyansh-models)
* [Google Cloud v1](https://gooey.ai/speech/google-cloud-v1-swahili-en-yq6uv8rapzpt/)
* [USM/Chirp](https://gooey.ai/speech/chirpusm-google-ilczaj48wxgn/) (Google)
* [Deepgram](https://gooey.ai/speech/deepgram-english-dxtueibfeug2/)
* [Azure Speech Recognition](https://gooey.ai/speech/azure-asr-swahili-ecslgjq79rvz/)
* [Seamless M4T (Meta Research)](https://gooey.ai/speech/seamless-m4t-kannada-en-o3ec9xbu5l73/) - [model link](https://github.com/facebookresearch/seamless_communication)
* [Massively Multilingual Speech (Meta Research)](https://gooey.ai/speech/conformer-english-ai4bharatorg-24r8h5dcay8m/) - [model link](https://github.com/facebookresearch/fairseq/tree/main/examples/mms)

#### Current support for Machine Translation <a href="#u8jafvhqgrsd" id="u8jafvhqgrsd"></a>

* Google Translate
* [GhanaNLP](https://ghananlp.org/)
* Coming soon
  * Azure
  * Seamless MT v2
  * Translation via LLM models (Claude 3, Mixtral, GPT-4, and Gemini 1.5 Pro) ([compare translations here](https://gooey.ai/compare-large-language-models/compare-translations-from-claude3-gpt4-mixtral-vs-gemini-15-f4w9msgw/))

### How to use the models <a href="#frmybzg2yudi" id="frmybzg2yudi"></a>

All of these models can be used via our standalone [speech workflow](https://gooey.ai/speech), via our API, or inside our [copilot](https://gooey.ai/copilot) recipe (for use in WhatsApp, Slack, as a web widget, or inside an app of your choice).
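As a sketch of what an API call could look like, the snippet below assembles a speech-recognition request as a plain dict. The endpoint path, auth header format, and payload field names here are assumptions for illustration; consult the API tab on each workflow page (e.g. https://gooey.ai/speech) for the exact schema.

```python
import json

def build_asr_request(api_key: str, audio_url: str, model: str) -> dict:
    """Assemble a hypothetical speech-recognition API request.

    The endpoint, auth scheme, and field names below are assumptions for
    illustration -- check the Gooey.AI API docs for the real schema.
    """
    return {
        "url": "https://api.gooey.ai/v2/asr/",        # assumed endpoint path
        "headers": {
            "Authorization": f"bearer {api_key}",     # assumed auth scheme
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "documents": [audio_url],                 # assumed field names
            "selected_model": model,
        }),
    }

req = build_asr_request("sk-...", "https://example.com/sample.wav", "whisper_large_v3")
print(req["url"])
```

The same request dict could then be sent with any HTTP client; building it separately makes the payload easy to inspect before spending API credits.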

### How to determine which AI model is best for your data <a href="#t59mz4knfpzh" id="t59mz4knfpzh"></a>

#### Gooey.AI Bulk Workflow <a href="#dfry1nbhu6if" id="dfry1nbhu6if"></a>

With our [Bulk and Evaluation Workflow](https://gooey.ai/bulk), you provide a CSV or Google Sheet with all your audio samples (plus their golden transcriptions and translations) and run a workflow that looks like this:

**Compare Chichewa Speech Recognition -** [**https://gooey.ai/bulk/?example\_id=45j0h174**](https://gooey.ai/bulk/?example_id=45j0h174)

In this bulk run example, we compare four different Gooey.AI speech recognition + translation workflows (<https://gooey.ai/speech>), each of which uses a different AI speech model.
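The input sheet for a bulk run pairs each audio sample with its golden transcription and translation, one row per sample. A minimal sketch of building such a CSV in Python follows; the column names and placeholder values are illustrative assumptions, not a required schema, so match them to whatever your bulk run is configured to read.

```python
import csv
import io

# One row per audio sample: the sample's URL plus its expert-created
# "golden" transcription and English translation. Column names and
# values are illustrative placeholders.
samples = [
    ("https://example.com/audio/clip1.wav",
     "golden transcription of clip 1",
     "golden English translation of clip 1"),
    ("https://example.com/audio/clip2.wav",
     "golden transcription of clip 2",
     "golden English translation of clip 2"),
]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["audio_url", "golden_transcript", "golden_translation"])
writer.writerows(samples)
csv_text = buf.getvalue()
print(csv_text)
```

In practice you would write the same rows to a file (or a Google Sheet) and point the bulk workflow at it.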

#### Gooey.AI Eval Workflow <a href="#hshnace6bqun" id="hshnace6bqun"></a>

Gooey published an early version that evaluates Hindi, Kannada, and Telugu on 3-6 different engines here: <https://gooey.ai/eval/examples>. Eval represents the second part of the evaluation: it takes as input a spreadsheet with the transcriptions and translations from competing models, then runs an LLM script on each row to score each translation against the golden, human-provided answer. Once every row is scored, the Eval workflow averages the scores per model and graphs them.
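The averaging step at the end can be sketched as follows. Each row holds one model's LLM-assigned score for one sample, and the workflow reduces them to one mean score per model; the field names, scoring scale, and values below are assumptions for illustration, not the Eval workflow's actual schema.

```python
from collections import defaultdict

# Per-row scores as an eval run might produce them: one LLM-assigned
# score (assumed 0-10 scale) per (model, sample) pair. Values are
# made up for illustration.
rows = [
    {"model": "whisper_large_v3", "score": 8},
    {"model": "whisper_large_v3", "score": 6},
    {"model": "seamless_m4t", "score": 9},
    {"model": "seamless_m4t", "score": 7},
]

def average_scores(rows):
    """Average the per-sample scores for each model."""
    by_model = defaultdict(list)
    for row in rows:
        by_model[row["model"]].append(row["score"])
    return {model: sum(scores) / len(scores) for model, scores in by_model.items()}

print(average_scores(rows))  # prints {'whisper_large_v3': 7.0, 'seamless_m4t': 8.0}
```

These per-model averages are what the Eval workflow then graphs, making the winner easy to read off the chart.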

**Example output for Telugu Eval (**[**demo**](https://gooey.ai/eval/?example_id=lc1f4ka1)**):**

![](https://3205006002-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FleYcqBx5FRZcVr3wI4f4%2Fuploads%2FonfiaZl2kOFFdn5hqlIo%2F1.png?alt=media)

<table data-view="cards"><thead><tr><th></th><th data-hidden data-card-cover data-type="files"></th><th data-hidden data-card-target data-type="content-ref"></th></tr></thead><tbody><tr><td><h3>📖 GUIDE</h3><p>How to create language evaluation for ASR?</p></td><td><a href="https://3205006002-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FleYcqBx5FRZcVr3wI4f4%2Fuploads%2FxPVXrb6e4tfU3iDsEW6R%2Fgooey.ai%20-%20cute%20vintage%20poster%20style%20illustration%20of%20a%20young%20man%20reading%20the%20user%20manual%20of%20a%20robot.png?alt=media&#x26;token=9f7eda3d-c51e-40fb-81e7-975c82a99bb3">gooey.ai - cute vintage poster style illustration of a young man reading the user manual of a robot.png</a></td><td><a href="https://app.gitbook.com/s/5BFP5RUm6rTLXk8wUSTf/speech-and-language/how-to-use-asr/how-to-create-language-evaluation-for-asr">Create language evaluation for Speech Recognition</a></td></tr><tr><td><h3>🗣️ Hindi ASR Evaluation</h3><p>Get started with a pre-filled example</p></td><td><a href="https://3205006002-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FleYcqBx5FRZcVr3wI4f4%2Fuploads%2FLEH2cnz0KEWRpwXGtbVm%2Fgooey.ai%20-%20cute%20vintage%20poster%20style%20illustration%20of%20a%20small%20group%20of%20indian%20kids%20learning%20hindi%20(1).png?alt=media&#x26;token=a1bb839d-c75f-40e1-be70-4e44bf7fbed5">gooey.ai - cute vintage poster style illustration of a small group of indian kids learning hindi (1).png</a></td><td><a href="https://gooey.ai/bulk/compare-hindi-speech-recognition-hkgs8120p11t/">https://gooey.ai/bulk/compare-hindi-speech-recognition-hkgs8120p11t/</a></td></tr></tbody></table>

### Partners: <a href="#doidwxqoktyj" id="doidwxqoktyj"></a>

Gooey.AI - [Sean Blagsvedt](mailto:sean@blagsvedt.com) & Dev Aggrawal

Opportunity.org - an NGO building AI agents for Malawi farmers. [The Chichewa evaluation is here](https://gooey.ai/bulk/?example_id=45j0h174) (Seamless and Google USM appear to beat Azure and Whisper v3)

PeoplePlus.AI

Digital Green - collaborators on Farmer.CHAT

ARTPARK.in [Raghu Dharmaraju](mailto:raghu@artpark.in)

People +AI [Tanuj Bhojwani](mailto:tanuj@peopleplus.ai) [Harsha G](mailto:harsha@peopleplus.ai)

AI4Bharat Prof. [Mitesh Khapra](mailto:miteshk@cse.iitm.ac.in)

Gates Foundation AIEP Cohort

### Personas <a href="#g3l0zkxtquci" id="g3l0zkxtquci"></a>

1. Producers. Interested in making better models. Researchers primarily.
   * Tech NGOs like ARTPARK, Wadhwani, etc.
2. Consumers. Interested in using better models.
   * Grassroots NGOs like Avanti, DigitalGreen, Pratham (education), etc.
   * Private orgs like PayTM, Setu, etc.
   * Indic-language content producers - InShorts, DailyHunt, KukuFM, etc.

### Milestones <a href="#li0ojz1irqbz" id="li0ojz1irqbz"></a>

1. Release the first set of evaluations - DONE
2. Provide a system and documentation for organizations to run their own evaluations. - DONE
3. Aid consumer organizations who wish to use the speech and translation APIs in their apps, websites, WhatsApp AI agents, etc. on Gooey.AI - Email us at <support@gooey.ai>

**Coming soon**

1. Enable producer organizations to host and compare their models on Gooey.AI
2. Enable producer orgs to host the Gooey.AI runtime locally in their own GPU farms.

### References and Links <a href="#id-4cregjdxa4xx" id="id-4cregjdxa4xx"></a>

1. [\[2303.12528\] MEGA: Multilingual Evaluation of Generative AI](https://arxiv.org/abs/2303.12528)
2. [Self-supervised learning (SSL) and fine-tuned ASR models for Indian languages (IIT Madras)](https://asr.iitm.ac.in/models/)
