🏎️Global Language Understanding for AIs
Use-case specific leaderboards for Low-Resource Language Speech Recognition & Translation Models
The Leaderboard
For these languages and particular audio samples, here’s our recommended speech recognition + machine translation model:
Language | Partner Data Set Link | Top Performing Model | Runner Up | Link to Evaluation |
---|---|---|---|---|
Hindi | Azure | | | |
Bhojpuri | MMS | | | |
Swahili | Seamless M4T | | | |
Kikuyu | Chirp/USM | | | |
Chichewa | MMS | | | |
Magahi | Looking for Collaborator/Partner | | | |
Luo/Dhuluo | Looking for Collaborator/Partner | | | |
Maithili | Looking for Collaborator/Partner | | | |
Summary
Any organization working with low-resource users and AI must first tackle a fundamental question: can the AI actually understand the text and audio clips from my particular set of users? Too often, the answer is “not very well,” meaning the incredible knowledge and reasoning capabilities of AI are unavailable to these populations.
Fortunately, the field is moving incredibly fast and better models are being released every day. This effort attempts to make it easy for any organization to provide its own audio and text samples, along with “golden” expert-created transcriptions and translations, and then evaluate the best available AI models to determine which one actually understands its users best.
This effort is in collaboration with the Glocal Eval of Models Initiative by PeoplePlus.AI and could not succeed without the collaborative support of the EkStep Foundation, ARTPARK, Opportunity.org, DigitalGreen, AI4Bharat, Karya.in, Microsoft Research India, GIZ and the Gates Foundation.
Why do we need such a system?
Low-resource languages are not uniform. Local dialects abound. Unfortunately, capturing this diversity of language is hard for tech companies attempting to make speech recognition and translation models. The resulting paucity of high-quality, diverse training sets leads to poor AI model performance, which in turn means that incredible tools like GPT4 and Gemini, which are primarily trained on English, don’t work particularly well for speakers of low-resource languages.
Partners like ARTPARK have existing efforts to collect data sets, but this initiative focuses on a different part of the problem: enabling organizations to provide their own collections of audio samples and discover which combination of state-of-the-art speech recognition and translation AI models actually understands their users’ speech best. As private, public and open source technology makers publish ever-improving AI models each month, we also wish to enable organizations to quickly benchmark new models against their own test data so they can make appropriate price, speed and performance decisions.
As a bonus, given that these models run “hot” on Gooey.AI and are available via APIs and our high-level workflows such as https://gooey.ai/CoPilot, organizations can then immediately deploy their chosen model in AI bots like Farmer.CHAT.
Through this and the Glocal Eval of Models effort, we also hope to prod private, government and public technology makers to create ever-better AI models for low-resource language understanding via the proven power of open, transparent competition. We take inspiration from other AI leaderboards like Hugging Face’s Open LLM Leaderboard.
Goals:
Provide a place where organizations can determine which AI Speech recognition and translation models work best for their particular use case, especially with low-resource languages.
Catalyze the industry to create better low-resource language AI models by creating a popular, highly referenced destination for researchers (and the press) to compare models.
Build an open-source dataset of audio files and golden human transcriptions and translations from scores of organizations that is representative of the likely phrases involved in aiding low-resource users. E.g., we’ll create a dataset of rural Tamil-speaking women asking via WhatsApp on Android phones how to tell whether their crops are rotting, rather than transcriptions of popular Tamil songs.
Where are we right now:
Gooey.AI currently supports 13 ASR Models.
Gooey supports the top public/open models. Here is the list of all the core models available via Gooey.AI:
USM/Chirp (Google)
Current support for Machine Translation
Google Translate
Coming soon
Azure
Seamless M4T v2
Translation via LLM models (Claude3, Mixtral, GPT4 + Gemini 1.5 Pro) (compare translations here)
How to use the models
All of these can be used via our standalone speech workflow or API or inside our copilot recipe (for use in WhatsApp, Slack, as a web-widget or inside an app of your choice).
How to determine which AI model is best for your data
Gooey.AI Bulk Workflow
With our Bulk and Evaluation Workflow, you provide a CSV or Google Sheet with all your audio samples (and their transcriptions and translations) and you can run a workflow that looks like this:
Compare Chichewa Speech Recognition - https://gooey.ai/bulk/?example_id=45j0h174
In this bulk run example, we compare 4 different Gooey.AI speech recognition + translation workflows (https://gooey.ai/speech), each of which uses a different AI speech model.
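The shape of a bulk run (one input sheet, several workflows, one output column per workflow) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Gooey.AI implementation: the column names (`audio_url`, `golden_transcription`, `golden_translation`), the `run_bulk` helper and the two stub workflows are all hypothetical, and a real run would call an actual speech recognition + translation service instead of the stubs.

```python
import csv
import io
from typing import Callable, Dict, List

# Hypothetical input sheet; the real column layout may differ.
SAMPLE_CSV = """audio_url,golden_transcription,golden_translation
https://example.org/a1.wav,moni,hello
https://example.org/a2.wav,zikomo,thank you
"""

# A "workflow" here is any callable that maps an audio URL to a
# dict with "transcription" and "translation" keys (an assumption).
Workflow = Callable[[str], Dict[str, str]]


def run_bulk(csv_text: str, workflows: Dict[str, Workflow]) -> List[Dict[str, str]]:
    """Run every workflow on every row; return one wide output row per sample."""
    results = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        out = dict(row)  # keep the golden columns for later evaluation
        for name, workflow in workflows.items():
            prediction = workflow(row["audio_url"])
            out[f"{name}_transcription"] = prediction["transcription"]
            out[f"{name}_translation"] = prediction["translation"]
        results.append(out)
    return results


# Stand-in workflows that return fixed text, purely for illustration.
def stub_model_a(audio_url: str) -> Dict[str, str]:
    return {"transcription": "moni", "translation": "hello"}


def stub_model_b(audio_url: str) -> Dict[str, str]:
    return {"transcription": "mono", "translation": "hi"}


rows = run_bulk(SAMPLE_CSV, {"model_a": stub_model_a, "model_b": stub_model_b})
for row in rows:
    print(row)
```

Keeping the golden columns next to each model's output in the same row is a deliberate choice in this sketch: the resulting sheet can then be scored row by row without joining files.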
Gooey.AI Eval Workflow
Gooey published an early version that evaluates Hindi, Kannada and Telugu on 3-6 different engines here (https://gooey.ai/eval/examples). Eval is the second part of the evaluation: it takes as input an Excel sheet with the transcriptions and translations from competing models, then runs an LLM script on each row to score each translation against the golden, human-provided answer. Once every row is scored, the Eval workflow averages the scores and graphs them.
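The score-then-average step can be illustrated with a small Python sketch. Note the substitution: the Eval workflow described above uses an LLM script to score each row, whereas this sketch swaps in word error rate (WER) as a simple, deterministic stand-in. The function names, the column names and the sample sentences are all hypothetical and are not taken from any real evaluation.

```python
from statistics import mean
from typing import Dict, List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    if not ref:
        return 0.0 if not hyp else 1.0
    # Dynamic-programming edit distance, keeping one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # reference word dropped
                            curr[j - 1] + 1,      # extra hypothesis word
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def average_scores(rows: List[Dict[str, str]], golden_col: str,
                   model_cols: List[str]) -> Dict[str, float]:
    """Score each row as max(0, 1 - WER), then average per model column."""
    return {
        col: mean(max(0.0, 1.0 - word_error_rate(row[golden_col], row[col]))
                  for row in rows)
        for col in model_cols
    }


# Made-up rows: one golden answer plus one output column per model.
rows = [
    {"golden": "the maize leaves are yellow",
     "model_a": "the maize leaves are yellow",
     "model_b": "the maze leaves yellow"},
    {"golden": "when should I plant beans",
     "model_a": "when should I plant bean",
     "model_b": "when I plant"},
]

scores = average_scores(rows, "golden", ["model_a", "model_b"])
ranking = sorted(scores, key=scores.get, reverse=True)  # best model first
print(scores, ranking)
```

Sorting the averaged scores gives the ordering a leaderboard needs: the first entry is the top performer for that data set and the second is the runner-up.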
Example output for Telugu Eval (demo):
Partners:
Gooey.AI - Sean Blagsvedt & Dev Aggrawal
Opportunity.org - an NGO building AI bots for Malawi farmers. Chichewa Evaluation is here (Seamless and Google USM appear to beat Azure and Whisper 3)
PeoplePlus.AI
Digital Green collaborators with Farmer.CHAT
ARTPARK.in Raghu Dharmaraju
People+AI Tanuj Bhojwani, Harsha G
AI4Bharat Prof. Mitesh Khapra
Gates Foundation AIEP Cohort
Personas
Producers. Interested in making better models. Researchers primarily.
Tech NGOs like ARTPARK, Wadhwani, etc.
Consumers. Interested in using better models.
Grassroots NGOs like Avanti, DigitalGreen, Pratham (education), etc.
Private Orgs like PayTM, Setu, etc.
Indic language content producers - InShorts, DailyHunt, KukuFM, etc
Milestones
Release the first set of evaluations - DONE
Provide a system and documentation for organizations to run their own evaluations. - DONE
Aid consumer organizations that wish to use the speech and translation APIs in their apps, websites, WhatsApp bots, etc. on Gooey.AI. - Email us at support@gooey.ai
Coming soon
Enable producer organizations to host and compare their models on Gooey.AI
Enable producer orgs to host the Gooey.AI runtime locally in their own GPU farms.
References and Links