©2024 by Gooey.AI / Dara.network Inc / support@gooey.ai


Global Language Understanding for AIs

Use-case specific leaderboards for low-resource language speech recognition & translation models

Last updated 2 months ago


The Leaderboard

For these languages and particular audio samples, here's our recommended speech recognition + machine translation model:

| Language | Partner Data Set Link | Top Performing Model | Runner Up | Link to Evaluation |
| --- | --- | --- | --- | --- |
| Hindi | ARTPARK (IISc) | Azure | Conformer Hindi (ai4bharat.org) | Hindi Bulk Run |
| Bhojpuri | ARTPARK (IISc) | MMS | Conformer Hindi (ai4bharat.org) | Bhojpuri Bulk Run |
| Swahili | Opportunity International | Google Chirp/USM v2 + Google Translate | GPT-4o ASR (auto detect) + Google Translate | Swahili Bulk Run |
| Kikuyu | Digital Green | Chirp/USM | MMS-Large + GhanaNLP | Kikuyu Bulk Run |
| Chichewa | Opportunity | MMS | Seamless M4T v1 + Google Translate | Chichewa Bulk Run |
| Magahi | Looking for Collaborator/Partner | | | |
| Luo/Dholuo | Looking for Collaborator/Partner | | | |
| Maithili | Looking for Collaborator/Partner | | | |

Summary

Any organization working with low-resource users and AI must first tackle a fundamental question: can the AI actually understand the text and audio clips from my particular set of users? Too often, the answer is "not very well," meaning the incredible knowledge and reasoning capabilities of AI are unavailable to these populations.

Fortunately, the field is moving incredibly fast and better models are being released every day. This effort attempts to make it easy for any organization to provide their own audio & text samples, their “golden” expert-created transcriptions and translations and then to evaluate the best AI models available to determine which actually understands their users best.
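To make "which model understands my users best" measurable, the standard speech-recognition metric is word error rate (WER): the number of word substitutions, insertions and deletions needed to turn a model's transcript into the expert "golden" transcript, divided by the golden transcript's length. A minimal sketch (the model outputs below are invented placeholders, not real benchmark results):

```python
# Word error rate (WER): word-level edit distance between the model's
# transcript and the golden reference, normalized by reference length.
def wer(golden: str, hypothesis: str) -> float:
    ref, hyp = golden.split(), hypothesis.split()
    # dp[i][j] = edits needed to turn hyp[:j] into ref[:i]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# Rank models by WER against the golden transcript (placeholder outputs).
golden = "how do I know if my maize crop is rotting"
outputs = {
    "model_a": "how do I know if my maize crop is rotting",
    "model_b": "how do I no if my maze crop is rotting",
}
ranked = sorted(outputs, key=lambda m: wer(golden, outputs[m]))
```

Lower WER is better; an evaluation like the leaderboard above runs this (or an LLM-based score for translations) over every sample in the test set and averages per model.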

Why do we need such a system?

Low-resource languages are not uniform. Local dialects abound. Unfortunately, capturing this diversity of language is hard for the tech companies attempting to build speech recognition and translation models. This paucity of high-quality, diverse training sets leads to poor performance of AI models, which in turn means that incredible tools like GPT-4 and Gemini, primarily trained on English, don't work particularly well for speakers of low-resource languages.

Partners like ARTPARK have existing efforts to collect data sets, but this initiative focuses on a different part of the problem: enabling organizations to provide their own collections of audio samples to discover which combination of state-of-the-art speech recognition and translation AI models actually understands their users' speech best. As private, public and open-source technology makers publish ever-improving AI models each month, we also wish to enable organizations to quickly benchmark new models with their own test data so they can make appropriate price, speed and performance decisions.

Goals:

  1. Provide a place where organizations can determine which AI Speech recognition and translation models work best for their particular use case, especially with low-resource languages.

  2. Catalyze the industry to create better low-resource language AI models by creating a popular, highly referenced destination for researchers (and the press) to compare models.

  3. Build an open-source dataset of audio files and golden, expert-created transcriptions and translations from scores of organizations that is representative of the likely phrases involved in aiding low-resource users. E.g., we'll create a dataset of rural Tamil women asking, via WhatsApp on Android phones, how to tell if their crops are rotting, rather than transcriptions of popular Tamil songs.
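Such a golden evaluation set is, at its core, just a table: one row per audio clip with the expert transcription and translation alongside. The column names below are illustrative, not Gooey.AI's required schema, and the sample row is invented:

```python
import csv
import io

# Illustrative columns for a golden evaluation set: one row per audio clip.
FIELDS = ["audio_url", "language", "golden_transcript", "golden_translation_en"]

rows = [
    {
        "audio_url": "https://example.org/clips/sample_001.wav",  # hypothetical URL
        "language": "sw",  # language code (here: Swahili)
        "golden_transcript": "mahindi yangu yanaoza",  # placeholder clip text
        "golden_translation_en": "my maize is rotting",
    },
]

# Serialize to CSV, the format the evaluation workflows consume.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()
```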

Where are we right now:

Gooey.AI currently supports 13 ASR models.

Gooey supports the top public/open models. Here is the list of all the core models available via Gooey.AI:

  • Whisper Large v2 & v3 (OpenAI)

  • Whisper Hindi Large v2 (Bhashini)

  • Whisper Telugu Large v2 (Bhashini)

  • Conformer English (ai4bharat.org)

  • Conformer Hindi (ai4bharat.org)

  • Vakyansh Bhojpuri (Open-Speech-EkStep)

  • Google Cloud v1 (Google)

  • USM/Chirp (Google)

  • Deepgram

  • Azure Speech Recognition

  • Seamless M4T (Meta Research)

  • Massively Multilingual Speech (Meta Research)

  • GhanaNLP

Current support for Machine Translation

  • Google Translate

  • Translation via LLM models (Claude 3, Mixtral, GPT-4, Gemini 1.5 Pro)

  • Coming soon

    • Azure

    • Seamless MT v2

How to use the models

All of these models can be used via our standalone speech workflow (https://gooey.ai/speech) or API, or inside our copilot recipe (for use in WhatsApp, Slack, as a web widget or inside an app of your choice).

How to determine which AI model is best for your data

Gooey.AI Bulk Workflow

With our Bulk Workflow, you provide a CSV or Google Sheet with all your audio samples (and their golden transcriptions and translations) and run a workflow that looks like this: Compare Chichewa Speech Recognition (https://gooey.ai/bulk/?example_id=45j0h174). In this bulk run example, we compare 4 different Gooey.AI speech recognition + translation workflows, each of which uses a different AI speech model.

Gooey.AI Eval Workflow

Gooey published an early version that evaluates Hindi, Kannada and Telugu on 3-6 different engines (examples at https://gooey.ai/eval/examples). Eval is the second part of the evaluation: it takes as input a spreadsheet with the transcriptions and translations from the competing models, then runs an LLM script on each row to score each translation against the golden, human-provided answer. Once each row is scored, the Eval workflow averages the scores and graphs them. See the example output for the Telugu Eval.
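The Eval step boils down to: score every spreadsheet row, then average per model. A stand-in sketch, using a trivial word-overlap scorer in place of the LLM judge the real workflow runs (the rows are invented placeholders):

```python
from statistics import mean

# Stand-in for the LLM judge: score a candidate translation against the
# golden answer by word overlap (0.0 to 1.0). The real Eval workflow runs
# an LLM prompt per row instead of this toy metric.
def score(golden: str, candidate: str) -> float:
    g, c = set(golden.lower().split()), set(candidate.lower().split())
    return len(g & c) / max(len(g | c), 1)

# One spreadsheet row per audio sample: golden answer + each model's output.
rows = [
    {"golden": "my maize is rotting",
     "model_a": "my maize is rotting",
     "model_b": "my corn rots"},
    {"golden": "when should I plant",
     "model_a": "when should I plant",
     "model_b": "when to plant"},
]

# Average each model's score over all rows, then pick the best.
models = ["model_a", "model_b"]
averages = {m: mean(score(r["golden"], r[m]) for r in rows) for m in models}
best = max(averages, key=averages.get)
```

The graphed leaderboard is just these per-model averages; swapping in an LLM-based scorer changes the metric, not the aggregation.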

Partners:

  • PeoplePlus.AI

  • Digital Green, collaborators on Farmer.CHAT

  • Gates Foundation AIEP Cohort

Personas

  1. Producers. Interested in making better models; primarily researchers.

    • Tech NGOs like ARTPARK, Wadhwani, etc.

  2. Consumers. Interested in using better models.

    • Grassroots NGOs like Avanti, Digital Green, Pratham (education), etc.

    • Private orgs like PayTM, Setu, etc.

    • Indic-language content producers like InShorts, DailyHunt, KukuFM, etc.

Milestones

  1. Release the first set of evaluations - DONE

  2. Provide a system and documentation for organizations to run their own evaluations. - DONE

Coming soon

  1. Enable producer organizations to host and compare their models on Gooey.AI

  2. Enable producer orgs to host the Gooey.AI runtime locally in their own GPU farms.

References and Links

This effort is in collaboration with the Glocal Eval of Models initiative by PeoplePlus.AI and could not succeed without the collaborative support of the EkStep Foundation, ARTPARK, Opportunity.org, Digital Green, AI4Bharat, Karya.in, Microsoft Research India, GIZ and the Gates Foundation.

As a bonus, given that these models run "hot" on Gooey.AI and are available via our APIs and high-level workflows such as Copilot (https://gooey.ai/CoPilot), organizations can immediately deploy their chosen model in AI bots like Farmer.CHAT.

Through this effort, we also hope to prod private, government and public technology makers to create ever-better AI models for low-resource language understanding via the proven power of open, transparent competition. We take inspiration from other AI leaderboards like Hugging Face's Open LLM ranking board.

  • Gooey.AI - Sean Blagsvedt & Dev Aggrawal

  • Opportunity.org - an NGO building AI bots for Malawi farmers (Seamless and Google USM appear to beat Azure and Whisper 3)

  • ARTPARK.in - Raghu Dharmaraju

  • People+AI - Tanuj Bhojwani, Harsha G

  • AI4Bharat - Prof. Mitesh Khapra

To aid consumer organizations who wish to use the speech and translation APIs in their apps, websites, WhatsApp bots, etc. on Gooey.AI: email us at support@gooey.ai.

Further reading: [2303.12528] MEGA: Multilingual Evaluation of Generative AI

  • 🗣️ Hindi ASR Evaluation - get started with a pre-filled example

  • 📖 GUIDE - How to create language evaluation for ASR?