Building AI for Underrepresented Languages and Creators

African Language Evals, updated models, and new video workflows!

Language Evals

In our mission to catalyze AI for global impact, we've been working extensively to assess how well the latest crop of state-of-the-art AI models understand low-resource languages, including Kikuyu, Swahili and Kinyarwanda.

We are now hosting:

Not only are we hosting the best African language models, we've also set up detailed evaluations to answer critical questions for developers building voice services in these languages.

Our Evaluation Approach

Our evaluations use real-world questions from agriculture, health, and general conversation, the domains where voice AI can have the greatest impact. Each question was recorded as natural spoken audio to test end-to-end performance in realistic conditions.

We evaluated three key questions:

Which AI architectures best understand common Swahili, Kikuyu, and Kinyarwanda questions? We tested different workflow approaches including chained models with machine translation, fine-tuned ASR paired with GPT/Gemini for translation, and single-model audio-to-audio systems like GPT-realtime.
Do these models understand spoken language well enough for production deployments? We measured how accurately each architecture could process questions and generate expert-level responses.
Are response times fast enough for real-world use? We tracked latency to ensure these solutions could power voice-only services on basic phones (non-smartphones), which is critical for accessibility in many communities.
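The chained architectures above can be pictured as an ordered list of stages, each timed separately. The following is a minimal sketch under stated assumptions, not the evaluation harness itself: every stage function (`asr`, `mt_to_english`, `llm`, `mt_to_source`, `tts`) is a hypothetical placeholder that only tags its input, and `run_pipeline` is an illustrative helper name rather than a Gooey.AI API.

```python
import time
from typing import Callable, Dict, List, Tuple

Stage = Callable[[str], str]


def run_pipeline(stages: List[Tuple[str, Stage]], payload: str) -> Tuple[str, Dict[str, float]]:
    """Run each stage in order and record wall-clock seconds per stage."""
    timings: Dict[str, float] = {}
    for name, stage in stages:
        start = time.perf_counter()
        payload = stage(payload)
        timings[name] = time.perf_counter() - start
    timings["total"] = sum(timings.values())
    return payload, timings


# Hypothetical placeholder stages: a real setup would call actual models here.
def asr(audio: str) -> str:
    return f"transcript({audio})"


def mt_to_english(text: str) -> str:
    return f"en({text})"


def llm(text: str) -> str:
    return f"answer({text})"


def mt_to_source(text: str) -> str:
    return f"src({text})"


def tts(text: str) -> str:
    return f"audio({text})"


# Chained models with machine translation on both sides of the LLM.
ASR_MT_LLM_MT_TTS = [
    ("asr", asr),
    ("mt_in", mt_to_english),
    ("llm", llm),
    ("mt_out", mt_to_source),
    ("tts", tts),
]

# Fine-tuned ASR feeding the LLM directly, with no separate translation step.
ASR_LLM_TTS = [("asr", asr), ("llm", llm), ("tts", tts)]

short_output, short_timings = run_pipeline(ASR_LLM_TTS, "question.wav")
long_output, long_timings = run_pipeline(ASR_MT_LLM_MT_TTS, "question.wav")
```

Recording a timing per stage as well as a total makes it easier to see which step dominates end-to-end latency when comparing a five-stage chain against a three-stage one.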

Results

Our evals show that fine-tuned ASR combined with GPT-5 / Gemini 2.5 delivered higher accuracy and lower latency than the other architectures across all three languages. The Swahili results:

                                 ASR-MT-LLM-MT-TTS                  ASR-LLM-TTS          Single Model

Quality                          94%                                100%                 49%
(Swahili Audio2English Answer)   Jacaranda + GPT-5 + Google Trans   Jacaranda + GPT-5    GPT-4o realtime (beats GPT-realtime, 31%)

Latency, mean in seconds         6.3                                5.99                 6.48
(Swahili Audio2SwahiliAudio)
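Summary figures of this kind (a quality percentage and a mean latency in seconds) can be derived from per-question results. The sketch below is illustrative only: the `QuestionResult` record, the `summarize` helper, and the four toy entries are assumptions made for the example and are not taken from the evaluation data.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class QuestionResult:
    question_id: str
    correct: bool            # whether the answer was judged correct
    latency_seconds: float   # end-to-end time for this question


def summarize(results: List[QuestionResult]) -> Dict[str, float]:
    """Return percent of correct answers and mean latency in seconds."""
    if not results:
        raise ValueError("no results to summarize")
    accuracy = 100 * sum(r.correct for r in results) / len(results)
    mean_latency = mean(r.latency_seconds for r in results)
    return {
        "accuracy_percent": round(accuracy, 1),
        "mean_latency_seconds": round(mean_latency, 2),
    }


# Toy values for illustration only, not real evaluation results.
toy_results = [
    QuestionResult("q1", True, 5.0),
    QuestionResult("q2", True, 7.0),
    QuestionResult("q3", False, 6.0),
    QuestionResult("q4", True, 6.0),
]

summary = summarize(toy_results)
print(summary)
```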

Explore the detailed evaluations:

KINYARWANDA:

REALTIME: Kinyarwanda Audio2Text Prompt Compare - 30Qs • Gates Foundation • Gooey.AI (gooey.ai)

SWAHILI:

Top4: Swahili Audio2Text Comparison (11 sept - 30Qs) • Gates Foundation • Gooey.AI (gooey.ai)

KIKUYU:

Kikuyu Audio2Text Comparison - 25Qs • Gates Foundation • Gooey.AI (gooey.ai)

Other Updated Language Models

As part of our effort to make AI more accessible to the impact sector, we are happy to share that we are already hosting Sealion v4 and Apertus!

Video Workflows

Animate Under-represented Datasets

As part of our Beyond Bias initiative, we're expanding video capabilities to help creators and artists bring visibility to underrepresented communities and datasets. With Gooey.AI you can now:

Bring your own image dataset
Train a custom Flux LoRA image model
Use the LoRA model to create images
And finally, animate these images

ZOOM IN to see the Beyond Bias Workflow!

These features emerged from our Beyond Bias workshops, where we identified the need for AI tools that don't just work for everyone, but actively help amplify voices and stories that have been marginalized in AI training data.

Learn more about Beyond Bias workflows in the Upcoming section below.

Video Models on Gooey.AI

We are thrilled to release our text-to-video and image-to-video models! Start making high-quality videos with:

Veo3
Wan 2.5
Kling and more!

PRO TIP: It can also generate audio!

Upcoming

Finally, we are excited to announce our upcoming Beyond Bias Prompt-a-thon in Delhi, Pune and Bangalore.

Beyond Bias, a Gooey.AI and Goethe-Institut India partnership, reimagines generative AI through participatory practices, creating inclusive datasets and tools that honor diversity and drive innovation.

Learn more about Beyond Bias:

BeyondBias: Making AI more Inclusive | Gooey.AI (gooey.ai)

