AI Workflow Standards

How interoperable AI APIs and workflows will create billions of AI makers and propel an innovation ecosystem with the best of private and open source AI.

Sean Blagsvedt

Abstract

How does every organization become an AI organization - so it isn’t displaced by competitors that already have? How can we leverage the constant advances among private and open source AI models, so we can continuously deploy the better, cheaper, cleaner or faster ones for any use-case? How do people and organizations discover and apply the hard-won AI lessons of their field’s peers to their own problems?

AI Workflows are a cross-vendor specification of human-readable, standardized steps of LLM prompts, AI models (e.g. LLMs, speech recognition engines), knowledge base documents and function calls paired with use-case-specific evaluation datasets. Like HTML, India’s Unified Payments Interface and Kubernetes, AI Workflow Standards will create new technology layers and opportunities, improve market choices and accelerate the deployment, sharing and evaluation of AI solutions. Coupled with Workflow orchestration runtimes and discovery platforms, the standard will foster an ever-growing collection of simple, reusable AI workflows and create a level playing field for AI model makers, hyperscalers and hardware fabricators to provide new functionality and services. One-off AI investments will become new assets for the public to reuse, thereby enhancing the speed of innovation.

A Story of Smallholder Farmers & Shared AI Workflows

In late 2022, the NGO DigitalGreen approached Gooey.AI with a challenge: could the wisdom contained in 1000s of agricultural training videos, PDFs and FAQs be made available to smallholder farmers via AI? The result was Farmer.CHAT, a WhatsApp bot that understands spoken questions in 7 languages across India, Kenya and Ethiopia and answers back in text and speech, with vetted answers derived from DigitalGreen’s knowledge base. In 2023, Farmer.CHAT was deployed to thousands of smallholder farmers with over 35,000 messages exchanged.

The feedback from farmers was very positive and Farmer.CHAT was demonstrated at the 2023 UN General Assembly’s Science Panel. The Guardian covered the story and OpenAI featured it as a case-study. Importantly, the workflow that powered Farmer.CHAT - its LLM instruction prompts, video transcripts, documents and collection of AI settings & models (OpenAI’s GPT-4, Meta’s MMS for speech recognition, Vespa.AI for the knowledge base, Google Translate, etc.) - was public for others to inspect and modify.

As word spread, more organizations became interested. Opportunity International is an NGO operating in 33 countries with a large focus on helping smallholder farmers improve crop yields and incomes. They too saw the potential of Generative AI to answer farmer questions about agronomy practices and partnered with the Ministry of Agriculture of Malawi to use its 448-page Guide to Agriculture.PDF as the knowledge base for a Chichewa-speaking WhatsApp bot. The initial rollout went to agricultural mentors, extension officers and Opportunity’s farmer support agents; it reduced Q&A wait times for smallholder farmers from days to minutes and it too earned recognition from Bloomberg, Devex and the Gates Foundation.

“With (a shared workflow), we built in a day what our internal team had been working on for 3 months.” - Paul Essene, Senior Director Product at Opportunity International.

“We had 3 developers working for months on an agriculture chatbot for Malawi farmers. In an afternoon, we were able to push our authoritative agronomy extension content into (the Gooey.AI copilot workflow) and usability test a hallucination-free, multilingual WhatsApp RAG bot - with page-level citations, built-in speech recognition, analytics and evaluation.”

In short, Opportunity was able to almost immediately expand beyond the work of Farmer.CHAT and test its utility on a new population - focusing not on AI code, but on the specific knowledge sets (the Ministry of Agriculture documents), their users’ language needs (which AI models understand Chichewa best?) and the evaluation criteria relevant to their needs.

Beyond Impact to Expert Systems

The ability of one organization to quickly build from the AI workflow of another isn’t limited to agriculture or the impact space. In the US, there’s an acute shortage of skilled HVAC workers, especially among senior technicians who have retired in droves. A large, private equity-backed HVAC service provider also saw the Farmer.CHAT press and wondered if AI could be a mentor to their plumbers, furnace and air conditioning repair personnel. The technical solution again leveraged chatbot workflows. In a matter of weeks, we replaced the training PDFs of farming guides with 1000s of scanned HVAC manuals and videos, swapped Hindi for Spanish speech recognition models, updated the LLM prompts (that define the personality of the chatbot) and delivered a Slack chatbot that could answer virtually any question on how to repair every furnace sold in America (with diagram & table-based citations and links to relevant videos).

How can we enable every industry to spread AI innovation this quickly? We think the answer lies in thinking of Public AI as interoperable AI workflow standards, facilitating a thriving, open AI market.

Public AI as Ideal Market Design

Digital Public Infrastructure projects and Public AI are ultimately in service of an ideal world state. An ideal AI marketplace should have the following characteristics:

AI is inexpensive & measurably valuable to all organizations (especially less technical ones)
Innovations spread quickly across industries
Benefits & investments are broadly distributed
Low switching costs for customers + low entry barriers for AI model makers & hyperscalers providing AI services
Every new model - be it private or open source - enhances a constantly improving ecosystem
Negative societal externalities - e.g. the climate impact of computing data centers - are reversed

Our Approach

Our approach in this paper focuses on the power of technical standards to achieve desired societal outcomes. We borrow liberally from past successes:

Proselytize standard protocols to create a healthy market (as India’s UPI and Kubernetes did)
Encourage open source and private AI to compete on performance, speed, cost, security and environmental impact.
Create accessible, high-level abstractions (as the Web’s HTML & Mozilla’s “View Source” did) to enable more actors to build AI solutions, creating billions of AI workflow tinkerers (vs today’s millions of programmers).

How AI Apps are Built Today vs With Standards

Today, OpenAI is the dominant LLM AI vendor and hence, most AI applications call their text-completion GPT API (with GPT-4o being their top model at the time of this writing). The application calls OpenAI with a few default settings and, importantly, its text prompt as input (“What’s the capital of France?”) and OpenAI outputs text (“Paris”). Most competing open source or private LLM vendors have already implemented OpenAI’s GPT interface, making it easy to “hot-swap” OpenAI’s LLM for their own. Hence, AI Standards would simply formalize a practice that is already occurring among LLMs, codifying the interface with which an application communicates with a text LLM model.
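To make the “hot-swap” idea concrete, here is a minimal Python sketch. It is an illustration under our own assumptions, not a proposed specification: the `TextCompleter` interface, the `VendorA` and `VendorB` classes and the `answer` function are hypothetical names, and the two vendors are offline stand-ins that make no network calls.

```python
from typing import Protocol


class TextCompleter(Protocol):
    """Hypothetical standard interface: prompt text in, completion text out."""

    def complete(self, prompt: str, *, temperature: float = 0.0) -> str: ...


class VendorA:
    """Offline stand-in for one vendor's LLM (assumption, not a real API)."""

    def complete(self, prompt: str, *, temperature: float = 0.0) -> str:
        return "Paris" if "capital of France" in prompt else "(unknown)"


class VendorB:
    """Offline stand-in for a competing LLM behind the same interface."""

    def complete(self, prompt: str, *, temperature: float = 0.0) -> str:
        if "capital of France" in prompt:
            return "The capital of France is Paris."
        return "(unknown)"


def answer(model: TextCompleter, question: str) -> str:
    # The application depends only on the shared interface, never on a vendor,
    # so swapping models is a one-argument change.
    return model.complete(question)


if __name__ == "__main__":
    for model in (VendorA(), VendorB()):
        print(type(model).__name__, "->", answer(model, "What's the capital of France?"))
```

Because `answer` only sees the interface, replacing one vendor with another requires no change to application code, which is the property a formal standard would guarantee.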

Standardizing Evaluation

We see new AI models out-doing each other constantly, with new capabilities released weekly from private, public and open source participants. Every application developer thus faces the challenge of continuous AI model evaluation - e.g. is the latest model from Y company better, given my particular performance, cost, speed, security and environmental preferences?

We propose that evaluations should be standardized into datasets of inputs and golden outputs - i.e. the ideal answers - coupled with an evaluation AI prompt to determine how close an AI-created output is to the golden output.

Whenever a new AI model is released, any organization using an AI workflow can simply re-run the standardized evaluation to determine the golden output similarity, speed, cost and carbon usage of the updated model and assess whether it is a better fit for their use-case.
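A re-runnable evaluation of this kind might look like the following Python sketch. It rests on several assumptions: `GoldenExample`, `similarity` and `evaluate` are hypothetical names, the two models are offline stand-ins, and the string-similarity scorer from Python’s `difflib` is only a placeholder for the evaluation AI prompt proposed above. Cost and carbon figures are left out because they would have to come from the model provider.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from time import perf_counter
from typing import Callable, Dict, List


@dataclass
class GoldenExample:
    input_text: str      # what the workflow receives
    golden_output: str   # the ideal answer


def similarity(candidate: str, golden: str) -> float:
    # Placeholder scorer (assumption): character-level similarity in [0, 1].
    # A real evaluation would ask an LLM, via an evaluation prompt, how close
    # the candidate is to the golden output.
    return SequenceMatcher(None, candidate.lower(), golden.lower()).ratio()


def evaluate(model: Callable[[str], str], dataset: List[GoldenExample]) -> Dict[str, float]:
    if not dataset:
        raise ValueError("the golden dataset must not be empty")
    start = perf_counter()
    scores = [similarity(model(ex.input_text), ex.golden_output) for ex in dataset]
    return {
        "mean_similarity": sum(scores) / len(scores),
        # Wall-clock time for the model calls plus scoring.
        "seconds": perf_counter() - start,
    }


DATASET = [GoldenExample("What's the capital of France?", "Paris")]


def current_model(prompt: str) -> str:
    return "I am not sure."   # stand-in for the model in use today


def candidate_model(prompt: str) -> str:
    return "Paris"            # stand-in for a newly released model


if __name__ == "__main__":
    for name, model in (("current", current_model), ("candidate", candidate_model)):
        print(name, evaluate(model, DATASET))
```

The same dataset and scorer are applied to every model, so results stay comparable from one model release to the next.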

Standard Interfaces for more AI Modalities

We can now generalize the concepts of standardized interfaces from just LLMs to other major AI areas (or modalities), given that each modality largely shares the same inputs and outputs. E.g. LLMs take in text and are asked to continue it. Speech recognition models take in an audio file (and optionally a language code) and output a transcription. It is then easy to imagine applications not speaking directly to interfaces defined by one company but communicating via AI Standard APIs to almost every model via common interfaces for each modality.
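As a rough illustration, the per-modality interfaces described above could be written as follows in Python. All names here (`SpeechRecognizer`, `TextGenerator`, the fake models and `voice_question_to_answer`) are hypothetical, and the models are offline stand-ins rather than real services.

```python
from typing import Optional, Protocol


class SpeechRecognizer(Protocol):
    """Hypothetical speech interface: audio (plus optional language code) in, transcript out."""

    def transcribe(self, audio: bytes, language: Optional[str] = None) -> str: ...


class TextGenerator(Protocol):
    """Hypothetical LLM interface: text in, continuation out."""

    def complete(self, prompt: str) -> str: ...


class FakeSpeechModel:
    # Stand-in (assumption): reports what it received instead of transcribing.
    def transcribe(self, audio: bytes, language: Optional[str] = None) -> str:
        return f"[{language or 'auto'}] {len(audio)} bytes of audio"


class FakeTextModel:
    # Stand-in (assumption): echoes the prompt instead of calling an LLM.
    def complete(self, prompt: str) -> str:
        return f"(completion of: {prompt})"


def voice_question_to_answer(
    asr: SpeechRecognizer,
    llm: TextGenerator,
    audio: bytes,
    language: Optional[str] = None,
) -> str:
    # Each modality is addressed through its common interface, so either
    # model can be replaced without touching this function.
    transcript = asr.transcribe(audio, language)
    return llm.complete(transcript)


if __name__ == "__main__":
    print(voice_question_to_answer(FakeSpeechModel(), FakeTextModel(), b"\x00\x01", "hi"))
```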

Let’s Compose Into Workflows!

We now have the building blocks to compose Standard AI API calls and Evaluations into AI Workflows, which consist of 3 main parts:

Inputs and Outputs - What the workflow expects to take in - text, audio, etc. - and what it’s expected to output.
Steps - A list of instruction prompts + settings to abstracted AI modalities - e.g. LLMs, VectorDBs, text-to-speech models - using the inputs and ultimately returning the workflow’s outputs.
Evaluation - The golden dataset and evaluation prompt to determine if any model, prompt or setting change improves performance, cost, speed or environmental impact.

Here’s a simple example of Farmer.CHAT as a retrieval-augmented chatbot with 2 steps:

Step1 to run VectorDB search using the file “Agriculture_guide.PDF” as the knowledge base source, and
Step2 to summarize the results of the search with an LLM prompt to answer the input question.

The AI Workflow also contains the evaluation dataset and prompt.
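The two-step workflow above could be expressed as plain, JSON-serializable data, sketched here in Python. The exact format is still to be defined, so every field name (`inputs`, `steps`, `modality`, `template`, `evaluation` and so on), the evaluation file name and the `run_workflow` function are assumptions made for illustration. The two handlers are offline stand-ins for a real VectorDB and LLM.

```python
import json
from typing import Callable, Dict

# Hypothetical workflow document: all field names are illustrative only.
WORKFLOW = {
    "name": "farmer-chat-sketch",
    "inputs": {"question": "text"},
    "outputs": {"answer": "text"},
    "steps": [
        {
            "id": "search",
            "modality": "vector_search",
            "settings": {"knowledge_base": ["Agriculture_guide.PDF"], "top_k": 3},
            "template": "{question}",
        },
        {
            "id": "answer",
            "modality": "llm",
            "template": "Using only these passages: {search}\nAnswer the question: {question}",
        },
    ],
    "evaluation": {
        "golden_dataset": "golden_questions.jsonl",  # hypothetical file name
        "prompt": "Rate from 0 to 1 how closely the answer matches the golden answer.",
    },
}


def fake_vector_search(text: str, settings: dict) -> str:
    # Stand-in for a real VectorDB lookup.
    return f"passages from {settings['knowledge_base'][0]} matching '{text}'"


def fake_llm(text: str, settings: dict) -> str:
    # Stand-in for a real LLM call.
    return f"LLM answer based on: {text}"


HANDLERS: Dict[str, Callable[[str, dict], str]] = {
    "vector_search": fake_vector_search,
    "llm": fake_llm,
}


def run_workflow(workflow: dict, inputs: Dict[str, str], handlers: dict) -> Dict[str, str]:
    context = dict(inputs)
    for step in workflow["steps"]:
        # Fill the step's template from the inputs and earlier step results.
        rendered = step["template"].format(**context)
        context[step["id"]] = handlers[step["modality"]](rendered, step.get("settings", {}))
    # Return only the values declared as workflow outputs.
    return {name: context[name] for name in workflow["outputs"]}


if __name__ == "__main__":
    print(json.dumps(WORKFLOW, indent=2))
    print(run_workflow(WORKFLOW, {"question": "When should I plant maize?"}, HANDLERS))
```

Swapping the VectorDB or LLM then means registering a different handler for that modality; the workflow document itself stays unchanged.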

We’ve now reduced our AI use-case down to its essence - its prompts, knowledge base documents and how we should judge its performance as the underlying models are constantly improved. With every new version of GPT, LLaMA or a VectorDB, we can simply re-run our evaluation to determine if those new AI components yield better, cheaper, faster or lower environmental impact results.

Workflows : Runtimes as HTML : Browsers

The Workflow itself would be executed on competing runtimes such as OpenAI GPT Builder, Gooey.AI, Dify, Anthropic’s Claude builder or any software system that supports importing and running the AI Workflow standard. These runtimes can then connect the Workflow to communication platforms such as WhatsApp, Slack or telephony systems so that end users can easily interact with it as a chatbot.

Expected Benefits

HTML created the layers of the tech ecosystem that aided the productivity growth of the 90s and it is our hope that AI Workflows can have a similar effect.

AI is inexpensive & measurably valuable to all organizations (especially less technical ones)
   1. Widely used ecosystem of private & open source AI.
   2. Organizations can prototype, test, measure impact and iterate faster at much lower cost via Workflows rather than code.
Innovations spread quickly across industries
   1. More shared innovation via millions of Shared AI Workflows
   2. Billions of AI Workflow makers working to make organizations of all types and sizes better with AI.
Benefits & investments are broadly distributed
   1. AI Hyperscalers - who host AI models and can act as Workflow runtimes - can operate in every state or country (vs calling a handful of private AI companies with data centers located in just a few dominant cities)
Low switching costs for customers + low entry barriers for AI model makers, hyperscalers and companies providing AI services
Every new model - be it private or open source - enhances a constantly improving ecosystem
Negative societal externalities - e.g. the climate impact of computing data centers - are reversed
   1. Transparency about the climate impact of every AI Workflow and its downstream AI model calls should raise awareness and put a price on AI’s climate impact

Stakeholders

We believe the AI Workflow Standard has significant benefits to particular stakeholders as well:

Philanthropies such as the Rockefeller and Gates Foundations already seek to turn their investments into assets for other charities they fund (e.g. Global Access provision). Hence, we encourage them to mandate that their grantees publish AI related work as reusable Workflows (e.g. the LLM prompts, models, datasets and evaluation methods used).
Big Tech such as Nvidia, AWS, Microsoft, AMD and Meta have invested billions in hardware and data centers and will recoup that investment if AI solution demand and expertise increase. This would be expected if AI Workflows (and Workflow makers) become commonplace.
Consultancies (e.g. KPMG) benefit from AI Workflows because it gives them greater AI vendor independence and reusability across projects.
Non-US governments and Tech Companies may benefit because AI Workflow Standards aid commoditization of AI modalities and hence may give local providers a better chance to compete.

Next Steps:

Proselytize and gather feedback from
1. Standards bodies, industry consortia and governments
2. Major companies including AWS, Nvidia, Meta, OpenAI, Microsoft, AMD, Anthropic and Open source model makers
Define specifics of JSON/XML standard
Dive into the tech weeds - what’s the specific API? How do the standards evolve as AI develops (à la standards bodies)?

Acknowledgements and Thanks

Rockefeller Bellagio Public Resource for AI meetup (June 2024)
AIPalace.org Public AI meetup (July 2024)
Tanuj Bhojwani - PeoplePlus.AI
Pramod Varma & Jagadish Babu - EkStep Foundation
Brandon Jackson - PublicAI.Network
Elias Wolfberg - Nvidia
Gary A. Bolles - Singularity University

Appendix: A Short History of Technical Standards

Kubernetes

In 2013, AWS was quickly becoming the dominant vendor in the cloud services provider space. As companies realized the benefits of cloud hosting (vs buying and managing servers themselves), AWS was the clear winner in the category. Google, Microsoft and others vying for this business quickly realized that if they could create a new standard to describe cloud server deployment - what would become Kubernetes - then customers could define their server topology as a standard, interoperable configuration file and then port that configuration to any cloud provider that supported Kubernetes. Eleven years later, AWS’s dominance has been tamed, multiple hyperscalers are competitive and the K8s standard is ubiquitous.

This example is instructive for our current case: AI dominance by OpenAI/Microsoft in 2024. They are the early winners in the LLM space, with many governments, competitors and organizations genuinely concerned that the most important innovation of this generation will be captured by a couple of American companies.

A key lesson is that Kubernetes won because powerful but non-dominant players pushed for it as an industry standard, and if we want AI Workflow Standards to succeed, we’ll need to woo powerful actors such as Google, AWS, Nvidia, Anthropic, Meta etc to support it.

HTML

The HTML standard created the web as we know it today with several important features that we seek to emulate.

First, HTML created new levels of abstraction.

Browsers competed in their ability to render HTML pages fast, with a competitive UI and support for the latest standards.
HTML authors could “View Source” on any page to understand the page’s layout and hence, learn by tinkering with the source code for any interesting page.
Web servers then competed on security, scalability, pricing, etc.

These new levels of open abstraction allowed new vendors and new categories of work to manifest in a way that would have never been possible with the alternatives of the time such as America Online. These standards led to the entire web industry and arguably the increased global productivity wave of the 1990s. Additionally, the HTML standard was voluntary and evolved as web browsers and servers competed by adding additional functionality.

India’s UPI (Unified Payments Interface)

After the incredibly successful deployment of Aadhaar, which gave 1 billion residents of India verifiable unique identifiers, the Indian government and think tanks such as the EkStep Foundation sought to create a digital money infrastructure that would enable business-to-business and person-to-person payments in an open marketplace, with high volumes, zero-to-low transaction fees and without creating dominant private vendors such as Visa and Mastercard in the US. Hence, the Unified Payments Interface standard was created, wherein any mobile wallet, bank and company could register to send and receive payments and create digital wallets to manage a customer’s money. As of Nov 2022, it had over 300M daily users.

About Gooey.AI

Gooey.AI is a low-code platform of shared workflows that leverage the best of open source and private AI. The company was founded in 2022, has over 500,000 users and its work has been featured in the Guardian and demonstrated at the UN General Assembly. Clients include the Gates Foundation, PeoplePlus.AI, the EkStep Foundation, Safaricom, Fandom.com and Zephyr.

Why?

“If you get into this (AI) space, the most important thing is that you share what you are learning.”

Simon Willison, co-creator of Django, from “Catching up on the weird world of LLMs”

We must improve the innovation infrastructure of human beings, to survive the climate crisis and to solve virtually any problem we imagine. In letting each of us more efficiently build on the work of each other, we increase the leverage of our collective efforts. This is the theory of change we employ and how we hope to accelerate progress by improving the innovation infrastructure of all organizations - including development organizations - who wish to leverage AI.

Why now?

“It’s not that AI will replace lawyers; it’s that the lawyers who use AI will replace those that don’t.” - Superlegal

“Every business will become an AI business.” - Satya Nadella

It is clear to us and many others that the productivity enhancements made possible first with software - with its feature that humans can reuse and modify prior investments at near-zero marginal cost - and now with modern AI tools such as OpenAI’s ChatGPT will likely cause a transformation in how most processes in organizations function. We go further with the belief that the SuperLegal adage above will apply to almost every organization and job function; namely that those organizations and people that best leverage AI - as a super-set of all reusable collective human work and knowledge - will outperform those that do not. As Bill Gates stated in his April 2023 memo:

The development of AI is as fundamental as the creation of the microprocessor, the personal computer, the Internet, and the mobile phone. It will change the way people work, learn, travel, get health care, and communicate with each other. Entire industries will reorient around it. Businesses will distinguish themselves by how well they use it.

But how can we help people and organizations make this transition to a hyper-competitive and productive world? What tools do we need when “thinking” jobs are ones for AI prompt writers - trying to wrangle the AI to our desires, and/or API stitchers - connecting non-obvious or custom sets of data and functionality to build novel and useful new things? How do we specifically help organizations - such as development organizations - learn from each other’s investments? We think AI Workflows are part of the answer to these questions.
