Why Many AI Pilots Stall and What We Can Do Differently

A joint perspective from FSG and Wadhwani AI Global
Artificial intelligence is rapidly becoming part of development programming across the Global South. Governments and development partners are experimenting with AI tools that help farmers access crop advice, assist community health workers with diagnosis, personalize learning for students, and improve targeting of social protection programs.
The promise is compelling. In contexts where expertise and resources are scarce, AI has the potential to expand what frontline workers and institutions can do.
Yet many of these initiatives struggle once they move beyond the pilot stage.
The reason these pilots stall is often misunderstood. The instinct is to blame the technology: data quality, model accuracy, or infrastructure limitations. But our experience across AI interventions in agriculture, health, and education suggests something different.
In many cases, the technology works. What fails is the fit between the technology and the realities of how people live and work.
When AI Meets Reality
Consider an agriculture advisory initiative deployed across parts of Africa and India. The system used AI models to diagnose crop conditions and recommend actions to farmers. Governments were excited about scaling quickly, and donors saw an opportunity to deliver better advice at lower cost.
But once the tool reached real users, challenges emerged.
Many farmers struggled with text-heavy interfaces written in non-local languages. Connectivity was unreliable, preventing photo uploads needed for diagnosis. In some households, women farmers could not access the shared phone at the moment they needed advice.
Even when farmers received recommendations, trust remained fragile. When the AI misidentified crops or suggested actions that did not match local conditions, farmers rarely followed the advice directly. Instead, they checked with peers or community members before making decisions.
The problem was not the model’s performance alone. It was that the system had been designed without a deep understanding of how farmers actually accessed technology, made decisions, and trusted information in their daily lives.
Now consider a different example.
In Kenya, a community-led mental health initiative sought to expand access to care in areas where formal services were limited. Instead of starting with technology, the program first strengthened a network of trained volunteers and professionals who served as first responders in their communities.
Only once that human system was functioning reliably did the team introduce AI.
The technology helped match users to appropriate therapists, automate reporting, and identify stress markers from conversation patterns. Crucially, frontline workers continued to assess cases and make referral decisions. AI acted as decision support rather than replacing human judgment.
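To make the "decision support, not replacement" pattern concrete, the sketch below shows one way such a flow can be structured. It is purely illustrative: the function names, fields, and thresholds are invented for this example and do not describe the actual system's design, which has not been published.

```python
# Hypothetical sketch of an AI-as-decision-support flow, loosely modeled on
# the Kenyan mental health example. All names and thresholds are illustrative.
from dataclasses import dataclass

@dataclass
class AISuggestion:
    suggested_therapist: str   # best match proposed by the model
    stress_score: float        # 0.0 (calm) to 1.0 (acute), inferred from conversation
    confidence: float          # model's confidence in its own suggestion

def ai_screen(conversation_text: str) -> AISuggestion:
    """Stand-in for the model call; a real system would invoke an ML service."""
    # Toy heuristic so the sketch runs end to end.
    score = min(1.0, conversation_text.lower().count("hopeless") * 0.4)
    return AISuggestion("volunteer_counsellor_3", stress_score=score, confidence=0.7)

def triage(conversation_text: str, worker_assessment: str) -> dict:
    """The frontline worker's assessment, not the model, decides the referral."""
    suggestion = ai_screen(conversation_text)
    return {
        "referral": worker_assessment,  # human judgment is the decision of record
        "ai_suggestion": suggestion.suggested_therapist,  # shown, never auto-applied
        "flag_for_supervisor": suggestion.stress_score > 0.8,  # AI only raises flags
    }

print(triage("I feel hopeless and alone", worker_assessment="refer_to_clinician"))
```

The key design choice is that the model's output is attached to the case record as a suggestion and a flag, while the referral field is always populated by the worker.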
Because AI was embedded within workflows that workers already trusted, the tool gained legitimacy quickly. Users were more willing to engage with it, and adoption grew organically.
In this case, the success of the AI system came not from technological sophistication alone, but from its alignment with existing relationships, workflows, and trust structures.
From Lessons to Infrastructure
What would it take to address this gap more systematically?
One emerging idea is to move beyond isolated case studies and build shared tools that help decision-makers assess whether AI is likely to work in a given context before significant resources are invested.
Wadhwani AI Global and FSG are exploring the development of a Behavioural Science Playbook and Testing Sandbox for AI in Global Development.
The aim is to translate lessons from real-world deployments into practical tools that governments, funders, and implementers can use to make better decisions about AI.
The Playbook would synthesize behavioural insights from field research into guidance on questions such as:
- Where can AI realistically add value in frontline workflows?
- How should AI systems be designed to support human decision-making?
- What signals indicate that an AI pilot is likely to succeed or fail?
Alongside this, a digital testing sandbox would allow teams to simulate how AI tools might perform under real-world constraints before deployment.
Rather than evaluating systems solely on technical performance, the sandbox would examine questions such as:
- Can users actually access and navigate the tool?
- Do recommendations change decisions or create confusion?
- Where does trust in the system break down?
- How do constraints such as connectivity, shared devices, or language barriers affect usability?
By combining behavioural research with structured testing environments, such an approach could help development organizations identify promising AI opportunities earlier and avoid investing in tools that are unlikely to work in practice.
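To illustrate why this kind of pre-deployment testing matters, a sandbox check can be as simple as a Monte Carlo simulation of compounding access constraints. The sketch below is entirely hypothetical: the rates and function names are invented for the example, and the Sandbox itself is still being designed.

```python
# Purely illustrative sketch of one sandbox check: how connectivity failures,
# shared-device access, and language fit affect whether advice ever reaches
# the user. All parameters are invented for this example.
import random

def simulate_advisory_session(connectivity_rate: float,
                              phone_access_rate: float,
                              reads_interface_language: float) -> bool:
    """Return True if the simulated user receives and can read the advice."""
    has_phone = random.random() < phone_access_rate          # shared-device constraint
    upload_ok = random.random() < connectivity_rate          # photo upload for diagnosis
    can_read = random.random() < reads_interface_language    # interface language fit
    return has_phone and upload_ok and can_read

def estimate_reach(n_users: int = 10_000) -> float:
    successes = sum(
        simulate_advisory_session(connectivity_rate=0.6,
                                  phone_access_rate=0.7,
                                  reads_interface_language=0.5)
        for _ in range(n_users)
    )
    return successes / n_users

# Even with a perfectly accurate model, compounding access constraints
# cap effective reach at roughly 0.6 * 0.7 * 0.5 = 21% of users.
print(f"Estimated share of users actually served: {estimate_reach():.0%}")
```

Even this toy example makes the core point visible: a technically excellent model can still reach only a fifth of its intended users, and a sandbox surfaces that gap before, rather than after, significant investment.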
Looking Ahead
Over the coming months, Wadhwani AI Global and FSG will engage with governments, implementers, and funders to develop this Playbook and Sandbox approach further.
If AI is to fulfil its potential in global development, the conversation must move beyond models and infrastructure to a deeper understanding of how humans and AI systems interact in real-world settings.
Because in the end, the success of AI will depend not only on what the technology can do but also on whether people can and will use it.