Spring Health Solutions

Why We Built Guide as a Multi-Agent System Instead of One Big Agent

What Spring Health learned about routing, scope, and safety while building AI for mental healthcare

May 19, 2026

Written by

Ondrej Kopecky and Farhan Nomani

A member starts a conversation with Guide in the middle of the night and asks two very different questions in the same conversation:

How many sessions do I have left this year?
I don’t know if I can do this anymore.

Those messages sit next to each other in the same chat window. They are not the same systems problem.

One is operational. One may carry clinical risk. They require different context, different handling, and different boundaries. If one model tries to handle both the same way, one of them usually gets answered badly.

That constraint shaped how we built Guide. We did not end up with one general-purpose agent. We built a multi-agent system with narrower responsibilities, explicit routing, and tighter scope boundaries, because mental healthcare breaks down quickly when a system tries to be broadly helpful instead of specifically reliable.

Guide is Spring Health's AI that supports people across every stage of their mental health journey. For us, the challenge was never just to make that experience feel smart. It was to make it useful, bounded, and trustworthy in the moments that matter most.

The problem we were trying to solve

One of the first things we focused on was what happens between sessions. Those moments matter more than product teams often assume. A member may be:

Trying to find a provider and feel exhausted by the search.
Preparing for a first appointment and not knowing how to describe what they need.
Wanting help following through on a care plan.
Needing support in the moment.
Simply needing to know what to do next.

From a distance, those can look like one category: support. In practice, they are different jobs with different requirements.

That was one of our earliest lessons when building Guide. If you ask a single agent to handle provider matching, scheduling, emotional support, follow-through, and service questions all at once, quality degrades across the board. In early testing, we saw that directly. A member could be working through something emotionally difficult and then ask a benefits or scheduling question in the same thread. The system could sometimes hold the emotional moment, but it struggled to stay precise on the operational one. Responses became broader. The system hedged. The experience flattened right when it needed to feel most dependable.

So we made a deliberate choice: Start narrow, and separate jobs that looked similar from the outside but required different behavior from the system.

How Guide is built

Guide is built around an orchestration layer rather than a single all-purpose model.

At the center of the system is Guide itself, which handles intent detection and routing. Its job is to understand what a member is trying to do and direct that request to the right kind of help. Around that layer, we built narrower agents with clearer responsibilities.

The booking agent handles follow-up appointments for members already in care.

The provider-matching agent supports members who are still trying to find the right fit.

The in-the-moment support agent helps members navigate hard moments between sessions. It offers grounded support and guides them to the right next step, whether that is a resource, scheduling help, or connection to a clinician. It is intentionally bounded and is not a substitute for clinical care.

The customer-support agent handles operational questions.

A generalist path handles ambiguous input by clarifying instead of guessing.

That structure gave us something a single agent could not: specialization without fragmentation.
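To make the orchestration idea concrete, here is a minimal Python sketch of intent-based dispatch. It is illustrative and is not Guide's code. The `Intent` labels, the `specialist` placeholder agents and the `route` function are hypothetical names, and intent detection is assumed to happen upstream. Each placeholder returns a canned string where a real agent would call a model with its own prompt and context.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict


class Intent(Enum):
    # Hypothetical intent labels, one per job described above.
    BOOK_FOLLOW_UP = "book_follow_up"
    FIND_PROVIDER = "find_provider"
    IN_THE_MOMENT_SUPPORT = "in_the_moment_support"
    OPERATIONAL_QUESTION = "operational_question"
    UNCLEAR = "unclear"


@dataclass(frozen=True)
class Reply:
    agent: str
    text: str


def specialist(name: str) -> Callable[[str], Reply]:
    """Build a placeholder agent; a real one would call a model with its own prompt and context."""
    def handle(message: str) -> Reply:
        return Reply(name, f"[{name} reply to: {message}]")
    return handle


def generalist(message: str) -> Reply:
    # The fallback path clarifies instead of guessing.
    return Reply("generalist", "Could you tell me a little more about what you need help with?")


# Orchestration layer: one narrow handler per job, looked up by detected intent.
ROUTES: Dict[Intent, Callable[[str], Reply]] = {
    Intent.BOOK_FOLLOW_UP: specialist("booking"),
    Intent.FIND_PROVIDER: specialist("provider_matching"),
    Intent.IN_THE_MOMENT_SUPPORT: specialist("in_the_moment_support"),
    Intent.OPERATIONAL_QUESTION: specialist("customer_support"),
}


def route(intent: Intent, message: str) -> Reply:
    """Dispatch a message whose intent has already been detected upstream."""
    # Any intent without a dedicated specialist, including UNCLEAR, goes to the generalist.
    handler = ROUTES.get(intent, generalist)
    return handler(message)


reply = route(Intent.OPERATIONAL_QUESTION, "How many sessions do I have left this year?")
print(reply.agent)  # customer_support
```

Because the routing table is kept separate from the handlers, narrowing or adding an agent means changing one entry rather than a shared prompt. Anything the table does not cover falls through to the clarifying generalist instead of landing on an arbitrary specialist.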

Why we separated routing from response generation

A lot of AI products start with response generation and work backward from there. We took the opposite approach.

The first question is not, “What should the model say?” The first question is, “What job is the system being asked to do?”

That distinction matters because the care journey includes different kinds of needs, and those needs require different kinds of system behavior. Finding the right provider is a different problem from booking a follow-up. Reinforcing a care plan is a different problem from answering a benefits question. In-the-moment support is a different problem from helping someone figure out what kind of care they need next.

Separating routing from response generation made the rest of the system more legible. It let us narrow the scope of each agent, apply the right context to the right job, and make ambiguity an explicit state the system can handle instead of a failure mode it tries to bluff its way through.

When Guide is not confident about intent, it asks a clarifying question. That behavior sounds simple. In practice, it is one of the most important product decisions in the system.
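One way to make ambiguity an explicit state is to have routing return either an intent or a clarifying question. The sketch below assumes an upstream classifier that produces one score per intent. The thresholds `MIN_CONFIDENCE` and `MIN_MARGIN` are hypothetical values chosen for illustration, not numbers Guide uses.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical values; in practice these would be tuned against routing evals.
MIN_CONFIDENCE = 0.75
MIN_MARGIN = 0.15


@dataclass(frozen=True)
class RoutingDecision:
    intent: Optional[str]  # None is an explicit "ambiguous" state, not an error
    clarifying_question: Optional[str] = None


def decide(scores: Dict[str, float]) -> RoutingDecision:
    """Turn intent-classifier scores into either a route or a clarifying question."""
    if not scores:
        return RoutingDecision(None, "Could you tell me a little more about what you need?")
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    top_intent, top_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    # Low confidence, or two jobs that look almost equally likely, both mean "ask first".
    if top_score < MIN_CONFIDENCE or top_score - runner_up < MIN_MARGIN:
        options = " or ".join(name.replace("_", " ") for name, _ in ranked[:2])
        return RoutingDecision(None, f"Just to make sure I help with the right thing: is this about {options}?")
    return RoutingDecision(top_intent)


print(decide({"booking": 0.92, "benefits": 0.05}).intent)  # booking
print(decide({"booking": 0.55, "benefits": 0.40}).clarifying_question)
# Just to make sure I help with the right thing: is this about booking or benefits?
```

A near-tie is treated as ambiguous, and that rule matters as much as the absolute threshold. When two jobs look almost equally likely, asking is cheaper than sending a member to the wrong kind of help.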

Why one agent was the wrong abstraction

We could have kept pushing on a single, broader agent. We considered it. The reason we moved away from it is that generality looked elegant in theory and unreliable in practice.

A single agent creates pressure to do too many things with one prompt surface, one context window, and one set of behavioral instructions. That can work in a demo. It gets weaker when real member needs start colliding in the same conversation.

We found that the right abstraction was not one agent with more instructions. It was a system of narrower agents with clearer boundaries. That trade-off cost us orchestration complexity. It was worth paying. In mental healthcare, precision at the handoff matters more than simplicity in the architecture diagram.
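Narrower agents also make it possible to narrow context. The sketch below assumes that each agent declares the member fields its job needs and receives only those. The alternative is every agent sharing one large context window. The agent names and field names are hypothetical.

```python
from typing import Dict, FrozenSet, Mapping

# Hypothetical field names; each agent declares the only member data its job needs.
CONTEXT_NEEDS: Dict[str, FrozenSet[str]] = {
    "booking": frozenset({"current_provider", "upcoming_appointments"}),
    "provider_matching": frozenset({"preferences", "coverage"}),
    "in_the_moment_support": frozenset({"care_plan", "current_provider"}),
    "customer_support": frozenset({"coverage", "sessions_remaining"}),
}


def build_context(agent: str, member: Mapping[str, object]) -> Dict[str, object]:
    """Return only the slice of the member record that this agent's job requires."""
    needed = CONTEXT_NEEDS.get(agent, frozenset())  # unknown agents get nothing
    return {field: value for field, value in member.items() if field in needed}


# Made-up example record, used only to show the filtering.
member = {
    "current_provider": "Dr. A",
    "upcoming_appointments": [],
    "preferences": {"language": "Spanish"},
    "coverage": "example plan",
    "sessions_remaining": 4,
    "care_plan": "weekly check-ins",
}
print(build_context("customer_support", member))
# {'coverage': 'example plan', 'sessions_remaining': 4}
```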

Why scope boundaries are part of the product

The in-the-moment support agent is a good example of what this principle looks like in practice.

Its job is not to act like a therapist, make clinical decisions, or replace human care. Its job is to respond in a bounded, supportive way and help guide someone toward an appropriate next step, whether that is a coping resource, scheduling support, or connection to a clinician.

That boundary is part of the design.

The same is true across the system. Booking should feel operational. Matching should feel precise. Clarification should reduce ambiguity instead of hiding it. Safety-sensitive cases should not be treated like generic support conversations. Each role gets a narrower job because the member experience becomes more dependable when the system knows what it is for, and what it is not for.
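Scope boundaries can also be enforced in code, not only in prompts. This minimal sketch assumes that agents act through named tools, and it gives each agent a deny-by-default allowlist. Every agent name and action name is hypothetical.

```python
from typing import Dict, FrozenSet

# Hypothetical action names; each agent may only call the actions that belong to its job.
ALLOWED_ACTIONS: Dict[str, FrozenSet[str]] = {
    "booking": frozenset({"list_open_slots", "book_follow_up"}),
    "provider_matching": frozenset({"search_providers", "explain_fit"}),
    "in_the_moment_support": frozenset({"share_coping_resource", "hand_off_to_scheduling", "connect_to_clinician"}),
    "customer_support": frozenset({"look_up_benefits", "look_up_sessions_remaining"}),
}


class OutOfScopeError(Exception):
    """Raised when an agent tries to act outside its role."""


def authorize(agent: str, action: str) -> None:
    # Deny by default: an unknown agent or an unlisted action is out of scope.
    if action not in ALLOWED_ACTIONS.get(agent, frozenset()):
        raise OutOfScopeError(f"{agent!r} is not allowed to perform {action!r}")


authorize("booking", "book_follow_up")  # within role: no error
try:
    authorize("in_the_moment_support", "book_follow_up")
except OutOfScopeError as err:
    print(err)  # 'in_the_moment_support' is not allowed to perform 'book_follow_up'
```

In this sketch the in-the-moment agent can hand off to scheduling or to a clinician, but it cannot book an appointment itself. No agent has any action resembling a diagnosis.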

What makes this work is not just the model, but the system around it

A lot of what the market calls mental health AI starts with a general-purpose model and a chat interface. Our view has been different.

The model matters. But the model is only one part of the system. The architecture around it matters just as much:

Routing
Context
Guardrails
Oversight
The connection to actual care

That is why Guide is not a standalone tool. It operates inside the same platform that connects members, providers, care plans, and outcomes. It can help someone find care, prepare for it, stay engaged between sessions, and keep moving forward without losing the thread each time something changes.

That continuity is the point.

How we know it works

We treat evaluation as part of the system, not as a last-mile check.

That means running continuous conversation-quality evals, testing model changes against those evals before promotion, and using failures to improve routing, boundaries, and handoff behavior rather than just tuning for a better-looking answer.
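A regression gate is one simple way to test model changes against evals before promotion. The sketch below assumes that each eval case has already been graded pass or fail and tagged with a category. The `EvalResult` fields, the category names and the zero-tolerance default are assumptions made for illustration. The gate blocks a candidate that scores lower than the current model in any category, so a better average cannot hide a worse handoff score.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class EvalResult:
    case_id: str
    category: str  # hypothetical categories, e.g. "routing", "boundary", "handoff"
    passed: bool


def pass_rates(results: Iterable[EvalResult]) -> Dict[str, float]:
    """Fraction of cases passed in each eval category."""
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0])  # category -> [passed, total]
    for result in results:
        counts[result.category][0] += int(result.passed)
        counts[result.category][1] += 1
    return {category: passed / total for category, (passed, total) in counts.items()}


def can_promote(baseline: Iterable[EvalResult], candidate: Iterable[EvalResult],
                tolerance: float = 0.0) -> Tuple[bool, List[str]]:
    """Block a model change if it regresses any category the current model is scored on."""
    base, cand = pass_rates(baseline), pass_rates(candidate)
    # A category missing from the candidate run counts as a 0.0 pass rate.
    regressions = sorted(c for c, rate in base.items() if cand.get(c, 0.0) < rate - tolerance)
    return not regressions, regressions


baseline = [EvalResult("r1", "routing", True), EvalResult("h1", "handoff", False), EvalResult("h2", "handoff", True)]
candidate = [EvalResult("r1", "routing", False), EvalResult("h1", "handoff", True), EvalResult("h2", "handoff", True)]
print(can_promote(baseline, candidate))  # (False, ['routing'])
```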

Safety is the first test any AI we put in front of members has to pass. That is also why Spring Health co-developed VERA-MH, the first open-source AI safety benchmark for mental health. We ran Guide against VERA-MH before we trusted it in member-facing contexts. Our first score was 76. VERA-MH pointed us at a specific weakness: the moments between detecting risk and the human handoff.

After we addressed that weakness, the score improved to 82. We publish both numbers because the iteration is the point.

Spring Health was founded at Yale in 2016 as an AI company, and a decade of outcomes work shaped how we think about this layer.

What Guide does not do

Guide does not diagnose members. It does not replace clinician judgment. It does not turn every conversation into a clinical interaction, and it is not designed to be a substitute for human care.

It is designed to help members navigate care, stay engaged, and keep making progress as life changes around them.

That kind of usefulness only matters if it comes with boundaries. For us, safety is not a wrapper around the product. It is part of the product. That means input-side controls for recognizing when a conversation needs different handling, output-side safeguards that keep responses within role, and clear escalation paths when a bounded AI system should stop and a clinician should take over.
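The three layers in that paragraph can be sketched as a wrapper around any routed agent. In this hypothetical Python sketch, the input check, the role check and the escalation path are passed in as placeholder callables. The stubs in the usage example detect nothing; they only show the control flow. A real input-side control would be a validated clinical-safety component, and this sketch does not attempt to model one.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Outcome(Enum):
    ANSWERED = "answered"      # agent reply passed both checks
    REDIRECTED = "redirected"  # output guard replaced an out-of-role draft
    ESCALATED = "escalated"    # input guard stopped the AI and handed off to a clinician


@dataclass(frozen=True)
class Turn:
    outcome: Outcome
    text: str


FALLBACK = "That's outside what I can help with here, but I can point you to someone who can."


def guarded_turn(
    message: str,
    needs_escalation: Callable[[str], bool],  # input-side control (placeholder)
    agent: Callable[[str], str],              # the routed, bounded agent
    within_role: Callable[[str], bool],       # output-side safeguard (placeholder)
    escalate: Callable[[str], str],           # human handoff path (placeholder)
) -> Turn:
    # Input side: check before any agent drafts a reply.
    if needs_escalation(message):
        return Turn(Outcome.ESCALATED, escalate(message))
    draft = agent(message)
    # Output side: a draft that steps outside the agent's role never reaches the member.
    if not within_role(draft):
        return Turn(Outcome.REDIRECTED, FALLBACK)
    return Turn(Outcome.ANSWERED, draft)


# Stubs only, to show control flow; they do not detect anything.
turn = guarded_turn(
    "How many sessions do I have left this year?",
    needs_escalation=lambda m: False,
    agent=lambda m: "[customer-support reply]",
    within_role=lambda d: True,
    escalate=lambda m: "[clinician handoff]",
)
print(turn.outcome.value)  # answered
```

The ordering is the design choice worth noting. The input check runs before any agent drafts a reply, so a conversation that needs a clinician is never first handled as generic support.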

In mental healthcare, helpful is not a sufficient bar. The system also has to be bounded, reliable, and designed for trust.

What we learned along the way

The main lesson from building Guide is that the right architecture followed from the problem.

We started with a member journey that broke in different ways at different moments, and we built separate systems to solve those distinct problems.

That gave us something more valuable than a design principle. It gave us operational evidence. Before Guide became a unified surface, we had already processed more than 35,000 conversations through Spring-built experiences like Guided Intake and AI In-the-Moment.

On top of that, we processed more than 160,000 conversations through our customer-support deflection agent. Across more than 200,000 conversations, the pattern became clear: People did not want to decide which tool to use before they could get help. They wanted one place to go, and they wanted the system to understand the problem and route them to the right next step.

That insight shaped the architecture. It meant treating routing, scope, and handoff quality as core product decisions rather than implementation details. The question stopped being, “How do we make one model do more?” It became, “How do we make the system reliably direct each member to the right kind of help?”

What’s next

This work is still evolving. If you’re interested in building clinically grounded AI systems for mental healthcare, Spring Health is hiring across engineering, applied AI, and platform roles.

About the Author

Ondrej Kopecky and Farhan Nomani

Farhan Nomani is Director of Product and Ondrej Kopecky is Product Lead at Spring Health, where they focus on AI product strategy and design for mental healthcare.
