The Hidden Work Behind a 20% AI Triage Win

What we learned about AI reliability after the pilot started working.

A mid-size service provider came to us with a problem most client-facing teams will recognize. About 1,000 inbound messages every day. Each one needing a human to read it, judge tone, gauge urgency, and decide who responds and how fast.

That's exactly the kind of work AI is built for. It's also exactly the kind of work where betting everything on one AI vendor carries business risk, in cost, reliability, or both, that most leadership teams aren't willing to accept.

We built them an AI triage layer that reads every incoming message, classifies it by topic and urgency, and flags tone signals: frustrated, satisfied, escalating. Their team still owns every response. They just stopped wasting hours sorting the easy ones from the hard ones.

The number that matters: roughly 20% fewer staff hours spent on triage. On 1,000 messages a day, that's a material shift in where people's time goes. Responsiveness improved. Internal response-time targets that used to slip on busy days now hold.

But here's the part worth writing about. The 20% isn't really the story. The story is what we built underneath to make sure the 20% holds when something goes wrong, and that's the part most pilots get wrong.

Why We Didn't Just Pick One AI Vendor and Call It Done

The instinct, when you're standing up your first real AI workflow, is to pick a vendor, sign up, and build around them. That's how most pilots start. It's also how those same risks (cost, reliability, dependency) end up baked into the workflow.

Three things tend to go wrong:

You overpay. The most capable AI models are also the most expensive. If you route every message through a premium model, including the routine, predictable ones, you're paying top rates for work that a far cheaper model handles just as well. At volume, this adds up fast.

You inherit their bad days. AI providers go down, throttle usage, or change pricing without much warning. If your entire workflow depends on one vendor's availability, a two-hour outage at 2 PM on a Tuesday becomes your operations team's emergency.

You get stuck. The AI landscape is moving fast. New models come out regularly that are dramatically better, or cheaper, for specific tasks. If your workflow is built tightly around one vendor, switching means a significant rewrite. Most teams don't bother, so they stay on whatever they started with.

The solution is a layer that sits between your operations and the AI providers: one that routes each request to the right model at the right cost, and reroutes automatically if something goes wrong. For this client, we built that layer using an open-source tool called LiteLLM. (We've written before about why one AI tool can't do every job; this is the same idea applied to production workflows, not just developer tooling.)

Think of it less as a single AI and more as a general contractor for AI requests: it knows which specialist to call for which job, swaps in another sub if the first one isn't available, and keeps the cost in line across the whole project.

Paying the Right Amount for Each Message

Not every message is the same. Most are routine: a status question, a billing inquiry, a standard request. A smaller portion are genuinely complex, with emotion, high stakes, or context that needs careful interpretation.

We configured the system to treat them differently. Routine messages go to a fast, inexpensive model, which is more than good enough for "what category is this and how urgent is it?" The harder ones, where tone and context really matter, go to a more capable model. If the system isn't confident about a classification, it escalates automatically before a human ever sees it.
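To make the routing idea concrete, here is a minimal sketch in plain Python. It is an illustration only, not the client's implementation and not LiteLLM's API. The names (Classification, triage, fake_cheap, fake_capable) and the 0.8 confidence threshold are assumptions made up for this example, and it assumes that "escalate" means retrying the message on the more capable model.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Classification:
    topic: str
    urgency: str
    confidence: float  # 0.0 to 1.0, as reported by a (hypothetical) classifier


# A classifier is any callable that takes message text and returns a Classification.
Classifier = Callable[[str], Classification]

CONFIDENCE_THRESHOLD = 0.8  # assumed cut-off for this sketch, not a project figure


def triage(message: str, cheap: Classifier, capable: Classifier,
           threshold: float = CONFIDENCE_THRESHOLD) -> Tuple[Classification, str]:
    """Try the inexpensive model first; escalate to the capable one if it is unsure."""
    first = cheap(message)
    if first.confidence >= threshold:
        return first, "cheap"
    return capable(message), "capable"


# Stand-ins for real model calls, so the sketch runs without any provider account.
def fake_cheap(message: str) -> Classification:
    # Pretend short messages are easy and long ones are ambiguous.
    confident = len(message) < 60
    return Classification("billing", "low", 0.95 if confident else 0.4)


def fake_capable(message: str) -> Classification:
    return Classification("complaint", "high", 0.9)


result, tier = triage("When is my invoice due?", fake_cheap, fake_capable)
print(tier, result.topic)  # cheap billing
```

In a real deployment the two stand-in functions would be replaced by calls to actual models, and the threshold would be tuned against messages the team has already labeled.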

The result: the client stopped paying premium rates for routine work. Cost per message came down significantly, and the quality of classification on the hard cases actually went up, because those messages were finally getting the right level of attention.
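The arithmetic behind that saving can be sketched in a few lines. The per-message prices and the escalation rate below are placeholder values chosen only to show the shape of the calculation; they are not the client's figures. The sketch also assumes every message is first seen by the cheap model, so escalated messages pay for both.

```python
def daily_cost_single_tier(messages: int, premium_cost: float) -> float:
    """Cost when every message goes to the premium model."""
    return messages * premium_cost


def daily_cost_tiered(messages: int, cheap_cost: float, premium_cost: float,
                      escalation_rate: float) -> float:
    """Cost when all messages hit the cheap model and a fraction is escalated."""
    cheap_total = messages * cheap_cost
    escalated_total = messages * escalation_rate * premium_cost
    return cheap_total + escalated_total


# Placeholder inputs: 1,000 messages, made-up prices, 20% escalated.
single = daily_cost_single_tier(1000, premium_cost=0.01)
tiered = daily_cost_tiered(1000, cheap_cost=0.001, premium_cost=0.01,
                           escalation_rate=0.2)
print(round(single, 2))  # 10.0
print(round(tiered, 2))  # 3.0
```

The useful part is the structure, not the numbers: the saving grows as the gap between the two prices widens and shrinks as the escalation rate rises.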

What Happens When the AI You're Relying On Goes Down

We built the system with automatic backup providers. If the primary AI service is unavailable (outage, slowdown, capacity limit), the system switches to the next option without anyone on the client's team noticing.
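The failover behavior described above follows a simple pattern: try providers in order and return the first successful answer. The sketch below shows that pattern in plain Python. It does not use LiteLLM's API, and the provider functions (flaky_primary, steady_backup) are hypothetical stand-ins that simulate an outage.

```python
from typing import Callable, List, Sequence, Tuple

# A provider is a (name, callable) pair. The callable takes message text and
# returns a label. Both are hypothetical stand-ins for real API clients.
Provider = Tuple[str, Callable[[str], str]]


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain has failed."""


def classify_with_failover(message: str, providers: Sequence[Provider],
                           log: List[str]) -> Tuple[str, str]:
    """Try providers in order; return (provider_name, label) from the first success."""
    for name, call in providers:
        try:
            label = call(message)
        except Exception as exc:  # a real system would catch narrower error types
            log.append(f"{name} failed: {exc}")
            continue
        return name, label
    raise AllProvidersFailed(f"no provider succeeded ({len(providers)} tried)")


def flaky_primary(message: str) -> str:
    raise TimeoutError("simulated outage")


def steady_backup(message: str) -> str:
    return "billing"


log: List[str] = []
used, label = classify_with_failover(
    "Where is my invoice?",
    [("primary", flaky_primary), ("backup", steady_backup)],
    log,
)
print(used, label)  # backup billing
print(log)          # ['primary failed: simulated outage']
```

Writing each failure to a log rather than surfacing it to the user is what lets a team learn about an outage afterward instead of during it.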

In the first weeks after launch, there were a handful of provider issues that would have taken the workflow offline entirely if we hadn't designed for those risks up front. Instead, the backup kicked in and the team kept working. They found out afterward, from the logs, not from a gap in service.

This is the thing that matters most to a leadership team making a real commitment to AI in a client-facing process. A pilot that works most of the time is a demo. A system that keeps running when a vendor has a bad afternoon is production-ready.

The Risk That Doesn't Stop at the AI Vendor

LiteLLM, like any active open-source project, has had its share of security findings, including a supply chain vulnerability disclosed during the time we've been running this workflow. The vulnerable version never reached this client's production environment, because the same DevOps discipline we apply to every engagement caught it before it could: pinned dependencies, controlled builds, vulnerability scanning on every release, a documented patching cadence. Picking the right tool is only part of the job. Building, patching, and monitoring what you've built is the rest.
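As one small example of what "pinned dependencies" means in practice, the sketch below flags any line in a pip-style requirements file that is not locked to an exact version with ==. It is a simplified illustration, not the tooling used on this engagement: the function name is made up, the version numbers are placeholders, and it ignores cases a real checker would handle, such as hash pinning or URLs that contain a # character.

```python
from typing import List


def unpinned_requirements(requirements_text: str) -> List[str]:
    """Return requirement lines that are not pinned to an exact version."""
    unpinned = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments (simplified)
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned


# Placeholder versions, for illustration only.
sample = """\
litellm==1.2.3
requests>=2.0
# a comment line
pydantic
"""
print(unpinned_requirements(sample))  # ['requests>=2.0', 'pydantic']
```

A check like this can run in a build pipeline so that an unpinned dependency fails the build before it reaches production.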

Worried about supply chain risk in your AI or DevOps process?

We help mid-market organizations harden the software supply chain underneath their AI and custom applications: dependency management, vulnerability scanning, and incident response when a vendor publishes a CVE.

Talk to us about DevOps and supply chain security →

What the 20% Actually Means for the Team

On 1,000 messages a day, 20% fewer review hours is the difference between a team that's always catching up and one that's actually ahead. They're answering clients faster. Escalations now happen early in the workflow, before a human ever sees the message.

Nobody lost their job. The hours just went somewhere better: into the cases that actually need human judgment, which is where the team adds the most value anyway.

Three Things Every Leader Should Know Before Going Further With AI

Most organizations we work with already have at least one AI workflow running. Very few have asked what happens when it breaks. Here's what we'd tell any leadership team thinking about the next move.

1. Don't bet your workflow on a single vendor.
The proof of concept may have run on one provider. Production shouldn't. Build for flexibility up front. That's the difference between adapting as the market moves and being stuck with whatever you started with, and it's what makes automatic failover possible: when a provider has an outage, hits a rate limit, or changes its API, the workflow keeps running because the backup is already in place.

2. Pay for what each request actually needs.
Routine work doesn't need premium models. If every message is going through the most expensive option you have, you're leaving real money on the table. Use cheap models for most things, and route the hard cases up only when they warrant it.

3. Keep your humans on the cases that need humans.
The point of automating triage is to remove the work that doesn't require human judgment, so the work that does gets the team's full attention. Measure that, not just the cost savings.

Don't Wait Until the Pilot Becomes the Bottleneck

If you're using AI for anything that touches client experience (message triage, document review, knowledge search, internal support), here are a few questions worth sitting with:

Are we paying the same rate for every request, even the simple ones?
What actually happens to this workflow if our AI vendor goes down for two hours?
If a better or cheaper option came out tomorrow, how hard would it be to switch?

If those answers are uncomfortable, you usually don't need a different AI vendor. You need a layer around the one you have, so that your vendor becomes something you can swap.

That's the kind of work we help mid-market organizations think through as part of their vCIO and AI strategy engagements: not just standing up a pilot, but making sure the pilot can actually carry weight in production.

Make your AI pilot production-ready.

If you're running AI in a client-facing workflow and don't yet have the routing, failover, and supply-chain discipline underneath it, we can help.

Talk to us about AI in your operations →

Springthrough provides Managed IT Services, vCIO advisory, and custom AI solutions to mid-market organizations across the Midwest. Our team is based in Grand Rapids, Michigan.
