Ai

Degraded Mode: How I Build AI Agent Systems That Don't Fall Over

The disciplines that keep large, life-or-death systems reliable — protocols, redundancy, degraded-mode operation, and leading people — are the same ones I use to build production AI agent systems at Genesis Labs.

People who learn I build AI systems for a living sometimes expect the origin story to start with a model release or a clever prompt. It doesn’t. It starts with the question that has organized my whole working life: what happens when this fails?

I spent twenty-one years as a Machinist’s Mate keeping Navy ships running — propulsion, auxiliary equipment, the systems that, if they quit, mean you don’t complete the mission. Before that and after it, I worked the same problem in different clothes: commissioning building energy systems, then running maintenance and reliability programs across a network of manufacturing plants. Today I build production multi-agent AI systems and run Genesis Labs, a studio making faith-centered software for families and the church, where Shprd is a family operating system real households depend on.

Different domains, one craft. Because the disciplines that make a large, life-or-death system reliable turn out to be the exact disciplines that make an AI agent system reliable. That’s not a metaphor I reach for to sound interesting. It’s the engineering reality I work from every day, and it’s why I build the way I do.

A model is a component, not a system

The single most expensive mistake I see in AI right now is treating a capable model as if it were a finished system. A model is a brilliant, fast, occasionally unpredictable component. On a ship, a turbine is the same: enormously powerful and completely useless on its own. Steam from the boilers drives it, cooling water keeps it from destroying itself, electrical and control systems govern the whole dance. The value was never in any single part. It was in the orchestration.

So when I architect an agent system, I’m not really “prompting a model.” I’m building the rest of the ship around it — the context that flows in, the tools it can reach for, the guardrails that constrain it, the evals that tell me whether it’s actually doing the job. The model is the most visible piece and the least of my worries. The system around it is the work. This is why I stay deliberately vendor-agnostic across the stack: components get swapped, deprecated, and outclassed on a quarterly basis. The disciplines around them are what persist.

Reliability is a property of the whole, under stress

Anyone can demo an agent that works on a clean input on a good day. That demo tells you almost nothing, the same way a turbine spinning happily in port tells you nothing about how it behaves in a following sea. Reliability is not a feature you add at the end. It’s a property of how the whole system behaves when conditions degrade — and conditions always degrade.

A few principles I carry directly from hard systems into how I build agents:

Design for degraded mode. On a ship you assume things will break and you plan how to keep operating anyway. Agent systems need the same: when a tool call fails, when a model returns garbage, when an upstream service is down, the system should fall back gracefully, retry sensibly, and never silently corrupt state. The interesting engineering lives in the failure paths, not the happy path.

Build in redundancy and checks. Critical functions need backups and independent verification. I don’t trust a single agent’s output any more than I’d trust a single gauge on a critical system — I cross-check it. Some of the most important work I do is adversarial: a second agent whose entire job is to try to break the first one’s output before it reaches anything that matters.

Make protocols, not heroics. The most durable thing I ever built in uniform wasn’t a repair — it was a qualification-tracking system that lifted readiness across a command and then got adopted by a larger one, because it kept working whether I was in the room or not. Agent systems are the same. A clever one-off prompt is a hero with a wrench. A well-specified harness — explicit contracts, structured context, guardrails, evals — is a protocol that holds up across thousands of runs and people who aren’t me.

If it isn’t observed, it didn’t happen. You cannot operate what you cannot see. I learned that staring at gauges, and relearned it building dashboards over messy enterprise data where the entire problem was opacity at scale — issues stayed hidden until they were crises. Every serious agent system I build is instrumented so I know what it did, why, and whether it’s drifting. Observability isn’t a nice-to-have bolted on later; it’s load-bearing.

Specialists, coordinated — not one model doing everything

There’s a reason damage control doesn’t put one sailor on every casualty. You have teams with defined responsibilities, coordinated through a command structure, working in parallel toward one objective. Trying to make a single agent assess security while checking performance while maintaining architecture while updating docs is the software equivalent of running a whole engineering plant off one overwhelmed watchstander. It doesn’t scale, and it fails in ways you can’t diagnose.

So I build with specialization and coordination: focused agents with narrow, well-defined jobs and the minimum context and tools they need, orchestrated toward the goal. Coordinated parallelism, not chaos. The hard part is rarely any individual agent — it’s the orchestration layer and the seams between them, exactly like the hard part of running a plant was never one machine but the interfaces where systems hand off to each other.

Why this matters more, not less, with faith-centered software

At Genesis Labs the stakes make all of this concrete. We’re building tools families actually rely on, in a domain where trust isn’t a growth metric — it’s the entire product. A novelty AI toy can be flaky and charming. Software a household leans on cannot. That raises my reliability bar to roughly where it sat when failure meant a ship dead in the water, and it’s why I’m uninterested in impressive demos and deeply interested in systems that behave correctly on the worst day, not the best one.

Twenty-one years ago I wasn’t planning to build AI agents. But the through-line is obvious in hindsight: see the whole system, coordinate specialists, design for failure, lead the people and processes around the technology, and build for reliability under pressure. The domain changed. The craft didn’t. If you come from a world where things break and people get hurt when systems aren’t engineered honestly, you already know more about building good AI than most of the discourse will tell you. Don’t underestimate what that’s worth.