
May 26, 2026

Designing the Failure Script

A colleague at another company recently forwarded me an email their AI assistant had summarized before they read it.

The assistant interpreted the sender’s tone as passive-aggressive. It wasn’t wildly wrong, just wrong enough to change the trajectory of the conversation. A rushed request for updates suddenly read like frustration instead of urgency.

That interpretation shaped everything that followed.

The project lead escalated communication. HR was looped in later “just to be safe.” By the end of the week, a normal production bottleneck had acquired emotional weight nobody originally intended.

There was no obvious failure point. The system nudged the situation in the wrong direction early enough that every downstream action inherited the mistake.

I keep thinking about that story because it feels increasingly representative of how AI systems fail. Not catastrophically. Not visibly. Slightly off-course, in ways that compound. And I’m not sure the design industry has fully adjusted to the implications yet.

For years, most UX failures were easy to locate. A broken flow. A confusing navigation structure. A missing affordance. The user hit friction, recognized it, and either recovered or abandoned the task.

Autonomous systems create a different category of failure entirely.

The problem isn’t that the output is wrong. The problem is that the system acts too early on an assumption that hasn’t been verified. That shifts the burden onto the user. Instead of deciding what to do, users increasingly have to inspect what the system already decided on their behalf. The interaction becomes less directive and more forensic.

You can see this everywhere now: AI summaries that subtly shift tone, scheduling agents that rearrange priorities incorrectly, writing tools that flatten nuance, image systems that “fix” things nobody asked to fix. Individually, these moments feel minor. But once systems begin chaining actions together, small interpretive errors stop behaving like isolated mistakes. They become operational drift.

Drift is harder to detect than failure. A broken product announces itself. Drift often looks like the product still working.

Most current AI UX patterns still treat these situations like traditional software errors. They optimize for prevention, correction, and explanation. But the actual design challenge is more uncomfortable than that.

The real question is how much initiative a system should take before a human re-enters the loop.

That tension matters because every obvious solution introduces a tradeoff. More preview slows the experience until autonomy loses its value. More interruption turns the product into a permission-request machine. More reversibility weakens confidence because everything starts feeling provisional. More explanation creates cognitive noise that users eventually stop reading. This is exactly why designing autonomous systems feels qualitatively different from designing conventional software. The work is less about flows and more about calibration: balancing initiative against certainty, with no setting that’s universally right.

This is where a lot of current AI product design still feels immature.

Many systems are optimized around whether the model can act, not whether the product has enough structural resilience once it does. That distinction matters more as multi-agent systems become normal.

Once one agent treats an incorrect assumption as truth, every downstream action becomes harder to unwind. Summaries influence prioritization, which affects scheduling, which changes communication timing, which changes how people interpret intent. The mistake accumulates socially before it accumulates technically. Because each step appears locally reasonable, the overall drift becomes surprisingly difficult to diagnose. Users don’t experience a clear error. They experience a product that has become subtly harder to trust.

Most teams need better language for evaluating this kind of risk. We talk about hallucinations, latency, and model quality, but those concepts don’t capture what’s happening at the interaction layer.

The more useful framing is around premature action and propagation depth.

Premature Action Rate (PAR) measures how often a system initiates meaningful action before explicit user confirmation. Not every premature action is harmful. In some contexts, proactive behaviour is the feature. But high PAR combined with weak correction patterns tends to erode trust over time in ways that don’t surface cleanly in traditional metrics.
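A minimal sketch of how PAR might be computed from an action log. Everything here is an assumption for illustration: the `ActionEvent` record, the `meaningful` flag (someone on the team still has to decide what counts as meaningful), and the choice to divide by meaningful actions only rather than all actions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionEvent:
    """One action the system took. All fields are hypothetical log fields."""
    action_id: str
    meaningful: bool                  # did it change something the user cares about?
    confirmed_before_execution: bool  # did the user explicitly approve it first?


def premature_action_rate(events: list[ActionEvent]) -> float:
    """Share of meaningful actions that ran without prior confirmation."""
    meaningful = [e for e in events if e.meaningful]
    if not meaningful:
        return 0.0  # nothing meaningful happened, so nothing was premature
    premature = sum(1 for e in meaningful if not e.confirmed_before_execution)
    return premature / len(meaningful)


# Illustrative log, not real data.
events = [
    ActionEvent("summarize-email", meaningful=True, confirmed_before_execution=False),
    ActionEvent("reschedule-meeting", meaningful=True, confirmed_before_execution=False),
    ActionEvent("send-reply", meaningful=True, confirmed_before_execution=True),
    ActionEvent("flag-priority", meaningful=True, confirmed_before_execution=False),
    ActionEvent("refresh-cache", meaningful=False, confirmed_before_execution=False),
]

par = premature_action_rate(events)
print(f"PAR = {par:.2f}")  # 3 of the 4 meaningful actions ran unconfirmed
```

The number on its own says little. It becomes useful when tracked next to how often those unconfirmed actions later had to be corrected.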

Autonomous Error Propagation Depth (AEPD) measures how many downstream actions inherit an incorrect assumption before the user intervenes. This may become one of the defining operational metrics of autonomous products. A bad recommendation is manageable. A recommendation that silently reshapes five connected systems before anyone notices is something else entirely.
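One way to make AEPD concrete, sketched under loose assumptions: each action records which earlier actions it consumed, and each has an execution timestamp. The names `derived_from`, `executed_at`, and `intervened_at` are hypothetical. Despite the word "depth", this follows the definition above and counts inheriting actions rather than measuring the longest chain.

```python
from collections import deque


def propagation_depth(
    derived_from: dict[str, list[str]],
    executed_at: dict[str, float],
    origin: str,
    intervened_at: float,
) -> int:
    """Count downstream actions that inherited `origin` before the user stepped in."""
    # Invert "action -> inputs" into "action -> consumers" so we can walk forward.
    consumers: dict[str, list[str]] = {}
    for action, inputs in derived_from.items():
        for source in inputs:
            consumers.setdefault(source, []).append(action)

    inherited: set[str] = set()
    queue = deque([origin])
    while queue:
        current = queue.popleft()
        for child in consumers.get(current, []):
            if child in inherited:
                continue
            if executed_at[child] >= intervened_at:
                continue  # ran after the user intervened, so it is not counted
            inherited.add(child)
            queue.append(child)
    return len(inherited)


# Illustrative chain: a summary feeds prioritization, which feeds scheduling, and so on.
derived_from = {
    "prioritization": ["summary"],
    "scheduling": ["prioritization"],
    "message_timing": ["scheduling"],
    "follow_up": ["message_timing"],
}
executed_at = {
    "summary": 0, "prioritization": 1, "scheduling": 2,
    "message_timing": 3, "follow_up": 9,
}

aepd = propagation_depth(derived_from, executed_at, origin="summary", intervened_at=5)
print(aepd)  # prioritization, scheduling, and message_timing ran before t=5
```

This only works if the product records lineage between actions in the first place, which is arguably the real design requirement hiding inside the metric.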

Time to Reorientation is the metric most current explainability UX misses. Teams measure how quickly users can undo an action. The harder problem is how quickly users can understand what the system believed was happening. That is not reversibility; it is rapid reconstruction of the system’s mental model. Most explainability interfaces focus on justification: here’s why the AI did this. Users often need something more basic: here’s where the system’s understanding diverged from yours. Those are fundamentally different interfaces, and most products haven’t built the second one.
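A rough sketch of how Time to Reorientation could be measured, assuming the product can log two moments per incident: when the user first signals that something is off, and when they identify the assumption that diverged. Both fields (`noticed_at`, `located_at`) are hypothetical, and capturing the second one honestly is the hard part. The median is used here only because a few long investigations would distort a mean.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class DriftIncident:
    """Hypothetical record of one drift incident, timestamps in seconds."""
    noticed_at: float  # user first signals that something is off
    located_at: float  # user identifies which system assumption diverged


def time_to_reorientation(incidents: list[DriftIncident]) -> float:
    """Median seconds between noticing drift and locating the divergent assumption."""
    if not incidents:
        raise ValueError("need at least one incident")
    durations = [i.located_at - i.noticed_at for i in incidents]
    return median(durations)


# Illustrative incidents, not real data: durations of 40, 90, and 300 seconds.
incidents = [
    DriftIncident(noticed_at=0.0, located_at=40.0),
    DriftIncident(noticed_at=100.0, located_at=190.0),
    DriftIncident(noticed_at=500.0, located_at=800.0),
]

ttr = time_to_reorientation(incidents)
print(ttr)
```

Note what this deliberately ignores: how long the undo took. An incident can be reversed in one click and still score badly here if the user never worked out what the system thought it was doing.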

When PAR and AEPD rise faster than Time to Reorientation falls, you’re shipping autonomy faster than you’re governing it.
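That comparison can be turned into a crude release-over-release check. This is one possible reading, not a standard: it compares relative changes so that three metrics with different units can sit side by side, and it requires both PAR and AEPD to outpace the improvement in Time to Reorientation. The `ReleaseMetrics` record and the function names are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReleaseMetrics:
    """Hypothetical per-release snapshot of the three metrics."""
    par: float          # Premature Action Rate, 0..1
    aepd: float         # average Autonomous Error Propagation Depth
    ttr_seconds: float  # Time to Reorientation


def relative_change(before: float, after: float) -> float:
    """Fractional change from `before` to `after`; positive means it grew."""
    if before == 0:
        raise ValueError("baseline must be non-zero to compute a relative change")
    return (after - before) / before


def autonomy_outpaces_governance(prev: ReleaseMetrics, curr: ReleaseMetrics) -> bool:
    """True when PAR and AEPD both grew faster than Time to Reorientation shrank."""
    par_growth = relative_change(prev.par, curr.par)
    aepd_growth = relative_change(prev.aepd, curr.aepd)
    ttr_improvement = -relative_change(prev.ttr_seconds, curr.ttr_seconds)
    return par_growth > ttr_improvement and aepd_growth > ttr_improvement


# Illustrative numbers only.
previous = ReleaseMetrics(par=0.20, aepd=2.0, ttr_seconds=120.0)
riskier = ReleaseMetrics(par=0.30, aepd=3.0, ttr_seconds=108.0)   # autonomy up ~50%, TTR down 10%
steadier = ReleaseMetrics(par=0.21, aepd=2.1, ttr_seconds=60.0)   # autonomy up ~5%, TTR down 50%

print(autonomy_outpaces_governance(previous, riskier))
print(autonomy_outpaces_governance(previous, steadier))
```

A boolean like this is a conversation starter for a release review, not a gate. The thresholds that matter will differ by product and by how costly a propagated mistake is.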

I also think the industry is underestimating how much this changes the emotional texture of software.

Traditional interfaces waited for instruction. Autonomous systems introduce anticipation, interpretation, and initiative into the interaction itself. Products now carry a social posture, and users read it whether or not teams design for it intentionally.

Some products feel overeager, constantly finishing your sentences before you know what you’re trying to say. Some feel hesitant, hedging every action with so many confirmations that the autonomy becomes theatre. The overeager assistant that rewrites your email before you finish it isn’t just making a UX error; it’s establishing a dynamic that users generalize to every interaction that follows.

Once users start perceiving personality traits in a system’s behaviour, trust stops functioning like a usability problem. It starts functioning like relationship management. Most product teams have mature frameworks for the former and almost nothing for the latter.

The healthiest AI interaction patterns right now are the least theatrical. Small previews instead of giant confirmations. Lightweight interruption points instead of hard stops. Clear reversal paths instead of defensive explanation. Systems that acknowledge uncertainty without constantly announcing it.

The products that feel most trustworthy aren’t the ones making the fewest mistakes. They’re the ones where recovery feels calm, fast, and proportionate to what went wrong. That calibration, matching the weight of the recovery to the weight of the error, is becoming its own design skill. Most teams just haven’t named it yet.

The future of AI UX isn’t about eliminating error. Autonomous systems will always make interpretive mistakes because interpretation is probabilistic. The more important question is whether products can absorb those mistakes without letting them cascade socially, emotionally, or operationally.

That’s what designers are increasingly being asked to solve. Not just interaction design. Not just explainability. Containment: the ability to limit how far a wrong first step can travel before a human catches it and pulls the system back into alignment.

Because the danger isn’t simply that the system fails.

It’s that it keeps going.
