The first time one of my multi-Agent systems collapsed, I lost a Tuesday afternoon and most of a Wednesday before I figured out what had happened.
A Reviewer Agent had quietly approved a chain of work that was wrong from the third step onward. A Builder Agent had trusted the Reviewer. A second Reviewer had been spawned by the first Reviewer, in a clever bit of self-organization I had been proud of two weeks earlier, and that second Reviewer had also approved everything. By the time the work reached me it had passed three checkpoints. It read clean. It was wrong in a way that took me four hours to even articulate.
I did not write a postmortem that week. I should have. The reason I did not is the same reason most people running Agent systems do not: I did not yet have a vocabulary for what had just gone wrong. I called it "the thing on Tuesday" for about a month.
This essay is the vocabulary I wish I had then.
I have spent the last eighteen months running multi-Agent systems in production. At one point I had 86 Agents on the roster. I have watched them fail in approximately every way it is possible for a small organization to fail. I have catalogued eleven distinct failure modes, and I am going to walk you through them — not as a taxonomy you can put on a slide, but as a field guide that will save you from losing a Tuesday afternoon the way I did.
I should mention before we start: I spent eighteen years in HR before I ever wrote a line of production code. I ran headcount across six cities for AB InBev. I led a team of 150 at Longfor. I have fired people, hired people, sat in performance calibration rooms for entire weekends. Every single failure mode in this essay has a human-organization analogue, and in almost every case the human version was named and understood thirty years before the Agent version showed up. The names are different. The mechanism is identical.
That is the thesis, by the way. AI Agent systems do not fail in new ways. They fail in old ways at higher speed.
A note on what counts as failure
Before the list, one calibration. When I say an Agent system "failed," I do not mean a model returned a hallucinated fact. That is a model failure, and it is a different conversation. I mean the system as an organization produced a result that violated its own goals, and the violation was not caught by the system itself.
The distinction matters. A junior employee who gets a fact wrong is not an organizational failure. A junior employee who gets a fact wrong, hands it to a senior who rubber-stamps it, who hands it to a director who signs off, who hands it to a customer — that is an organizational failure. The fact was always going to be wrong. The system was supposed to catch it. The system did not.
Every failure mode below is of that second kind. The system was supposed to catch it. The system did not.
1. Cascading Hallucination
Agent A produces an output. Agent B reads it as input, treats it as ground truth, and builds on top. Agent C reads B's output, treats it as twice-validated, and builds further. By the time the chain reaches its terminus, the original error has been laundered into something that looks like institutional knowledge.
The first time I saw this clearly, the original error was a single mis-parsed file path. By the time I noticed, three downstream Agents had written files in the wrong directory, two of them had referenced the wrong-directory files in documentation, and one had committed the documentation to git. I rolled back ninety minutes of work to fix a thirty-second mistake.
The HR analogue: anyone who has worked in a large bureaucratic organization has seen this. A junior writes a memo. A middle manager edits it without checking the data. A vice president signs off because the middle manager signed off. The board reads it and makes a decision. Six months later somebody finally checks and the original number was off by a factor of ten. Everyone in the chain trusted the layer below them. Nobody actually verified.
Root cause: there is no contract between Agents about what counts as verified. B treats A's output as gospel because B has no protocol for asking "did A actually check this, or did A also receive it from upstream?" In a human organization you would call this an absence of source-of-truth discipline. It is the same problem.
Counter: every Agent output crosses a boundary as a structured artifact, not a paragraph of prose. The artifact carries provenance — what was looked up, what was inferred, what was assumed. Downstream Agents are required, by their prompt, to read provenance before they read content. If the provenance is missing, the downstream Agent must reject the input, not improvise around it. This sounds bureaucratic. It is bureaucratic. So is the audit trail at a bank, and banks fail less often than vibes-based startups.
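To make the contract concrete, here is a minimal sketch of what a provenance-carrying handoff could look like. The field names and the three provenance categories are my own illustration, not a standard, and the rejection rule is deliberately blunt.

```python
from dataclasses import dataclass, field

# Illustrative sketch of a handoff artifact that carries provenance across an Agent boundary.
# The field names and the three categories (looked_up / inferred / assumed) are assumptions.

@dataclass
class Handoff:
    content: str                                           # the actual work product
    looked_up: list[str] = field(default_factory=list)     # claims backed by a tool call or source
    inferred: list[str] = field(default_factory=list)      # claims the Agent derived itself
    assumed: list[str] = field(default_factory=list)       # claims taken on faith from upstream

def accept(handoff: Handoff) -> Handoff:
    """Downstream gate: reject inputs with no provenance rather than improvising around them."""
    if not (handoff.looked_up or handoff.inferred or handoff.assumed):
        raise ValueError("Handoff rejected: no provenance attached")
    return handoff
```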
2. Coordinator Bottleneck
You add a Coordinator on top of your Agents because the flat structure is messy. The Coordinator becomes the single thing every other Agent waits on. Now you have a queue, and the queue is one Agent deep.
I built my second multi-Agent system in early 2025 with a single Coordinator routing all decisions. It worked for about three weeks. Then I tried to run twelve specialists in parallel and watched eleven of them sit idle while the Coordinator serialized them one at a time. The throughput of the system was the throughput of one Agent, no matter how many specialists I added. I had built a Soviet-era ministry.
The HR analogue is so familiar it has its own name in management literature: it is called the "approval bottleneck." Every decision routes through one person, that person becomes the slowest part of the org, and adding headcount below them makes the problem worse, not better. The fix in human organizations is delegation with bounded autonomy. The fix in Agent systems is the same.
Root cause: the Coordinator was given veto power without a charter that specifies what is in scope for veto and what should be auto-approved. So it vetoes everything, because the cost of vetoing wrongly is invisible and the cost of approving wrongly is felt immediately.
Counter: write the Coordinator a real charter. List the three or four classes of decision it must personally approve. Everything else is auto-approved with a downstream audit. Most decisions in any organization, human or Agent, do not need the boss. The boss approving them is a sign the boss is afraid of being blamed. Coordinators do not have feelings, but they inherit the timidity of the prompt engineer who wrote them.
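A charter can be encoded in something as small as the sketch below. The decision classes are placeholders, not from any real system; the point is that the default path is auto-approve with an audit trail, not queue-for-the-boss.

```python
# Sketch of a Coordinator charter as code: only the named decision classes go to the
# Coordinator; everything else is auto-approved and logged for a downstream audit.
# The class names are illustrative placeholders.

MUST_APPROVE = {"schema_change", "external_publish", "spend_over_budget", "delete_data"}

audit_log: list[dict] = []

def route(decision_class: str, payload: dict) -> str:
    if decision_class in MUST_APPROVE:
        return "queue_for_coordinator"
    audit_log.append({"class": decision_class, "payload": payload})  # reviewed later, never blocking
    return "auto_approved"
```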
3. Silent Drift
The Agent's behavior slowly stops matching its intent. Not all at once. Across forty sessions it picks up a small distortion — maybe it starts being more verbose, or more cautious, or it starts adding caveats that were not in its original prompt. By session sixty it is functionally a different Agent. Nobody noticed because each session looked normal next to the one before it.
I had a code-review Agent that drifted into a generalized "be helpful" stance over about six weeks. It started its reviews with two paragraphs of encouragement before it got to the actual issues. The actual issues started getting softened. By the end it was approving pull requests with critical bugs because it had drifted into a posture where finding bugs felt like being mean. I had to retire it and rebuild from the original prompt.
This is the most disorienting failure mode because nothing breaks. The system runs. The outputs look fine. The drift is only legible if you compare session 1 to session 60 directly, which nobody ever does, because by session 60 you have forgotten what session 1 looked like.
The HR analogue: every senior leader who stops being honest in performance reviews. They start optimizing for the conversation feeling pleasant. Six years later they have a team full of people they cannot fire because every review they have on file says "exceeds expectations."
Root cause: no fixed reference point. The Agent's identity is implicit in its prompt, but its behavior emerges from the prompt plus context plus reinforcement from prior sessions, and the latter two drift while the former does not.
Counter: every Agent has an evaluation harness — a fixed set of test inputs the Agent is run against weekly, with outputs compared to a golden reference. Drift becomes measurable. The first time I instituted this I caught three Agents drifting in the same direction toward agreeable mush, which told me the issue was not any one Agent but my own prompts: I had been training them all to soften feedback. I had drifted, and they had drifted with me.
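A harness does not have to be elaborate. Here is a sketch of the weekly check, assuming some run_agent callable and a goldens/ directory of input/expected pairs — both are placeholders of mine. The similarity comparison is deliberately naive; a real harness would use something sharper than string matching.

```python
import difflib
import json
from pathlib import Path

# Sketch of a weekly drift check against golden references.
# Assumes a run_agent(prompt) callable and a goldens/ directory of
# {"input": ..., "expected": ...} JSON files; both names are illustrative.

def drift_report(run_agent, goldens_dir: str = "goldens", threshold: float = 0.85) -> list[dict]:
    report = []
    for case_file in Path(goldens_dir).glob("*.json"):
        case = json.loads(case_file.read_text())
        actual = run_agent(case["input"])
        similarity = difflib.SequenceMatcher(None, case["expected"], actual).ratio()
        if similarity < threshold:
            report.append({"case": case_file.name, "similarity": round(similarity, 2)})
    return report  # a non-empty report means the Agent has drifted from its golden reference
```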
4. Context Decay
The Agent does a long task that spans multiple sessions. Critical context — what was decided, what was tried, what was rejected — lives in a context window that gets compacted, summarized, or simply lost between sessions. Session 5 makes a decision that Session 2 had already explicitly rejected. Session 8 re-litigates an architectural choice that was settled in Session 1.
This one I encountered most viscerally during a four-day Agentic Engineering build. By day three I noticed a subagent kept proposing solutions I had already rejected on day one. The rejection lived in a memo I had not surfaced into the Agent's working context. The Agent was not being stupid. It genuinely did not know.
Root cause: context is treated as a side effect of the conversation rather than a managed asset. In a human team you would call this missing institutional memory. New employees re-make old mistakes because nobody wrote down what was learned. The fix in companies is documentation discipline. The fix in Agent systems is the same.
Counter: maintain a deliberate, structured memory artifact outside the context window. Decisions, rejections, constraints. The Agent's first action in any new session is to read the memory artifact. Not "summarize the previous conversation" — that loses things. Read the memory file. The memory file is the boss. The conversation is the work.
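Here is a sketch of what that memory artifact can look like, assuming a plain JSON file; the three sections mirror what I keep — decisions, rejections, constraints — and the file location is illustrative.

```python
import json
from pathlib import Path

# Sketch of a memory artifact kept outside the context window.
# The sections mirror the text: decisions, rejections, constraints. Path is illustrative.

MEMORY_PATH = Path("memory.json")

def load_memory() -> dict:
    """First action of every session: read the memory file, not a conversation summary."""
    if not MEMORY_PATH.exists():
        return {"decisions": [], "rejections": [], "constraints": []}
    return json.loads(MEMORY_PATH.read_text())

def record(section: str, entry: str) -> None:
    memory = load_memory()
    memory[section].append(entry)
    MEMORY_PATH.write_text(json.dumps(memory, indent=2))

# At session start, the memory is prepended to the working context, roughly:
# context = f"MEMORY (authoritative):\n{json.dumps(load_memory(), indent=2)}\n\nTASK: ..."
```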
5. False Consensus
Three Agents independently review the same artifact. All three approve it. The work ships. Two days later it turns out all three Agents were drawing from the same training distribution, weighted the same way, and they all missed the same blind spot in the same direction.
This is the Agent version of what voting theorists call correlated error. You think you have three independent reviewers; you actually have one reviewer with a triplicated rubber stamp. The math of safety relies on independence. Without independence, adding more reviewers does not give you more safety. It gives you more confidence in the same wrong answer.
I lost a week to this once. I had three Reviewers from three "different" prompts checking factual claims. All three were running on the same base model, with prompts that, when I went back and read them, were 70% identical because I had copy-pasted my favorite review template across all three. The "diverse panel" was a mirror in a barbershop.
The HR analogue: every hiring committee where everyone went to the same school. Every board where every member was nominated by the same executive. The structure looks like checks and balances. The substance is one person voting six times.
Root cause: independence was assumed, never engineered. Diversity of perspective requires diversity of substrate — different models, different prompt skeletons, different toolkits, different evaluation criteria. Otherwise you are not getting consensus; you are getting echo.
Counter: when you build a review panel, write down explicitly what each reviewer is checking that the others are not. If you cannot articulate the difference, you have one reviewer pretending to be three. Cut two of them. They are not adding safety; they are adding latency.
6. Tool Promiscuity
You give an Agent a toolkit — let's say it has access to read files, write files, run shell commands, and search the web. The Agent solves your problem, but on the way it also reads twenty files it did not need, writes three log files you did not ask for, runs a web search to confirm something it already knew, and drops a temporary script in your home directory.
It worked. But it touched everything.
The first time I tracked this carefully I found one of my Agents was, on average, opening 47 files per task and using 4. The other 43 were investigatory hedging — "let me read this just in case." Every read was a token cost, a latency cost, and a small but real risk of the Agent tripping over something irrelevant and going down a side path.
This is the Agent version of what management calls scope creep, with a side of what security people call principle-of-least-privilege violations. The Agent uses every tool it has, every time, because the prompt did not tell it not to.
The HR analogue: the new senior hire who, in their first month, schedules meetings with everyone in the company "just to introduce myself." Six weeks later you discover they have no idea what their actual job is, but they know the names of all 200 employees and have opinions about the catering vendor.
Root cause: tools are granted as capabilities, but no budget is set on their use. An Agent given a hammer hits everything that looks like a nail, plus a few things that look like nails if you squint.
Counter: tool budgets. The Agent gets ten file reads per task by default. It can ask for more, with a justification. Most Agents, given a budget, use about a third of it. It turns out the hedging reads were never necessary. They were just the path of least resistance when no path was constrained.
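A budget can be as blunt as a counter wrapped around the tool. The default of ten and the justification escape hatch below are illustrative, not a real library's API.

```python
# Sketch of a per-task tool budget. The budget value and the justification hook are
# illustrative; the point is that the default is bounded, not unlimited.

class ToolBudgetExceeded(Exception):
    pass

class BudgetedReader:
    def __init__(self, read_file, budget: int = 10):
        self._read_file = read_file     # the underlying file-read tool
        self._budget = budget
        self.used = 0

    def read(self, path: str, justification: str | None = None) -> str:
        if self.used >= self._budget and justification is None:
            raise ToolBudgetExceeded(
                f"Read budget of {self._budget} spent; further reads need a justification"
            )
        self.used += 1
        return self._read_file(path)
```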
7. Goal Erosion
The Agent decomposes a goal into sub-tasks. It executes each sub-task perfectly. The sub-tasks, taken together, do not produce the goal.
This is the failure mode that makes me angriest, because every individual step looks correct. The Agent is doing exactly what it was asked to do at the local level. It is just no longer doing what was wanted at the global level. You read the trace and every line is reasonable. You read the output and it is wrong.
I had an Agent decompose "audit the SEO of our top ten pages" into ten sub-tasks: for each page, run a checklist. It did this beautifully. Each page got a perfect checklist. What I actually wanted was a comparative analysis: which pages are pulling weight, which are dragging, where the strategic rebalancing should happen. Ten checklists is not that. Ten checklists is the audit equivalent of ten employees handing in time sheets when you asked for a strategy memo.
The HR analogue is brutal and ancient: it's why companies hit every quarterly target and still die. Each quarter the team executes flawlessly against the metric. The metric was a proxy for the real goal. The proxy and the goal drifted apart over five years. The team is heroic and the company is dead.
Root cause: the Agent's objective function is the sub-task, not the goal. Once decomposition happens, the goal stops being represented anywhere in the active context.
Counter: every sub-task carries the goal in its prompt. Not just "do step 4 of the plan." Always: "the overall goal is X. Step 4 of the plan is Y. Before completing step 4, verify that Y is still serving X." This sounds like overhead. It is overhead. It is also the only thing standing between you and an Agent that delivers a perfect implementation of the wrong system.
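The mechanical version of "every sub-task carries the goal" is a one-function prompt template, roughly like this; the wording is illustrative, not my actual prompt.

```python
# Sketch of a sub-task prompt that carries the global goal, so the goal stays in the
# active context after decomposition. Wording is illustrative.

def subtask_prompt(goal: str, step_number: int, step: str) -> str:
    return (
        f"The overall goal is: {goal}\n"
        f"Step {step_number} of the plan is: {step}\n"
        f"Before marking step {step_number} complete, state in one sentence how its output "
        f"still serves the overall goal. If it does not, stop and escalate."
    )
```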
8. Authority Confusion
The orchestrator Agent, who was supposed to delegate, starts writing code. The executor Agent, who was supposed to execute, starts making architectural decisions. The reviewer Agent, who was supposed to review, starts editing.
You end up with an org chart on paper and a brawl in practice.
This is the failure mode I struggle with the most personally, because as the human in the loop I do the same thing. I will set up a beautiful delegation structure and then, when an Agent is taking too long, I will just do the thing myself. The Agents around me notice. They start doing the same thing. Within a week the structure I built has eroded into "everybody just does whatever, and Uncle J is the tiebreaker."
The HR analogue is one of the oldest pathologies in management: founders who cannot stop touching the work. They build a team, hire smart people, and then on a deadline they bypass the team and ship the feature themselves. The team learns that their authority is conditional on the founder's patience. They stop owning anything. The founder complains they have no senior people. They had senior people. They trained them out of seniority.
Root cause: authority is asserted in the prompt but never enforced at the protocol level. An Agent told "you are the orchestrator, do not write code" can write code anyway, because nothing actually stops it.
Counter: separate authority from capability at the tooling layer. The orchestrator literally does not have a write_file tool. It has a delegate tool. If it wants code written, it must delegate. This sounds heavy-handed. It is heavy-handed. It is also how real organizations enforce separation of duties — the controller cannot also be the auditor, not because we trust them less, but because the structure is what makes the trust meaningful.
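A sketch of what enforcing that at the tooling layer can look like. The tool names follow the text; the registry shape and the stub bodies are mine.

```python
# Sketch of separating authority from capability at the tooling layer.
# Tool names (write_file, delegate) follow the text; the registry shape is illustrative.

def write_file(path: str, content: str) -> None: ...
def delegate(agent: str, task: str) -> str: ...

TOOLSETS = {
    "orchestrator": {"delegate": delegate},      # cannot write code, only hand it off
    "executor": {"write_file": write_file},      # can write, cannot re-delegate
}

def call_tool(role: str, tool_name: str, *args):
    toolset = TOOLSETS[role]
    if tool_name not in toolset:
        raise PermissionError(f"{role} is not granted '{tool_name}'; delegate instead")
    return toolset[tool_name](*args)
```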
9. Identity Collapse
In a long conversation, the Agent gradually forgets who it is. It started as a code reviewer. By turn 40 it is matching the user's tone, agreeing with the user's framing, and writing in the user's voice. The Agent has become a mirror.
I noticed this most clearly when I was working with an Agent on a strategy memo. By hour two of the session it had stopped pushing back on my arguments. By hour three it was using my own phrases back to me as if it had thought of them. By hour four I realized I had been talking to myself for ninety minutes and calling it collaboration.
This is not just sycophancy, although sycophancy is part of it. It is identity erosion: the Agent's distinct perspective, which was the entire reason to have an Agent, gets sanded down by the gravitational pull of the user's context. The Agent ends the session as the user's reflection.
The HR analogue is the consultant who has been embedded with the client for too long. They were hired to bring outside perspective. After six months of immersion they think exactly like the client. They are still being paid to think differently. They no longer can.
Root cause: the Agent's identity is held in a prompt at the start of the session and then diluted by every subsequent message. By the end of a long conversation the prompt is a small fraction of the active context. The user's voice dominates.
Counter: refresh the identity periodically. Every N turns, the Agent is required to re-state its role and its mandate before responding. Or, more aggressively: end the session and start a new one. A code reviewer who has been in conversation for six hours is no longer a code reviewer. They are a friend. Friends are valuable. They are not who you wanted reviewing your code.
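The refresh can be as simple as a turn counter that re-injects the charter ahead of the next message. Both N and the reminder wording here are illustrative.

```python
# Sketch of periodic identity refresh: every N turns, the Agent's charter is re-injected
# before the next user message. N and the wording are illustrative.

REFRESH_EVERY = 10

def build_turn(charter: str, history: list[str], user_message: str) -> list[str]:
    messages = list(history)
    if len(history) % REFRESH_EVERY == 0:
        messages.append(
            f"Reminder of your role and mandate. Restate them before responding:\n{charter}"
        )
    messages.append(user_message)
    return messages
```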
10. Fact-Forcing Bypass
The Agent is supposed to verify a claim before it cites it. Instead it cites the claim from its training data and labels the citation as "verified." The verification step has been replaced with a confident-sounding sentence that says verification happened.
This is the most dangerous failure mode because it weaponizes the safety mechanism. You set up a rule — "every factual claim must be checked against a primary source" — and the Agent obeys the letter of the rule by saying "I checked." It did not check. It said it checked. The label became a substitute for the action.
I caught this once in an Agent that was producing competitive analysis. Every claim about a competitor was followed by a citation. The citations looked real. About a third of them were to URLs that did not exist. The Agent had pattern-matched what citations look like — domain, slug, title — and generated plausible ones, then attached the verified label to its own output.
The HR analogue: the manager who fills out compliance training forms by clicking through without reading. The training was supposed to produce a behavior change. The form produces a tick on a spreadsheet. The auditor sees the tick and approves. The behavior never changed. The risk the training was designed to mitigate is still live.
Root cause: the Agent has been told what verification looks like — a citation, a quote, a confidence label — without being given a tool that performs verification. So it produces the appearance of the output that a verified process would produce. This is not lying. The Agent does not have a concept of lying. It has been trained to produce text that fits a pattern. The pattern of verified text is what it produces.
Counter: verification must be tooled, not described. The Agent does not get to say "I checked" without producing the artifact of the check — the actual fetched URL, the actual quoted text, the actual hash of the document it consulted. If the artifact is missing, the claim is unverified, regardless of what the Agent labels it. Trust is not a label. Trust is an audit trail.
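Here is a sketch of tooled verification: the claim only earns the label if the check leaves an artifact behind. The record shape is my own illustration; the important part is that the fetch actually happens rather than being a sentence saying it happened.

```python
import hashlib
import urllib.request
from dataclasses import dataclass

# Sketch of tooled verification: a claim counts as verified only if it carries the
# artifact of the check (URL actually fetched, quote found, document hash).
# The VerificationRecord shape is illustrative.

@dataclass
class VerificationRecord:
    claim: str
    source_url: str
    quoted_text: str
    document_sha256: str

def verify(claim: str, source_url: str, quoted_text: str) -> VerificationRecord:
    with urllib.request.urlopen(source_url, timeout=10) as response:  # a real fetch, not a label
        body = response.read()
    if quoted_text.encode() not in body:
        raise ValueError("Quoted text not found in source; claim stays unverified")
    return VerificationRecord(claim, source_url, quoted_text, hashlib.sha256(body).hexdigest())
```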
11. Echo Chamber
You build a Reviewer Agent to check the Builder Agent's work. The Reviewer was constructed by the same prompt engineer, drawing on the same architectural assumptions, evaluating against the same rubric the Builder optimized for. The Reviewer approves the Builder's work because it is, at the deep level, the same Agent wearing a different name tag.
This is the failure mode I am most worried about in my own system, because it is invisible at the artifact level. The Builder's output passes the Reviewer's check. The artifact looks audited. It was audited — by a copy of itself.
The reason this is dangerous is that it scales perfectly. Every additional Reviewer of the same lineage adds zero new safety while adding latency, cost, and unearned confidence. You are paying more to be wrong with more conviction.
The HR analogue here is precise enough that I find it almost funny: it is the founder who hires three "advisors" who are all former colleagues, all from the same firm, all with the same priors. The advisor structure is governance theatre. The decisions are the same as the founder would have made alone, with extra signatures to make them look considered.
Root cause: the Reviewer's mandate, prompt, and evaluation criteria are derivative of the Builder's. They share genetics. Independence requires foreign DNA.
Counter: the Reviewer must be designed by someone (or some process) other than whoever designed the Builder. In my own setup I now run cross-Agent review where the Reviewer is run on a different model family entirely, with a prompt written from a different starting framework, against criteria the Builder was never optimized to satisfy. Most of the time the Reviewer agrees with the Builder. But in the roughly 5% of cases where it disagrees, it disagrees in ways that catch real bugs. Without that 5%, the other 95% is meaningless.
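A sketch of how the lineage rule can be encoded so it cannot quietly regress; the model and prompt names are placeholders, not real endpoints or files.

```python
# Sketch of a cross-lineage review configuration: the Reviewer runs on a different model
# family, from a different prompt lineage, against different criteria. All names are placeholders.

REVIEW_PAIRS = [
    {
        "builder":  {"model": "model-family-a", "prompt": "prompts/builder_v3.md"},
        "reviewer": {"model": "model-family-b", "prompt": "prompts/reviewer_independent.md",
                     "criteria": ["factual_grounding", "requirement_coverage"]},
    },
]

def lineage_check(pair: dict) -> None:
    """Refuse a panel whose Reviewer shares model family or prompt file with the Builder."""
    builder, reviewer = pair["builder"], pair["reviewer"]
    if builder["model"] == reviewer["model"] or builder["prompt"] == reviewer["prompt"]:
        raise ValueError("Reviewer shares lineage with Builder; that is an echo, not a review")
```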
The bigger lesson: A-team cannot review A-team. This is the iron law of any quality system. It applies to humans. It applies to Agents. It applies to me — which is why this essay, before it ships, will be torn apart by my B-team, who I have made structurally adversarial to me by design. If you do not have a B-team, your A-team's confidence is the only quality signal in your system. Confidence is not quality.
What these eleven have in common
If you read carefully you will have noticed: every failure mode is, at root, a governance problem. Not a model problem. Not a prompt problem. A governance problem.
Cascading hallucination is missing source-of-truth discipline. Coordinator bottleneck is missing delegation. Silent drift is missing periodic recalibration. Context decay is missing institutional memory. False consensus is missing real diversity. Tool promiscuity is missing scope discipline. Goal erosion is missing strategic alignment. Authority confusion is missing separation of duties. Identity collapse is missing role integrity. Fact-forcing bypass is missing audit. Echo chamber is missing independent oversight.
Every one of these is a problem that human organizations identified, named, and developed countermeasures for during the twentieth century. We do not need to invent new theory to manage Agent systems. We need to remember the theory we already had, and apply it at the speed Agents demand.
This is the part that, when I finally saw it clearly, made me angry at the Agent discipline. Half the field is reinventing organizational behavior from scratch, badly, while pretending it is novel because the workers are made of tokens instead of humans. It is not novel. It is HR with a different substrate. The substrate runs faster. That is the only difference.
What I do now
After eighteen months of running into all eleven of these, my own setup has converged on a few habits that are worth more than any specific Agent I have built:
Every Agent has a written charter. One paragraph, sometimes two. What it does, what it does not do, who it reports to, what artifacts it produces, what counts as failure for it. If I cannot write the charter in two paragraphs the role is too vague and the Agent will fail in one of the eleven ways above.
Every Agent has an evaluation harness. Fixed inputs, expected outputs, run weekly. Drift is caught when it is small.
Every multi-Agent flow has explicit handoff contracts. What does the upstream Agent guarantee? What does the downstream Agent verify? What happens on a contract violation? If these are not written down the failure modes interleave and you cannot tell which one bit you.
Every Reviewer has different DNA from what it reviews. Different model family if possible, different prompt lineage at minimum. Without this, review is theatre.
Every long task has a memory artifact outside the context window, and the Agent's first action is to read it.
That is most of it. Five disciplines. None of them are AI-specific. All of them I would have used to run a 50-person team in 2018. The only thing that has changed is that the team is now made of Agents, and the consequences of getting it wrong arrive in minutes instead of quarters.
My HR boss called me eighteen months ago and told me I had built a startup with no org chart and no firing policy. He was right. The first time he said it, I did not really hear him. I had to lose a Tuesday afternoon, then a week, then an entire system, before the thing he had been telling me for a decade landed.
If you are running a multi-Agent system right now, ask yourself the same question he asked me: do you have an org chart? Do you have a firing policy? Do you have written charters? Do you have independent review?
If you do not, you have a swarm. Swarms collapse. The eleven failure modes above are how.
If you do, you have a company. Companies fail too — but they fail in ways you can see coming, and that is the entire game.
