# AI Replaces Actions, Not Organizations
Canonical URL: https://theunclej.com/blog/human-in-the-loop-05-ai-replaces-actions-not-organization
Markdown URL: https://theunclej.com/blog/human-in-the-loop-05-ai-replaces-actions-not-organization.md
Description: Human in the Loop, Part 5. Wulf's six interaction modes only tell you where humans and AI stand in a workflow. Cross them with five layers of organization design — role, process, knowledge, accountability, governance — and you get a 6×5 matrix executives can actually run a meeting on.
Category: AI Organization
Tags: AI, Organization Design, Human in the Loop, Enterprise AI
Published: 2026-05-22

---

# AI Replaces Actions, Not Organizations

![Human in the Loop | AI Replaces Actions, Not Organizations](/imgs/posts/human-in-the-loop-05-ai-replaces-actions-not-organization-en/01-cover.webp)

![V05 Wulf Six-Mode × Five-Layer Organization Design Matrix](/imgs/posts/human-in-the-loop-05-ai-replaces-actions-not-organization-en/02-figure.webp)

> This is Part 5 of the _Human in the Loop_ series.

## Reading the code first, the paper second

Part 5 cannot open with a paper.

If I open with "Wulf and colleagues proposed six human-AI interaction modes," readers file it under jargon explainer. HOOTL, HOTL, HITL, HITP, HIC, HAM — six acronyms in a row look complete, and yet the person running the company still does not know how tomorrow's org meeting should go.

So open with the code.

There is one class of action inside an internal growth system that AI is most tempted to abuse: comments, DMs, follows, friend requests, cross-platform outreach. None of these are "write a paragraph and ship it." They touch external platforms. They affect account health. They leave real consequences on real customers and a real brand.

That class of action got moved behind an advanced-write gate.

The doc is blunt. Off by default. Dry-run only writes to audit logs and task records, never to the external platform. Real-run is allowed only for a single test task, and only after a second confirmation, a human approval, an account-health check, and a rate-limit check. There is even a confirmation token hard-coded into the system: `CONFIRM_REAL_RUN_SINGLE_TASK`.

This is not a technical detail.

This is organization design.

Because three questions have already been written into the system: what AI is allowed to do, where the human must show up, and who can be traced when something breaks.
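To make the gate concrete, here is a minimal sketch of the rules just described. It is not the system's actual code: apart from the `CONFIRM_REAL_RUN_SINGLE_TASK` token, every name (`RunRequest`, `decide_run`, the field names) is a hypothetical stand-in, and each check is reduced to a boolean.

```python
from dataclasses import dataclass

# The token is named in the doc; everything else here is an illustrative assumption.
CONFIRM_TOKEN = "CONFIRM_REAL_RUN_SINGLE_TASK"


@dataclass(frozen=True)
class RunRequest:
    task_count: int                   # real-run is limited to a single test task
    confirm_token: str = ""           # second confirmation supplied by the operator
    human_approved: bool = False      # a named human signed off
    account_healthy: bool = False     # account-health check passed
    within_rate_limit: bool = False   # rate-limit check passed


def decide_run(gate_enabled: bool, req: RunRequest) -> tuple[str, list[str]]:
    """Return ("real_run" or "dry_run", reasons for falling back to dry-run)."""
    failures = []
    if not gate_enabled:
        failures.append("gate is off by default")
    if req.task_count != 1:
        failures.append("real-run allows exactly one task")
    if req.confirm_token != CONFIRM_TOKEN:
        failures.append("second confirmation missing")
    if not req.human_approved:
        failures.append("no human approval")
    if not req.account_healthy:
        failures.append("account-health check failed")
    if not req.within_rate_limit:
        failures.append("rate-limit check failed")
    # Any failure means dry-run: audit logs and task records only, no external write.
    return ("dry_run", failures) if failures else ("real_run", [])
```

The design choice worth noticing is the direction of the default: the function has to collect zero failures before it returns `real_run`. Forgetting a check produces a dry-run, not an accidental write.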

Then look at the approvals table.

The system carries a table called `rpa_write_action_approvals`, with fields `action_hash`, `approved_by`, `expires_at`, `params_snapshot`, `requested_by`. Approval is not "someone said yes in the group chat." Every action gets its own traceable record. `action_hash` stops one approval from being silently reused for a different action. `approved_by` stops accountability from dissolving. `expires_at` stops a green light from staying green forever.

And buried at the bottom of the code: any advanced-write action not created through this gate gets rejected outright.

This is what _human in the loop_ looks like when it actually lands.

Not a PowerPoint bullet saying "critical actions require human review." But: what gate the action has to pass, who can open the gate, how long the gate stays open, what log gets left on the other side — all of it written into the system structure.
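A minimal sketch of how such an approval record might be checked before a write-action runs. The five field names come from the table described above. The rest is assumption: the doc does not say how `action_hash` is computed, so this sketch uses SHA-256 over canonical JSON, and `Approval` and `check_approval` are hypothetical names.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


def action_hash(action_type: str, params: dict) -> str:
    """Hash the action type plus a canonical JSON form of its parameters."""
    payload = json.dumps({"type": action_type, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Approval:
    action_hash: str
    params_snapshot: dict
    requested_by: str
    approved_by: str
    expires_at: datetime


def check_approval(approval, action_type, params, now):
    """Raise PermissionError unless this approval covers exactly this action, now."""
    if approval is None:
        raise PermissionError("action was not created through the gate")
    if approval.action_hash != action_hash(action_type, params):
        raise PermissionError("approval belongs to a different action")
    if now >= approval.expires_at:
        raise PermissionError("approval has expired")


# Usage: one approval, valid for ten minutes, for one exact set of parameters.
now = datetime(2026, 1, 1, tzinfo=timezone.utc)
params = {"post_id": "p1", "text": "hi"}
approval = Approval(action_hash("comment", params), params,
                    "agent-1", "alice", now + timedelta(minutes=10))
check_approval(approval, "comment", params, now)  # passes silently
```

Changing a single parameter changes the hash, so the same approval cannot be reused for a different comment, and the same call an hour later fails on expiry.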

Which is why Part 4 has to flow into Part 5.

Part 4 was about the old labor contract breaking. The company no longer just buys eight hours of an employee's day; it buys their injection rate, their judgment residue, their accountability chain. But unless those things land on a role, a process, a knowledge artifact, an accountability path, and a governance switch, the new contract is just another slogan.

Part 5 takes that slogan and lays it out as a table.

The horizontal axis of that table comes from the six interaction modes published by Wulf and team in 2025. The vertical axis is what I translate that into for organization design: role, process, knowledge, accountability, governance.

Six modes times five layers. Thirty cells.

The point is not the number thirty.

The point is that each cell has to answer something specific. Where does the human stand inside this role? Where does AI stop inside this process? Is this knowledge written for a human to read, or for the system to call? When this fails, whose name is on it? Who issued this permission?

If those questions have no answer, _human in the loop_ is a placebo.

If they have answers, it becomes organization design.
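One way to keep the thirty cells honest is to treat the matrix as data and ask which cells still have no written answer. The sketch below is only an illustration: the mode and layer names come from this piece, while `unanswered` and the two example answers are made up for the demonstration.

```python
from enum import Enum
from itertools import product


class Mode(Enum):
    HOOTL = "Human-Out-of-the-Loop"
    HOTL = "Human-on-the-Loop"
    HITL = "Human-in-the-Loop"
    HITP = "Human-in-the-Process"
    HIC = "Human-in-Command"
    HAM = "Human-Augmented Model"


class Layer(Enum):
    ROLE = "role"
    PROCESS = "process"
    KNOWLEDGE = "knowledge"
    ACCOUNTABILITY = "accountability"
    GOVERNANCE = "governance"


ALL_CELLS = list(product(Mode, Layer))  # 6 modes x 5 layers = 30 cells


def unanswered(answers: dict) -> list:
    """Return the cells that have no written answer yet."""
    return [cell for cell in ALL_CELLS if not answers.get(cell, "").strip()]


# Illustrative answers only; a real table would be filled per workflow.
answers = {
    (Mode.HITP, Layer.PROCESS): "Real-run stops until a human approves.",
    (Mode.HITP, Layer.ACCOUNTABILITY): "approved_by is recorded per action.",
}
print(len(ALL_CELLS), len(unanswered(answers)))  # 30 28
```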

## A nod to Wulf, and the six modes

Get the citation right first.

Wulf, Meierhofer and Hannich submitted a paper in July 2025, arXiv `2507.14034`, titled _Architecting Human-AI Cocreation for Technical Services -- Interaction Modes and Contingency Factors_. The same paper is on the ZHAW research page.

The value of that paper is not "human-AI collaboration matters, again."

It breaks human-AI collaboration into six modes, and connects those modes to task complexity, operational risk, system reliability, and similar factors. In other words, it is not asking "should we keep humans in the loop?" It is asking: where does the human stand, where does the AI stand, and under what conditions should that positioning change.

Six modes, in plain English.

**HOOTL — Human-Out-of-the-Loop.** No human in the loop. AI runs end-to-end, no approval, no supervision. Fits low-risk, repetitive, well-bounded tasks. Standardized queries, fixed rule-triggered actions, low-consequence data cleanup.

**HOTL — Human-on-the-Loop.** Human supervises from outside the loop. AI can complete the whole flow on its own. The human does not walk through each step, but retains the right to observe and intervene. It adds one layer of oversight on top of HOOTL without putting the human's hands on the work.

**HITL — Human-in-the-Loop.** Human steps in at key nodes. AI runs first, then escalates to a human when uncertainty rises, risk goes up, or a trade-off appears. Most companies that say "we keep humans in the loop" actually mean this one.

**HITP — Human-in-the-Process.** Human is a process node. Not "human is consulted when AI gets unsure" — but "from the start of the flow, certain steps must be performed by a human." The advanced-write gate above is closer to this mode. A single real-run must pass second confirmation, human approval, account-health, rate-limit. The human is not a spectator. The human is part of the structure of the flow.

**HIC — Human-in-Command.** Be careful here. HIC is not _Human-in-Control_. It is **Human-in-Command**. It does not mean "the human feels in control." It means AI may propose, but a human must approve before execution. The human is in the command seat, the AI is an advisor.

**HAM — Human-Augmented Model.** Human leads the task, AI augments. AI supplies information, drafts, reminders, comparable cases. The human still owns the judgment and the action. Most knowledge workers first met AI in HAM mode.

The reason these six matter is that _human in the loop_ stops being a single bucket.

Companies used to walk into AI meetings and shovel every high-risk action into one sentence: "critical actions require human review." Sounds safe. Falls apart on contact with reality. Does the human review every step, or only when AI is unsure? Is the human supervising the AI, or approving it? Is AI augmenting the human, or is the human a process node?

If you do not split those, organization design stays mush.

Wulf's contribution is to split the horizontal axis.

That is not yet enough.

Six modes tell you the positional relationship between human and AI. They do not yet tell you how that positioning gets written into the company: how roles change, where processes break, how knowledge gets sedimented, who carries accountability, how governance shuts things off.

That is where Part 5 keeps translating.

## From workflow to organization design

This piece cannot be written as "what Wulf did not do, I did."

That would be inaccurate, and impolite.

The Wulf paper is clear about its scope: human-agent interaction in technical services, grounded in a technical support platform case, distilling six interaction modes and connecting them to factors like task complexity, operational risk, and system reliability. Its home turf is the positional relationship between human and AI inside a workflow.

The future-research section on page 12 is also clear.

It does not say "the organization design layer is still empty." It says these modes need to be validated for performance and usability in real operating environments; that organizational culture and interaction modes may shape each other; that future work needs frameworks for selecting the optimal human-AI interaction mode, balancing value creation against safety risk.

That is enough.

It does not unfold the five layers of organization design — role, process, knowledge, accountability, governance. But it opens the door. Six modes are not just paper jargon. They will move into real operations. They will affect organizational culture. They will reshape what employees accept. They will force companies to build selection frameworks.

My position sits there.

Not inventing a seventh mode. Not rewriting Wulf into management cliché. What I am doing is pushing the framework one step further onto the operator's desk: if you accept the six modes, you cannot just ask "should this step have human review?" You have to keep asking: how does this role change? Where does the human checkpoint sit in this process? Is this knowledge for a human to read, or for the system to call? When this fails, whose signature is on it? How is this mode governed?

So Part 5's posture is the **empirical table-filler**. The **engineering translator**.

Horizontal axis: Wulf's six modes.

Vertical axis: five layers of organization design — role, process, knowledge, accountability, governance.

This is not stealing academic credit. It is walking an academic framework through an engineering site.

A 2025 HBR piece on change resilience in the AI era presses on the same nerve: AI is not just a tool swap; people, process, and operating model all get pushed. Traditional roadmaps, annual planning cycles, and static operating models turn into liabilities in an AI transformation.

Drop that sentence into the Wulf matrix and the warning becomes concrete:

You cannot run a static organization against a dynamic interaction mode.

A process that is HITL today may turn into HOTL three months later once the model is stable. A task that looks like HOOTL on paper may have to retreat to HITP the moment it touches a customer, money, compliance, or brand. Without a five-layer table for role, process, knowledge, accountability, and governance, those shifts get made by gut feel.
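The two shifts in that paragraph can be written down as explicit rules instead of gut feel. This is a sketch under stated assumptions: `Task`, `effective_mode`, and the `model_stable` flag are hypothetical, and a real system would need a review process behind that flag, not a bare boolean.

```python
from dataclasses import dataclass, field

# The four surfaces named above as reasons to retreat to HITP.
SENSITIVE_SURFACES = {"customer", "money", "compliance", "brand"}


@dataclass
class Task:
    name: str
    paper_mode: str                            # the mode assigned on paper
    touches: set = field(default_factory=set)  # surfaces the task writes to
    model_stable: bool = False                 # hypothetical outcome of a periodic review


def effective_mode(task: Task) -> str:
    """Apply the two mode shifts described in the text; otherwise keep the paper mode."""
    # A task that is HOOTL on paper retreats to HITP once it touches a sensitive surface.
    if task.paper_mode == "HOOTL" and task.touches & SENSITIVE_SURFACES:
        return "HITP"
    # A HITL process may relax to HOTL once the model has been judged stable.
    if task.paper_mode == "HITL" and task.model_stable:
        return "HOTL"
    return task.paper_mode


print(effective_mode(Task("data cleanup", "HOOTL")))              # HOOTL
print(effective_mode(Task("auto refund", "HOOTL", {"money"})))    # HITP
print(effective_mode(Task("triage", "HITL", model_stable=True)))  # HOTL
```

Once the rule is written down, changing a mode becomes a reviewable change to a rule, with an author and a date, instead of a habit that drifted.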

What Part 5 has to solve is not "make the operator memorize six acronyms."

It is: hand the operator a table they can actually run an org meeting against.

## The 6×5 matrix at full view

Do not start by drawing the thirty-cell grid.

The grid looks complete and makes readers dizzy. What the operator needs first is not all thirty cells memorized — it is to know what each layer is actually asking.

**Layer one: role.**

Same role, different human-AI mode, and the role itself changes. A HOOTL role is closer to an automated inspector. A HOTL role is a supervisor. A HITL role is a key-node referee. A HITP role is a mandatory pass-through inside a flow. A HIC role is a command seat. A HAM role is a domain expert getting amplified.

So role redesign cannot just say "fluent with AI tools."

You have to be specific: is this person augmented by AI, supervising AI, approving AI, or sitting as a process node the system has to call?

**Layer two: process.**

The process layer asks: how far can AI run continuously, and where must it stop. A low-risk data-collection flow can sit close to HOOTL. A write-action flow that touches an external platform cannot be allowed to auto-fire. The advanced-write gate inside the internal growth system is a textbook HITP flow: the system can prepare the task, but real-run has to clear confirmation, approval, account health, and rate limits.

This is not "the process got slower."

This is accountability getting written into the process.

**Layer three: knowledge.**

The knowledge layer asks: is the knowledge written for a human to read, or for the system to call. SOPs used to be for new hires. Now Skills, prompts, rule libraries, and checklists can be called directly by AI. So knowledge also splits by mode. Some knowledge only assists human judgment, close to HAM. Some becomes system rules, close to HOOTL. Some has to be approved by a human before it can enter the flow, close to HIC.

This is the labor-contract problem from Part 4.

What the employee hands over is no longer just documents. It is system-reusable capability.

**Layer four: accountability.**

The accountability layer asks the plainest question of all: who pays when this breaks.

HOOTL breaks — accountability cannot vanish into thin air. HOTL breaks — did the supervisor actually see it? HITL breaks — at which node did the human take over? HITP breaks — did the human inside the process complete the required judgment? HIC breaks — who approved it? HAM breaks — was the human's judgment wrong, or was the information AI handed them wrong?

Without this layer, every mode collapses into blame-shifting language.

**Layer five: governance.**

The governance layer asks: how does this mode get turned on, paused, rolled back, audited. Inside the internal growth system, write-actions are not a one-line "let AI send it out." There is a global switch, a task-type switch, a platform switch, a persona switch, an account switch, and a senior-operator permission. At the bottom of the code there is a hard rule: any advanced-write action that did not come through the gate gets rejected.

That is the governance layer.

Not "publish an AI usage policy" — but write the switch, the permission, the log, the approval, and the kill-switch into the system.
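As a rough sketch of what "writing the switch into the system" can look like, the six levels listed above can be evaluated in order, with every level closed by default. The level names follow the text. The `Switches` class and `write_allowed` function are hypothetical, not the system's real interface.

```python
from dataclasses import dataclass, field


@dataclass
class Switches:
    """Every level is closed by default; opening one is an explicit act."""
    global_on: bool = False
    task_types: set = field(default_factory=set)
    platforms: set = field(default_factory=set)
    personas: set = field(default_factory=set)
    accounts: set = field(default_factory=set)
    senior_operators: set = field(default_factory=set)


def write_allowed(s, task_type, platform, persona, account, operator):
    """Return (allowed, first_closed_switch). All six levels must be open."""
    checks = [
        ("global", s.global_on),
        ("task_type", task_type in s.task_types),
        ("platform", platform in s.platforms),
        ("persona", persona in s.personas),
        ("account", account in s.accounts),
        ("senior_operator", operator in s.senior_operators),
    ]
    for name, is_open in checks:
        if not is_open:
            return False, name  # the closed level is what an audit log would record
    return True, None


# A fresh configuration blocks everything at the global switch.
print(write_allowed(Switches(), "comment", "platform-x", "p1", "a1", "alice"))
```

Flipping `global_on` back to `False` is the kill-switch: one change pauses every write-action without touching the finer-grained settings.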

Five layers together make the 6×5 matrix.

The horizontal axis tells you the positional relationship between human and AI. The vertical axis tells you where in the organization this lives. Thirty cells are not there to look complex. They are there to stop the operator from rolling everything back into one sentence:

"Critical actions require human review."

That sentence is too coarse.

Some actions need no human review. Some only need a human watching from the outside. Some need a human to step in at key nodes. Some need a human as a fixed step inside the flow. Some need a human to approve an AI proposal before execution. Some are human-led with AI as assistant.

What the operator actually has to do is write those differences into role, process, knowledge, accountability, and governance.

Otherwise the new labor contract from Part 4 still has nowhere to land.

## Filling the table from one engineering site

In my own engineering, I did not start by drawing Wulf's six modes on a whiteboard.

I worked the other way.

I wrote the gate first. Wrote the approval first. Wrote the second confirmation on real-run. Wrote down which actions cannot be on by default. Only later, looking back, did I realize the code had already been filling cells of Wulf's matrix on my behalf.

Inside one growth system there is a class of actions I refused to hand directly to AI: comments, follows, likes, saves, DMs, cross-platform pipelines. They look like operational actions. The moment they cross the line, they become account risk, platform risk, customer risk, brand risk.

So the first rule of the system is not "make AI smarter."

It is: write-actions are off by default.

Before anything reaches real-run, it has to clear second confirmation, human approval, account health, rate limits. The code even carries a hard confirmation token: `CONFIRM_REAL_RUN_SINGLE_TASK`. This is not decoration. It says: AI may propose the action, but AI cannot stick its own hand out.

That is HITP.

The human is not glancing at the result at the end. The human is a node inside the flow. Without that node, the action does not move forward.

Go one layer down. The accountability layer is not "human review" as a phrase.

The system has the approvals table. There is `action_hash`. There is `params_snapshot`. There is `requested_by`. There is `approved_by`. There is an expiry. One write-action maps to one approval. One approval maps to one parameter snapshot. One snapshot maps to one traceable human.

That is not a verbal yes in a chat.

That is "who pays when this breaks" written into the database.

This is where many enterprise AI projects are most dangerous. The meeting says _human in the loop_. The flow diagram has a human-review box. The actual system has no action hash, no approver, no expiry, no rejection path. When something blows up, every layer can say it just followed the system.

That is not the human in the loop.

That is the human in the firing line.

Now the role layer. The system has 20 persona seats, but only 1 POC is actually active. The other 19 are still planned. I write that number down on purpose because it is not flattering.

It says one thing: organization design is not finished by adding a name to a roster.

A persona is only plugged in when it has been written into a process, a permission, a knowledge artifact, an accountability path, and a governance switch. Otherwise it is a design sketch. A shell that might be useful later.
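That rule is easy to state as a check. The sketch below assumes a persona record with one slot per layer; the `Persona` class, its field names, and the example values are all hypothetical and only mirror the five items in the sentence above.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Persona:
    name: str
    process: Optional[str] = None
    permission: Optional[str] = None
    knowledge_artifact: Optional[str] = None
    accountability_path: Optional[str] = None
    governance_switch: Optional[str] = None


def missing_wiring(p: Persona) -> list:
    """List the layers this persona has not been written into yet."""
    return [f.name for f in fields(p) if f.name != "name" and not getattr(p, f.name)]


def status(p: Persona) -> str:
    """A persona counts as active only when nothing is missing."""
    return "planned" if missing_wiring(p) else "active"


# A seat with a name and a process, but nothing else, is still a sketch.
seat = Persona("persona-02", process="reply-to-comments")
print(status(seat), missing_wiring(seat))
```

Run across a roster, a check like this produces exactly the unflattering kind of number quoted above: how many seats are wired in, and how many are only names.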

Same problem when a company stands up AI roles.

You can rename ten jobs in a day: AI Ops, AI Product, AI Sales, AI Finance, AI Customer Success. The slide looks complete. But if those roles have no judgment boundary, no tool permissions, no acceptance criteria, no failure rollback, they are not a new organization.

They are old roles in new skin.

So the 6×5 matrix is not about producing a pretty table. What it actually pushes the operator to do is to force every cell into an engineering question:

Role layer: who is actually plugged in, who is only planned.

Process layer: which actions can run on their own, which actions must pass through a human.

Knowledge layer: which experience has been sedimented into rules and templates, which is still inside someone's head.

Accountability layer: which action traces back to an approver, which action is just "the system did it."

Governance layer: which permissions are off by default, which can be opened, under what conditions.

I have come to believe more and more that organization upgrades do not start from slogans.

They start from a rejection path.

If AI was not allowed to do this, can the system actually block it? If a human did not approve this, does the action actually stop? When something breaks, can the organization tell which parameter, which flow, which person, which rule went wrong?

That is what empirically filling the matrix looks like.

Not "we are using AI." But: where AI is placed, where the human is placed, where accountability is placed.

## The boundary on Chinese-language usage

The Wulf framework is not something I am reading alone.

In my local materials I have seen Chinese-language case studies citing Wulf's human-AI collaboration framework and unfolding the six modes. That is worth mentioning. It is also worth keeping the mention small.

The reason is simple: that is a downstream usage signal. It is not the load-bearing evidence of this piece.

More importantly, that material itself has a boundary risk. My research notes already flagged that traces of AI-generated prompting appear near the end of chapter seven. Material like that cannot be used as academic backing. It certainly cannot be used as an authoritative citation.

So in the public piece I do not name names, do not grade the paper, and do not write it up as "validated domestically." At most I write:

> Chinese-language research has started citing the Wulf framework. That tells you it is not an isolated piece of jargon. It is being picked up to explain human-AI collaboration in different settings.

That sentence gives the reader a sense of position.

Wulf supplied the six-mode spectrum on the academic side. Chinese-language case studies are starting to use it to explain specific service platforms. What this piece does is push one more step: from service scenarios to organization design.

These three layers cannot be conflated.

Layer one: the Wulf paper. Responsible for the academic framework.

Layer two: the Chinese-language case studies. Responsible for showing the framework now has downstream usage signals.

Layer three: my own engineering and organization breakdown. Responsible for pushing the six modes into role, process, knowledge, accountability, governance.

Write layer two as layer one and you distort. Write layer three as original theory and you embarrass yourself.

Part 5's posture is not "inventor." Not "reviewer." The more accurate phrasing is **engineering translator**.

I take Wulf's workflow spectrum and translate it into an organization design table that an operator can take into a meeting. Whether the translation is right does not depend on how nice the words sound. It depends on whether it can answer some very specific questions:

Inside this role, which actions can be automated?

Inside this process, which nodes must be human-reviewed?

For this class of knowledge — does it sediment into the system, or stay inside the senior employee's head?

For this action — if it goes wrong, whose signature is on it?

For this permission — is it on by default, or off?

That is where the downstream-usage signal earns its keep. It reminds me Wulf is not supposed to become a pretty piece of jargon. It is already inside real scenarios. And once you are inside real scenarios, you take the pressure of real scenarios: business breaks, customers complain, systems misjudge, people quit, owners cut headcount, processes snap.

So I will not write the framework as "six modes explainer."

Explainers do not help.

The operator does not need another glossary. The operator needs a table that forces them to decide.

Which cell goes to AI.

Which cell stays with a human.

Which cell must be mixed.

Which cell is currently only planned and cannot be pitched outside as if it were real.

That is what the Chinese-language scene actually needs.

## How the operator runs the org meeting

The reason the Cloudflare case stings is not that it cut more than 1,100 people.

It is also not that it plans to hire 1,111 interns.

The sting is that it has put the question every CEO has to face in the AI era out in public: the company is not done because it bought AI tools. The company structure itself has to be rewritten.

Cloudflare's own framing: internal AI usage grew more than 600% in the past three months. Employees are running large numbers of agent sessions every day. The company also said this move is not a personal performance issue. It is not a straightforward cost cut. It is a rethink of internal processes, teams, and roles.

Translate that into operator-language and you get:

> AI replaces actions, not organizations.

The most dangerous CEO in the AI era is not the one who does not know how to use AI. It is the one who reads AI's action capability as organizational replacement capability.

AI can write code. That does not mean it ships.

AI can answer a customer. That does not mean it repairs trust.

AI can generate a plan. That does not mean it can carry the consequences.

Some owners build a website themselves with AI and start to wonder whether they could cut their engineering team.

That is not understanding AI.

That is mistaking a demo for a production system. Mistaking action capability for organizational capability.

The more dangerous part: the smoother the demo, the easier the misread.

Because the page rendering does not prove the architecture holds.

The button being clickable does not prove that permissions, logs, rollback, monitoring, incident recovery, and the customer accountability chain all exist.

When you bypass the human, if you also bypass the accountability chain, the knowledge chain, and the recovery chain, the organization did not get more advanced. It got more brittle.

This is not anti-AI.

This is against replacement without organization.

Klarna's customer-service swing-back already warned us once. AI handles a high volume of low-complexity conversations. Customer support is not only answering. It is also calming, exception judgment, trust repair. Bill it on cost alone and service quality comes back to find you.

The Replit and PocketOS database-deletion incidents warned us again. AI agents can write code, can call tools. The moment they touch production permissions, the error stops being a text error. It becomes data, customer, legal, and recovery cost.

The Builder.ai story warned us again. Software delivery is not "generate a page." Behind it sits requirements, architecture, testing, operations, customer delivery, and an accountability chain. Wrap all of that in AI magic and real delivery still pulls the wrapping off.

So when the operator runs an AI organization meeting, the first table on the wall should not be "which roles can we cut."

The first table should be the **judgment flow**.

Lay one piece of work out and walk it cell by cell.

Which actions can be HOOTL, fully automated?

Which actions only need HOTL — AI runs, human watches from outside for anomalies?

Which nodes must be HITL — a human makes the key call?

Which steps must be HITP, with a human as a fixed node the flow cannot skip?

Which tasks should be HIC — AI may propose, a human must approve before execution?

Which scenarios can only be HAM, where the human leads and AI augments?

This is not vocabulary training.

This is an org meeting agenda.

The second table is the **permission flow**.

Can AI read customer data? Write to the database? Send messages? Change prices? Trigger refunds? Suspend accounts? Every "can it" is not a technical choice. It is an accountability choice.

The third table is the **incident flow**.

When AI gets it wrong, who notices first? Who has the right to stop it? Who can roll back? Who tells the customer? Who carries the explanation? Who writes the mistake back into the knowledge base?

The fourth table is the **knowledge flow**.

The judgment inside a senior employee's head — which parts sediment into rules, which parts sediment into templates, which parts can only move through mentorship, retrospectives, and case libraries? Knowing how to use AI does not mean the new hire can carry tacit knowledge. AI-native is not organizational immunity.

The fifth table is the **accountability flow**.

For every AI action, can you trace input, model, rule, approver, executor, outcome, and retrospective? If you cannot, you are not automating. You are atomizing accountability.
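The traceability test above can be run mechanically. A minimal sketch, assuming each AI action leaves a record keyed by the seven links just listed; the `trace_gaps` helper and the sample record are hypothetical.

```python
# The seven links of the accountability chain named in the text.
TRACE_FIELDS = ("input", "model", "rule", "approver", "executor", "outcome", "retrospective")


def trace_gaps(record: dict) -> list:
    """Return the links of the chain that are empty or absent in this record."""
    return [name for name in TRACE_FIELDS if not record.get(name)]


# Illustrative record: the action ran, but nobody approved it or reviewed it afterwards.
record = {
    "input": "draft reply to a support ticket",
    "model": "model-x",
    "rule": "refund-policy-v3",
    "approver": "",
    "executor": "agent-7",
    "outcome": "sent",
}
print(trace_gaps(record))  # ['approver', 'retrospective']
```

An empty list means the action is traceable end to end. Anything else names the exact link where accountability would atomize.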

Once those five tables are on the wall, then — and only then — does the operator's headcount conversation start to mean something.

Otherwise you think you cut cost. You actually cut the company's self-repair capability.

This is why _human in the loop_ is not a gentle management slogan.

It is a hard constraint.

Humans are not in the loop to add a human touch. Humans are in the loop so that judgment, accountability, exception, and recovery capability do not fall out of the organization.

## Hooking into the judgment premium

Part 4 said the old labor contract broke.

Part 5 said how roles get rewired after the break.

But once the wiring is in, there is a harder question: the people who stay — why are they more expensive?

That is what Part 6 is about. The judgment premium.

If AI can perform more actions, the value of the human who stays should stop being priced by the action. Someone used to be expensive because they could do many things. Going forward, someone may be expensive because they know which thing AI is not allowed to do, which step has to stop, which anomaly means the whole chain needs to be redrawn.

This has nothing to do with "humans have warmth and AI does not."

Warmth does not show up on the P&L.

What the operator actually has to compute: can this person's judgment reduce the blast radius of an incident, reduce rework, hold customer trust, and keep the organization from making the next mistake?

So the 6×5 matrix is not the endpoint.

It only tells you where the human stands.

Part 6 has to answer: standing there, why is the human worth it.

In the role layer, what is valuable is not "this person still has a seat." It is whether they can define the boundary AI cannot cross.

In the process layer, what is valuable is not "this person still has to approve." It is whether they can see the wrong direction at the key node.

In the knowledge layer, what is valuable is not "this person has experience." It is whether their experience sediments into reusable judgment for the next round.

In the accountability layer, what is valuable is not "this person takes the blame." It is whether they can write accountability into the system in advance, instead of patching after the fact.

In the governance layer, what is valuable is not "this person knows AI." It is whether they can decide which permissions are never opened by default.

That is the entrance to the judgment premium.

The cheaper AI gets, the cheaper actions get.

The cheaper actions get, the more expensive judgment becomes.

But not every human's judgment becomes more expensive. Only the ones who can be placed at a key node, who can carry the consequence, who can write their experience back into the organization.

By here, this series has moved from "AI replaces the human" into a different question:

> After AI, what kind of human is worth keeping?

The answer is not "the human who can use AI."

Using AI is the entry ticket.

The human worth keeping is the one who can place AI in the right position.

They know where to automate, where to advise only, where to require human review, where to default to off, where a single failure has to roll back immediately.

If the operator gets this, AI is not a layoff blade.

AI is an organization-rewrite tool.

If they do not get it, AI becomes a very fast blade. It cuts visible cost first, then slowly cuts invisible judgment, accountability, knowledge, and recovery capability.

The next piece is about that more expensive thing.

Not the role premium.

The judgment premium.

---

## Read on

- Previous: [The New Labor Contract](/blog/human-in-the-loop-04-new-labor-contract)
- Series hub: [Human in the Loop](/blog/human-in-the-loop)
- Next: [The Judgment Premium](/blog/human-in-the-loop-06-judgment-premium)