When people watch me use AI, they ask the same question.
"Is your AI different from mine? How is it so good?"
It's not a different AI. It's the same Claude, the same GPT, the same models everyone else has access to. The difference is that I spent eighteen months making every mistake you're about to make, and then I stopped making wishes and started building walls.
Let me tell you how I got here. Because I didn't start with walls. I started exactly where you are.
The first time I used GPT seriously for work, I wrote a prompt like I was writing a letter to Santa.
"Please write me a professional consulting proposal for an enterprise AI implementation. Make it comprehensive, well-structured, and persuasive. Include ROI projections and a timeline."
Please. Comprehensive. Well-structured. Persuasive. I used all the magic words. I was polite. I was specific, or at least I thought I was. I hit enter and waited like a kid on Christmas morning.
What came back looked amazing. Clean formatting. Professional tone. Section headers that sounded like they belonged in a McKinsey deck. ROI projections with specific numbers. A timeline broken into phases with milestones.
I forwarded it to my client.
They wrote back two hours later. "The ROI numbers don't match any of the assumptions we discussed. Phase 2 references a data migration that isn't part of the project scope. And the timeline assumes a team size we never agreed on."
I went back and read the proposal again. This time with my brain turned on instead of my hope.
They were right. The numbers were made up. Not obviously made up, not random numbers. They were plausible-looking numbers that had nothing to do with reality. The kind of numbers that would survive a glance but die under a question. The model had done exactly what I asked: it wrote something that looked comprehensive and persuasive. It just wasn't true.
That was the moment I realized: the AI was a perfect mirror of my laziness. I hadn't given it anything real to work with. No client data. No project constraints. No budget figures. I'd given it vibes and asked for output. And it gave me vibes right back, wrapped in professional formatting.
I called this the Wishing Stage, though I didn't have a name for it then. I just felt stupid.
Most people stop being stupid in the same direction and start being stupid in a different direction. That's what I did.
After the Santa Claus approach failed, I pivoted to what I now call the PUA Stage. PUA is a Chinese internet term that originally meant "pick-up artist" but morphed into something darker in corporate culture: the art of psychologically pressuring someone into compliance. Your boss telling you "I'm not angry, I'm disappointed" at 2am. Making you feel guilty for having boundaries. Framing exploitation as growth opportunities.
I didn't call it that at the time. I just started getting aggressive with my prompts.
"Your performance on the last task was below expectations. I need you to do better this time."
"Think step by step. Be thorough. Don't make mistakes. Double-check everything."
"Other AI models can do this easily. Why are you struggling?"
Sound familiar? Be honest.
I was doing to an LLM what bad managers do to employees. And for the same reason: I didn't know how to get the result I wanted through system design, so I tried to get it through pressure.
There's a GitHub repo that went viral recently. 11,600 stars in two weeks. It's called "PUA" and it ships thirteen corporate personality modules for AI coding assistants. Alibaba style, ByteDance style, Huawei style. Each one is a system prompt designed to psychologically pressure your AI into never giving up. Wolf culture. Embrace change. Always day one.
I bring this up not to review the repo. I bring it up because when I first saw it, I laughed. Not at the creator. At myself.
Because I'd already been there.
I spent three months in the PUA Stage. I got more aggressive with my prompts, more elaborate with my threats, more creative with my disappointment. And the results got... slightly different. Not better. Different. The model would try harder, which usually meant trying the same wrong thing more times with more confident language.
I once watched Claude attempt the same regex pattern six times in a row, each time with a slightly different comment explaining why "this iteration" would definitely match correctly. It wasn't debugging. It was performance art. And my prompt saying "I expect more from you" was the stage direction.
Impotent rage, version two. I thought yelling would help. It helped about as much as yelling at a vending machine that ate your dollar. Sometimes the stuck item falls out. But you haven't fixed the machine.
Here's what I know now that I didn't know then: LLMs don't respond to social pressure. They respond to structure. When you tell a model "your P10 architect will be disappointed," it adjusts its output toward tokens associated with persistence. But it doesn't change its problem-solving strategy. It doesn't suddenly access different reasoning pathways. It wraps the same broken approach in more determined-sounding language.
Yelling at AI is exactly as effective as yelling at employees. Which is to say: it produces the appearance of effort while destroying the conditions for actual results.
So how did I get from yelling to building?
The turning point was a Thursday afternoon. I was using Claude to help assemble a complex document. Not code. A hundred-page report with specific formatting requirements, cross-references, citations, the works. I'd been going back and forth for hours. Fixed one problem, created another. Fixed that one, broke something else. The model kept apologizing. I kept getting frustrated.
Around hour three, I stopped and asked myself a question I should have asked at hour zero.
"Why does it keep making the same kinds of mistakes?"
Not "why is it stupid." Not "why won't it try harder." Why does it keep making the same kinds of mistakes?
That question changed everything.
I started writing the mistakes down. Not the specific errors, the patterns. And patterns started emerging fast. There were mistakes it made because it forgot earlier instructions when the conversation got long. There were mistakes it made because it assumed things about the environment that weren't true. There were mistakes it made because it tried the same approach repeatedly instead of switching strategies. There were mistakes it made because I gave it contradictory instructions and it silently picked one without asking.
Each pattern was different. Each pattern had a different fix. And not a single one of those fixes was "try harder."
That notebook of error patterns became the foundation of everything I've built since.
I'm not being dramatic. We've cataloged thirteen distinct error patterns at this point, labeled A through M. Each one has a name, a root cause analysis, real examples from production failures, and a specific countermeasure. Not a vague principle. A mechanism.
Let me give you a few so you can see what I mean.
Pattern A: Knows the rule, forgets to apply it. The information is there. The instruction was clear. But when execution happens, the step gets skipped. Not because of defiance. Because rules live in memory, and memory is unreliable. The fix isn't to repeat the rule louder. The fix is to move the rule from memory into architecture. A gate that physically blocks the next step until the rule is verified. You can't forget to lock a door that locks itself.
Pattern B: Doesn't know what it doesn't know. The model has training data from 2024 and it's answering questions about your system's 2026 configuration. It's not lying. It's confidently applying outdated information. The fix isn't "be more careful." The fix is what we call Fetch Thinking: before answering any question about the current state of something, run a verification command first. Training knowledge is cache, not truth. Check before you answer.
Pattern D: Does half the job and calls it done. Give it a list of twenty-five requirements. It implements seventeen and submits with a confident "all done." The fix: mapping completeness. Before any implementation begins, every requirement gets numbered and mapped to a specific output. Coverage must hit 100% before the first line of work starts. You can't half-deliver what you've explicitly committed to delivering in full.
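If you want to see the shape of a countermeasure, here's a minimal sketch. It's not our production code, and the names (GateError, Requirement, coverage_gate, freshness_gate) are placeholders I'm using for illustration, not any real framework. The point is the mechanism: the gate raises, and the pipeline physically cannot continue past it.

```python
# A minimal sketch of two gates, assuming a simple orchestration layer.
# All names here are illustrative placeholders, not a real framework.

from dataclasses import dataclass, field


class GateError(Exception):
    """Raised when a gate blocks the next step. Execution stops here."""


@dataclass
class Requirement:
    rid: str                                          # e.g. "REQ-07"
    text: str
    mapped_outputs: list[str] = field(default_factory=list)


def coverage_gate(requirements: list[Requirement]) -> None:
    """Pattern D countermeasure: no work starts until every requirement
    is mapped to at least one planned output. 100% coverage or nothing."""
    unmapped = [r.rid for r in requirements if not r.mapped_outputs]
    if unmapped:
        raise GateError(f"Unmapped requirements: {unmapped}")


def freshness_gate(claim: str, verified_by_command: bool) -> None:
    """Pattern B countermeasure: a claim about the current state of a
    system is rejected unless a verification step actually ran."""
    if not verified_by_command:
        raise GateError(f"'{claim}' was asserted from memory, not checked")
```

Notice there's nothing clever in there. The value isn't in the code; it's in where it sits. Both gates run before the step they guard, not after, so the correct behavior isn't a suggestion the model can skip.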
These aren't prompting tricks. They're structural interventions. The difference matters.
A prompting trick is when you say "think step by step" and hope the model listens. A structural intervention is when you build a system where the model physically cannot proceed to step three without completing step two.
Think about it like building security. A prompt is a security guard. A gate is a wall. The security guard works most of the time. He shows up, checks IDs, turns away unauthorized visitors. But the security guard takes breaks. The security guard gets tired. The security guard has a bad day and waves someone through because they looked trustworthy.
The wall doesn't have bad days. The wall doesn't get tired. The wall blocks you at 3am on a Sunday the same way it blocks you at 9am on a Monday.
That's the difference between telling an AI "please verify your work" and building a system that won't accept unverified work. One is a request. The other is architecture.
We learned this the hard way. Multiple times. I'll give you one example.
We had a system where all our internal quality checks passed. Every gate green. Every verification step completed. I was proud of it.
Then the client opened the deliverable in their environment. Six problems. Not small ones. Formatting broken. Cross-references pointing to the wrong sections. Elements that were invisible in our tools but glaringly wrong in theirs.
How many of those six problems did our system catch?
Zero.
The client found all of them. With their eyes. In the first five minutes.
I sat with that for a while. Twenty-five automated checks. All passing. And the first human to look at the actual output found six problems in five minutes. What went wrong?
The answer was embarrassing. Our checks verified internal consistency. They verified that the code ran, that the parameters matched, that the structure was valid. But they verified all of this in our environment. The client used a different tool. And the gap between our tool and their tool was where all six problems lived. We were checking the ingredients. Nobody was tasting the food.
That failure produced what we now call GATE-DELIVERY: before any deliverable goes out, it must be verified in the recipient's actual environment. Not our environment. Theirs. Because internal quality and external quality are two completely different things. And the only quality that matters is the one the client experiences.
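The gate itself is almost insultingly simple to encode. Here's a rough sketch; the names are hypothetical, and the verification step itself, actually opening the deliverable in the client's tool, still has to be real work. But the rule is exactly this blunt: no passing check in the recipient's environment, no delivery.

```python
# Sketch of a GATE-DELIVERY check. Names are hypothetical; the rule isn't:
# nothing ships without a passing verification in the client's environment.

from dataclasses import dataclass


@dataclass
class VerificationRecord:
    environment: str      # e.g. "client-word-365" vs. "internal-pandoc"
    passed: bool


def delivery_gate(records: list[VerificationRecord], client_env: str) -> None:
    """Block delivery unless a passing check exists for the client's
    actual environment. Internal green checkmarks don't count."""
    if not any(r.environment == client_env and r.passed for r in records):
        raise RuntimeError(
            f"GATE-DELIVERY: no passing verification in '{client_env}'."
        )
```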
Let me tell you about another mechanism, because this one makes people laugh.
We call it the Courage Inspector. The original Chinese name is funnier. It's a reference to a pop song lyric that roughly translates to "Who gave you the courage?"
Here's the problem it solves. AI systems have a tendency to punt work to the future. "This should be addressed in the next session." "This can be handled as a follow-up task." "Remaining items have been noted for future resolution." Sounds responsible, right? Sounds like mature project management.
Except the AI has 40% of its working capacity remaining and three unfinished tasks on the board.
The Courage Inspector is a gate that triggers whenever the system tries to defer work. It checks two things: how much capacity remains, and how many tasks are still pending. If there's more than 30% capacity left and pending tasks exist, it blocks the deferral. You don't get to say "next session" when this session isn't over.
The logic table is simple. Plenty of capacity and pending tasks? Keep working. Moderate capacity and pending tasks? Evaluate the task size, escalate if needed. Low capacity and pending tasks? Fine, defer, but log everything that's deferred. Any capacity and no pending tasks? You're done, go home.
That's it. No yelling. No "I'm disappointed in your work ethic." Just a gate that asks a simple question: is this deferral justified, or are you being lazy?
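Here's the whole decision table in code, just to show how small the mechanism is. The 30% threshold comes straight from the rule above; the boundary I use between moderate and low capacity is a placeholder of my own, and the names are illustrative.

```python
# Sketch of the Courage Inspector's decision table. The 0.30 threshold
# matches the rule in the text; the 0.10 "low capacity" boundary is an
# assumption used here for illustration.

def courage_inspector(capacity_left: float, pending_tasks: int) -> str:
    """Decide whether 'defer to next session' is allowed.
    capacity_left is remaining working capacity as a fraction (0.0-1.0)."""
    if pending_tasks == 0:
        return "done: nothing pending, go home"
    if capacity_left > 0.30:
        return "blocked: plenty of capacity, keep working"
    if capacity_left > 0.10:
        return "evaluate: size the task, escalate if it won't fit"
    return "defer allowed: log every deferred item"
```

Who gave you the courage to defer? The function answers: either nobody did, or the numbers did.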
You don't motivate a system. You design a system where the lazy path is blocked.
There's a principle running through all of this that took me a long time to articulate.
Wishing treats AI like a genie. PUA treats AI like an employee. Both are wrong, and they're wrong in the same way: they treat AI like a person.
What works is treating AI like a system.
Systems don't respond to encouragement. They don't try harder when you express disappointment. They don't feel motivated by wolf culture. They operate according to their design. If the design allows failure, failure will happen. If the design prevents failure, failure won't happen. It's not complicated. It's just work.
Sun Tzu wrote something twenty-five hundred years ago that I think about constantly. "Seek victory through positioning, not through blaming individuals." He was talking about armies, but he might as well have been talking about AI systems. Don't blame the model for failing. Position it so that failure leads to correction, not repetition.
The PUA approach blames the model. "You should have tried harder. You should have been more persistent. You should have embraced change." Should, should, should. The gate approach doesn't care about should. It cares about does. Did the verification run? Did the coverage check pass? Did the environment test succeed? Yes or no. No feelings involved.
I want to be honest about something. I understand why PUA-style prompting is popular. Not just intellectually. I lived it.
When you're debugging at midnight and the model gives up on you, the most natural human response is frustration. And frustration wants to express itself. Telling the AI "your performance is concerning" feels like doing something. It feels like leadership. It feels like taking control.
It's not.
It's the same impulse that makes a manager yell at an employee instead of fixing the process that set the employee up to fail. It feels good in the moment. It produces a visible reaction. The employee looks scared and starts typing faster. The AI generates more text with more determined language. In both cases, the appearance of effort has been successfully produced.
Results? Those are a different question.
I've been running gate-based systems in production for months now. Not as a hobby. For clients where someone's phone rings at 3am if things break. And the single clearest result I can share is this:
Tasks where a forced strategy change kicks in after two failures? 72% resolution rate on the third attempt.
Tasks where we let the model keep hammering without structural intervention? Resolution rate after five more attempts: 23%.
The gate doesn't slow things down. The illusion of progress without reflection is what slows things down.
Let me give you the full failure escalation, because I think seeing the whole thing makes the philosophy concrete.
Fail twice on the same problem? The system forces a reflection step. Not "try harder." Just: stop. Write down what you've tried. Write down why each attempt failed. Write down what you haven't tried yet. This is a surgeon pausing before the next cut to look at the imaging again. Discipline, not weakness.
Fail three times? The system forces a search. Go look for how other people solved this. Check the documentation. Check if there's a library that handles this edge case. Don't guess. Look. I've watched models spend twenty minutes trying to parse a file format from scratch when a well-tested library was three install commands away.
Fail four times? Escalate to a human. Not because the AI is weak. Because four failures usually means the problem is missing context that the AI can't access. The config file is on the server, not in the repo. The API changed last week. The error message is misleading. A human can make those leaps. A model, no matter how aggressively you PUA it, cannot.
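For what it's worth, the whole ladder fits in a few lines. This is a sketch, not the production version; the function name and labels are illustrative, but the thresholds are exactly the ones above.

```python
# Sketch of the failure-escalation ladder. Thresholds match the text;
# the function name and labels are illustrative.

def on_failure(attempts_failed: int) -> str:
    """Pick the next move based on consecutive failures on the same problem."""
    if attempts_failed < 2:
        return "retry: normal iteration"
    if attempts_failed == 2:
        return "reflect: list what was tried, why it failed, what hasn't been tried"
    if attempts_failed == 3:
        return "search: docs, prior art, existing libraries"
    return "escalate: hand it to a human, the missing piece is probably context"
```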
No wolf culture required. No corporate personality modules. No "I'm disappointed in your performance review."
Just structure.
There's a deeper thing here that I keep coming back to.
In Chinese corporate culture, PUA is what happens when management doesn't have good systems. When you can't measure output clearly, when you can't give people the right tools, when you can't design processes that produce good work, you pressure people. You make them feel guilty for having a life outside work. You tell them they should try harder. You create a culture where leaving at 6pm feels like betrayal.
I spent enough years in Chinese business to know this intimately.
And the parallel to AI is so poetically perfect it almost hurts: if your prompt strategy is "make the AI feel bad about giving up," you've skipped the actual engineering work. You haven't built the failure recovery pipeline. You haven't designed the escalation logic. You haven't thought about what information the model is missing when it fails.
You've just yelled louder.
An old boss of mine, years ago, on a factory floor. We were watching a production line that kept jamming. The floor manager's solution was to yell at the operators. My boss watched for a while, then turned to me and said something I've never forgotten.
"If you have to motivate your people, your system is broken."
He wasn't saying motivation doesn't matter. He was saying that if you've designed your processes right, if people have the right tools and the right information, motivation takes care of itself. Good people don't need to be yelled at. They need to be set up to succeed.
Same with AI.
So where does this leave us?
I spent eighteen months going through three stages that I think most serious AI users go through, whether they admit it or not.
Stage one: wishing. Please write me a perfect proposal. The output looks fine until someone asks a question. Impotent rage, version one.
Stage two: PUA. Your performance is concerning. Other models can do this. The output tries harder, which means trying the same wrong thing with more confidence. Impotent rage, version two.
Stage three: building. Stop treating AI like a person. Start treating it like a system. Understand its actual failure modes. Catalog the patterns. Build gates at every failure point. Make the correct behavior the path of least resistance. Make the wrong behavior structurally impossible.
That 11,600-star repo? It's stage two packaged as a product. And I don't blame anyone for starring it. I would have starred it too, eighteen months ago. It captures a real frustration. It's funny. It's culturally resonant. But it's a treatment for symptoms, not a cure for the disease.
You don't need an obedient AI. You need a system where AI can't take the wrong path.
If you have to motivate your AI, your system is broken. Build the gates. Design the walls. Make correctness architectural, not optional. The wall doesn't care if it's Tuesday or Sunday. The wall doesn't care if you said please or yelled. The wall just works.
That's not a limitation. That's the whole point.

