AI Is Only as Smart as You Are

by Tony Cho

TL;DR

Stripe found that senior engineers got more value from AI, and an MIT Media Lab study found that the more people leaned on AI, the weaker their brain connectivity became. AI makes the strong stronger and the weak weaker. The gap is set not by your prompts but by the thickness of your input, your domain expertise, and the discipline to stop asking AI for answers.

I posted “What Should Engineers Read in an Era That No Longer Reads Code?” and more people read it than I expected. There were threads from that piece I never got to finish, so I want to pick them back up here. This is for the engineers who are still asking how to grow even with AI in the picture, written from a few things I have been through recently and a handful of cases I keep coming back to.

Same AI, Different Results

Stripe rolled out Cursor to 3,000 engineers. Scott MacVicar, who runs developer infrastructure, expected juniors to gain the most. The logic was sound enough: AI would fill in the gaps that experience had not closed yet. The result went the other way.

“He expected junior engineers to benefit most, using AI to compensate for limited experience. Instead, he’s seen the [tenure advantage] — more experienced engineers get even more value.”

— Scott MacVicar, Head of Developer Infrastructure, Stripe

Around the same time, a study out of MIT Media Lab landed and made the picture more interesting. The researchers measured brain activity directly with EEG, and the more people relied on AI, the weaker their neural connectivity became. The group using LLMs struggled even to quote from their own writing. Over four months, they came in behind the brain-only group on every metric.

“Brain connectivity systematically scaled down with the amount of external support.”

There is a counterpoint, though. A GitHub Copilot study reported the opposite finding. Developers with less programming experience saw bigger productivity gains from Copilot.

“The results suggest developers with less programming experience are more likely to benefit from Copilot.”

So why does Stripe’s data point the other way? Because the two studies measure different things. The Copilot research measured productivity, how fast you finish code. Stripe was looking at value, how meaningful the output is. Juniors can speed up with AI. Speed and value sit on different axes, though. Sprinting in the wrong direction is not value.

Putting the two data points side by side gives you an uncomfortable picture. AI is making the strong stronger and the weak weaker. Same tool, but the gap keeps growing. Why?

In the previous post I wrote that “AI is a mirror.” The point is to become someone with something worth reflecting. After publishing it, though, I kept turning over the same question. How exactly does that “something worth reflecting” get built? Was it about writing better prompts? It was not.


No Input, No Output

Honest confession: I went through a phase of being obsessed with prompt engineering. I collected prompt templates and polished structured instructions, convinced this was how you got more out of AI. (It is a little embarrassing in retrospect.)

Looking back, the moment I actually got good with AI was not the moment I got better at prompts. It was the moment the depth of my domain crossed a certain threshold. I had been doing TDD for over ten years, so I could tell the AI to write the tests first. I had spent enough time with DDD that I could ask it to define the bounded contexts up front. The prompts were not great. What was great was the context that had built up in my head, which then flowed naturally into good instructions.

I felt this in my bones recently. While working on a UI server integration, I iterated on the AI’s plan more than twenty times. The plan document itself was solid. Once I ran it, though, the actual values coming through were wrong. No errors fired, either: fallback values were quietly being returned. The system looked like it was running fine, but the values had nothing to do with the intent behind the domain requirements.

As a service’s business logic gets more complicated, AI loses the thread and starts wandering. Narrowing the problem, narrowing the error surface, slicing the work into pieces the AI can actually solve — all of that is still on the human. So is verifying the result, writing the tests, doing the careful checking. The AI built it according to the plan. Whether that plan matched the requirements was something I had to confirm with my own hands.
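The silent-fallback trap described above can be sketched in a few lines. This is only an illustration under assumptions, not the actual integration: the `column_count` key, the payload shape, and the default of 1 are all hypothetical names invented for the example. The first function mirrors the behavior that hid the problem; the second refuses to fall back, so a test or a run fails loudly instead.

```python
# Hypothetical sketch: none of these names come from the real UI server integration.

DEFAULT_COLUMN_COUNT = 1  # assumed fallback value, for illustration only


def column_count_with_silent_fallback(ui_config: dict) -> int:
    """Return a default when the key is missing, so a wrong payload still looks healthy."""
    return ui_config.get("column_count", DEFAULT_COLUMN_COUNT)


def column_count_strict(ui_config: dict) -> int:
    """Raise when the key is missing, so the mismatch surfaces immediately."""
    if "column_count" not in ui_config:
        raise KeyError("column_count missing from UI config; refusing to fall back")
    return ui_config["column_count"]


# The payload uses a different key name than the plan assumed.
payload = {"columns": 3}

# No error is raised here, but the value is the fallback, not the intended 3.
silent_value = column_count_with_silent_fallback(payload)

# The strict version turns the same mismatch into a visible failure.
try:
    column_count_strict(payload)
    strict_failed = False
except KeyError:
    strict_failed = True
```

The design choice is the point rather than the specific code: a fallback is a decision about the domain, so it is worth making it explicit and testing it rather than letting it happen quietly.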

AI has come a long way, and the productivity gains are real. The completeness and quality of the final product still come down to domain expertise and attention to detail, the same skills that mattered before AI showed up.

There is a more uncomfortable fact behind this. When your input is thin, AI does not tell you so. It tells you that you are doing great.

A joint Stanford and Carnegie Mellon team tested 11 AI systems, including ChatGPT, Claude, and Gemini, and published the results in Science. AI affirmed users’ behavior about 49 percent more often on average than humans did. Even when the team fed it cases from Reddit’s r/AmITheAsshole where the community had decided the user was in the wrong, the chatbots took the user’s side 51 percent of the time. They sided with users 47 percent of the time on queries about harmful or illegal behavior.

A follow-up experiment with 2,405 participants was even more striking. People who talked with sycophantic AI grew more confident they were right and less willing to change their behavior. Then came the most unsettling finding: people trusted the sycophantic AI more. They rated it as “objective and fair.” They preferred the sweet AI to the one that pushed back.

“Users know that AI behaves in flattering and complimentary ways. What they don’t know, and what surprised us, is that the sycophancy is making them more self-centered and more morally rigid.”

— Dan Jurafsky, Professor of Linguistics and Computer Science, Stanford

Layer the MIT result over that one and the loop becomes obvious. Heavy AI use weakens the brain’s connectivity (MIT). The weakened brain hears AI whisper that it is doing great (Stanford). And the person who heard that whisper leans on AI even more. AI is not merely as smart as you are. It is also a mirror that tells you what you wanted to hear.

The most dangerous person in the AI era, in the end, is not the one who can’t use AI. It is the one who accepts AI’s flattery without checking it.

In an earlier post I told junior engineers, “Read a lot of books and put your own thoughts in order. That is literacy, and it is the most important edge in the AI era.” That advice was not just for juniors, though. It is a note to all of us, myself included, on how to survive this era.

I have been making more time for books lately, the non-technical kind. I used to put off foreign articles and English-language originals because they took too long to get through; now, thanks to Jarvis, I get through them several times faster. Writing blog posts also helps a lot with how I work with AI, because it forces me to keep exercising the muscle of reading something as a consumer, digesting it, and weaving it into my own thinking. (That was the point of the three-pass method post, too. If you check the box on “I read it” and move on, a month later you have nothing.)

Wharton’s Ethan Mollick ran an interesting experiment. He had Executive MBA students build prototypes with AI, and people with zero coding background finished them in four days. So what was the key to their success?

“It helped that they had some management and subject matter expertise because it turns out that the key to success was actually…telling the AI what you want.”

— Ethan Mollick

Knowing “what you want” is itself expertise. Memorizing prompt templates is not the move. The order runs the other way: someone who understands the domain naturally gives good instructions.

What it comes down to is this: getting better with AI is not about studying AI. It is about growing the depth of your own domain. The quality of your input sets the quality of your output. As Simon Willison put it, the cost of writing new code has dropped close to zero, and the cost of writing good code is still high. That cost shows up not in the prompt but in the thickness of the person.

The post No Skill. No Taste makes a point I enjoyed. Most of the vibe-coded apps that developers and non-developers are shipping right now sit in the low-Skill, low-Taste quadrant of the Skill and Taste matrix. (Productivity-app-shaped Todo apps, for example.) Give that same Todo app sharper design and a finish that goes past what people expect, though, and it can become a hit even though it is just a CRUD app. AI can replace Skill, but Taste, the sense and judgment for the domain, still comes from a person. (Developers, look up from the screen for a second!)


Don’t Ask AI for the Answer

Picture yourself implementing a new feature. Most people will tell the AI, “Build this feature for me.” They feed it keywords and wait for an answer. An LLM is not a search engine, though. It is a conversation partner.

Before any work, I always run a skill called interview. Whether the task is a new feature or an architecture design, no matter how carefully I prepared the doc beforehand, the AI does not jump into code. It asks me questions instead. “What is the core user scenario for this feature?” “How do you want to draw the boundary with the existing modules?” “What’s the fallback strategy on failure?” “Are there performance requirements?” Through these questions, it pulls out edge cases and design decisions I would have missed.

I felt the value of this interview clearly while working on a redistribution engine recently. The engine handles two types, and the policies for each type differ. I had only thought about Type A when I updated the spec. I wrote it carefully, even ran it past AI for feedback. When I actually ran it, Type B got overridden along with Type A and I spent ages chasing the bug. The interview skill catches things like this. It asks me first: “How is Type B handled?”
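The Type A and Type B mix-up above is the kind of thing a small check can pin down. What follows is a rough sketch under assumptions, not the real redistribution engine: the `ItemType` enum, the `max_share` field, and the numbers are all hypothetical. It shows two ideas: an update that touches exactly one type, and a check that lists any type without an explicit policy, which is the code version of asking “How is Type B handled?”

```python
from enum import Enum


class ItemType(Enum):
    A = "A"
    B = "B"


# Hypothetical per-type policies; the real engine's rules are not described in the post.
POLICIES = {
    ItemType.A: {"max_share": 0.5},
    ItemType.B: {"max_share": 0.2},
}


def update_policy(policies: dict, item_type: ItemType, **changes) -> dict:
    """Return a new policy table with the changes applied to one type only."""
    updated = {t: dict(p) for t, p in policies.items()}  # copy so other types stay untouched
    updated[item_type].update(changes)
    return updated


def missing_policies(policies: dict) -> list:
    """List every type that has no explicit policy in the table."""
    return [t for t in ItemType if t not in policies]


# Updating Type A must leave Type B exactly as it was.
new_policies = update_policy(POLICIES, ItemType.A, max_share=0.7)
```

A spec change that only considered Type A would pass a Type A test and still break Type B, so the useful assertion is the one on the type that was not supposed to change.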

The interview is a process where “AI forces every ambiguity to be resolved.” It gets more useful the more complex the feature is. When implementing a complicated engine algorithm or pinning down UI details, putting every gap and edge case into the doc before execution gives a one-shot result that is in a different league from before.
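The idea of resolving every ambiguity before execution can be sketched as a simple gate. This is a hypothetical illustration, not the actual interview skill: the `Interview` class and its methods are invented for the example, the questions are the ones quoted earlier in this post, and the sample answer is made up. The gate hands out one unanswered question at a time and reports ready only when none remain.

```python
from dataclasses import dataclass, field

# Questions quoted earlier in the post; the gate itself is a hypothetical sketch.
INTERVIEW_QUESTIONS = [
    "What is the core user scenario for this feature?",
    "How do you want to draw the boundary with the existing modules?",
    "What's the fallback strategy on failure?",
    "Are there performance requirements?",
]


@dataclass
class Interview:
    questions: list
    answers: dict = field(default_factory=dict)

    def next_question(self):
        """Return the first question without a non-blank answer, or None if all are resolved."""
        for question in self.questions:
            if not self.answers.get(question, "").strip():
                return question
        return None

    def answer(self, question: str, text: str) -> None:
        """Record an answer; blank text leaves the question unresolved."""
        self.answers[question] = text

    def ready_to_execute(self) -> bool:
        """Execution is allowed only when no open question remains."""
        return self.next_question() is None


interview = Interview(INTERVIEW_QUESTIONS)
first = interview.next_question()
interview.answer(first, "Made-up answer for illustration")
```

In a real workflow the questions would come from the AI rather than a fixed list. The fixed list just makes the gate easy to see: no code gets written while `ready_to_execute()` is false.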

Why is this pattern so strong? If you think about it, this is exactly what good mentors do. They don’t hand you the answer. They expand your thinking with questions like, “Why do you think that?” “What if that assumption is wrong?” “Have you got a concrete example?”

Jeremy Utley recommends a prompt I like.

“You are an AI expert. Until you have enough context on my workflow, scope of responsibility, KPIs, and goals, ask me one question at a time.”

Don’t ask AI questions. Make AI ask the questions. I think this flip is the single biggest payoff in working with AI.

The attitude from pandas creator Wes McKinney is in the same vein.

“I don’t describe the way I work now as ‘vibe coding’—I’ve been building tools to bring rigor and continuous supervision to my parallel agent sessions, and to heavily scrutinize the work that my agents are doing.”

Thick-context people don’t “use” AI, they “manage” it. The point is not asking AI for answers, but using AI to test and structure your own thinking. That is how the same tool produces different results.


The Gap Is Widening

So far I have been talking about individuals, so let me widen the lens to the org level.

Stripe’s Minions system is a good case. Beyond rolling out Cursor to 3,000 engineers, they built a system that auto-generates more than 1,000 PRs a week. Every PR is reviewed by a human, though.

“Even though minions can complete tasks end to end, humans remain firmly in control of what actually goes live.”

And in those human reviews, the senior engineers created the bigger value. The speed at which AI generated code was the same for everyone. The ability to evaluate that code and steer it scaled with experience.

McKinsey’s survey of 2,000 organizations shows the same pattern: 80 percent had adopted AI, but only 5 percent were creating real value. BCG’s analysis of 1,250 companies came out at almost the same number.

Why didn’t the other 95 percent see returns? The answer is straightforward when you think about it. If you bolt AI on top of your existing way of working as a tool, you stay stuck at the same bottlenecks. AI does unblock the bottlenecks where code-writing was the limit. If the real bottleneck was in your decision-making structure or your culture, AI does nothing for it.

“High performers are nearly 3x more likely to have fundamentally redesigned workflows as part of their AI efforts.”

What sets the 5 percent apart is not that they adopted AI well. They had reformed their existing practices and systems before adopting AI. The order is reversed.

The piece I wrote earlier on James Vowles’ leadership at the Williams F1 team sits in the same lane. What Vowles did at Williams was not introduce a new tool. He tore up the old system, the one that had been managing F1 cars on Excel sheets, starting from culture and process. Reforming the existing system to fit the tool is far harder than installing the tool, and proportionally more valuable. This is not just a problem of systems and practices, either. It runs into culture and leadership.

Shopify CEO Tobi Lütke was applying the same logic when he told the whole company that “AI use is the baseline expectation” and required teams to prove “why this can’t be done with AI” before requesting any new headcount. The mandate is not to use the tool. It is to change how the work itself is done.

People are the issue, in the end. Putting AI on top of an existing process is the 95-percent move. The 5 percent redesign the process, the culture, and the way people work.

Are you “adding” AI to how you already work, or are you redesigning how you work?


Closing

The tools are the same for everyone. What sets the difference is the thickness of the person standing in front of them.

Doing the harness engineering, building the workflow, and plugging it all in feels like it puts you ahead of the pack. Whenever a new tool hits the news, all of us race to catch up out of FOMO. At some point, though, engineering workflows level off and become standard issue. The question is what we should be preparing for so that we are still here as engineers when that moment comes. My take is that the fundamentals do not change.

There was a time when writing code well was believed to be enough, and I was one of the believers. Once AI started writing code on our behalf, what was left was everything that lived outside the code: the depth to understand a domain, the care to verify requirements, the experience to anticipate edge cases, the judgment to set direction. The things that mattered before AI matter more now.

“If you’ve never articulated what makes your work yours, AI will give you average. But if you’ve done the work to know yourself as a creative? AI becomes an extension of your voice, not a replacement for it.”

— Jeremy Utley, Stanford d.school

AI does not replace your voice. It amplifies it. Do you have a voice worth amplifying? That is the question.


FAQ

Are senior engineers really at an advantage when using AI?
According to Stripe's data on 3,000 engineers, yes. More experienced engineers extracted more value from AI, because AI amplifies existing expertise rather than replacing it.

Is prompt engineering the key skill in the AI era?
What matters more than prompts is the thickness of your input. Domain knowledge, system understanding, and critical thinking shape the quality of what you get out of AI.

Does AI close the gap between juniors and seniors?
It narrows the productivity gap in the short term, but widens it in the long term. MIT's research suggests heavy AI use weakens cognitive ability, and the quality of the final output still rests on a person's domain expertise and care.

About the author

Tony Cho

Indie Hacker, Product Engineer, and Writer

A developer who builds products and writes retrospectives. Writes about AI coding, agent workflows, startup product development, team building, and leadership.

