<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Flowkater.io (English)</title><description>AI-translated English versions of Tony Cho&apos;s Korean posts. Korean originals are canonical.</description><link>https://flowkater.io/</link><language>en</language><item><title>Books I Read in Q1 2026</title><link>https://flowkater.io/en/posts/2026-04-14-2026-q1-book-reviews/</link><guid isPermaLink="true">https://flowkater.io/en/posts/2026-04-14-2026-q1-book-reviews/</guid><description>Notes on the 10 books I read in Q1 2026: Stoner, The Obstacle Is the Way, Meditations, Source Code: My Beginnings, Bird That Drinks Blood, the second Krafton Way book, Lost and Founder, The Score Takes Care of Itself, Formula One, and Principles.</description><pubDate>Tue, 14 Apr 2026 01:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;Opening&lt;/h2&gt;
&lt;p&gt;I&apos;m pulling together the books I read this past Q1 in one place. I reached for whatever was around: fiction, classics, biography, self-help, startup, leadership, sport. Some I wrote long reviews of, others I covered in dedicated posts, so the lengths vary. I&apos;m just logging them in no particular order.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;&lt;em&gt;Stoner&lt;/em&gt;, by John Williams&lt;/h2&gt;
&lt;p&gt;I actually read this one back in November but never wrote about it anywhere, so I&apos;m putting it here. It was the first novel-as-novel I&apos;d felt in a long time. Maybe the fact that it isn&apos;t dramatic is exactly why it lands with so many people. Following one man&apos;s life secondhand, I came away feeling like I&apos;d lived through the whole sweep of frustration and constriction we all carry, and the everyday determination of someone who still gave each day his best. (The afterglow stuck with me long enough that I subscribed to Millie&apos;s Library for a month just to read critic Lee Dong-jin&apos;s review.) There&apos;s a reason so many readers call it the book of their life.&lt;/p&gt;
&lt;p&gt;It&apos;s a quiet novel, walking through a life&apos;s events in sequence, but a few &quot;villains&quot; do show up. Not dramatic ones, just the kind of people who keep needling and obstructing him in an otherwise unremarkable life. What fascinated me was how closely those villains resemble the ones in our own lives. Nobody who&apos;s done you wrong enough to deserve being killed off, just the irritating people you meet along the way, captured with uncanny precision.&lt;/p&gt;
&lt;p&gt;And like most of us, Stoner never once defeats them. I&apos;m still recommending what sounds like a soul-crushing novel because I&apos;ve rarely encountered one that renders a single life so close to the skin, and I want you to feel the afterglow of walking with him from beginning to end.&lt;/p&gt;
&lt;p&gt;I&apos;ll defer the formal introduction to &lt;a href=&quot;https://www.youtube.com/watch?v=gMifHcD8mm4&quot;&gt;critic Lee Dong-jin&apos;s recommendation video&lt;/a&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Don&apos;t try to win the war that&apos;s your whole life. Try to win the daily battles. Stoner was that kind of person too.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Lee Dong-jin (translated by author)&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr /&gt;
&lt;h2&gt;&lt;em&gt;The Obstacle Is the Way&lt;/em&gt;, by Ryan Holiday&lt;/h2&gt;
&lt;p&gt;I picked this self-help book back up after a long break, planning to read it together with George and Ellie. Truth is, it&apos;s not a great book. The one redeeming function: that signature self-help lightness was unbearable enough that I went and read &lt;em&gt;Meditations&lt;/em&gt;, the Stoic classic the book keeps quoting.&lt;/p&gt;
&lt;p&gt;The full review is &lt;a href=&quot;/posts/2026-02-03-book-review-the-obstacle-is-the-way/&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;&lt;em&gt;Meditations&lt;/em&gt;, by Marcus Aurelius&lt;/h2&gt;
&lt;p&gt;I read this in small daily doses, almost the way you&apos;d read scripture. If &lt;em&gt;I May Be Wrong&lt;/em&gt; (which I love) is essay-shaped, this one sits closer to scripture. Reading it while remembering that Marcus was talking to himself makes it surprisingly compelling.&lt;/p&gt;
&lt;p&gt;I&apos;d wanted to read this classic for ages, and &lt;em&gt;The Obstacle Is the Way&lt;/em&gt; finally pushed me to it. The passages where he scolds the part of himself that wants to stay safe under the covers (using these almost dogmatic lines) are the highlight. Whenever life gets hard down the road, I think I&apos;ll find comfort in the lines I underlined in his diary. I read with a pen, sometimes switched to audio, and I plan to gather the passages I marked into a separate note.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;Everything depends on how you take it&quot; means that the character and impact of any external thing or circumstance (anything value-neutral, with no inherent connection to happiness or to good and evil) depends not on the thing itself but solely on how a person receives it. &lt;em&gt;(translated by author)&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;When daylight comes and you don&apos;t want to leave your bed, tell yourself: &quot;I&apos;m getting up to do the work of a human being. I was born for this work, came into the world for it, and now I&apos;m complaining and balking? I wasn&apos;t born to lie under blankets enjoying warm comfort.&quot; &lt;em&gt;(translated by author)&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr /&gt;
&lt;h2&gt;&lt;em&gt;Source Code: My Beginnings&lt;/em&gt;, by Bill Gates&lt;/h2&gt;
&lt;p&gt;I bought this the moment it came out last year and only got around to it almost a year later. I&apos;d absorbed the Microsoft founding story through other media (&lt;em&gt;Pirates of Silicon Valley&lt;/em&gt;, the 1999 docudrama, and so on), but reading it in his own words gave me a much more granular account of everything up to the founding than the version I&apos;d vaguely held in my head.&lt;/p&gt;
&lt;p&gt;This is part one of his life. The main material is his childhood, before Microsoft. Part two will probably be the rise of the Microsoft empire, part three the post-retirement years. It has a different feel from Walter Isaacson&apos;s &lt;em&gt;Elon Musk&lt;/em&gt; and &lt;em&gt;Steve Jobs&lt;/em&gt;, but peering into his unusual childhood was its own kind of fun.&lt;/p&gt;
&lt;p&gt;(Of course his scandal broke right as I was reading this, which made writing about it feel awkward. Still, I read it, so a quick note.)&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;How long do you have to focus and grind, day after day, just to do a little better than yesterday, for how many years, before you reach the top? &lt;em&gt;(translated by author)&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr /&gt;
&lt;h2&gt;&lt;em&gt;Bird That Drinks Blood&lt;/em&gt; (first half), by Lee Yeongdo&lt;/h2&gt;
&lt;p&gt;I cried while reading &lt;em&gt;Bird That Drinks Tears&lt;/em&gt; during a high school class. I think it was sophomore year, and the scene I read at the very end is still with me. I kept meaning to read &lt;em&gt;Bird That Drinks Blood&lt;/em&gt; and never quite got there until the 20th-anniversary edition came out (honestly I wanted the &lt;em&gt;Tears&lt;/em&gt; edition), at which point I bought the whole set and started chipping away at it last year, picking it up in pockets through Q1.&lt;/p&gt;
&lt;p&gt;I still have a few volumes left, but it has far more characters and factions than &lt;em&gt;Tears&lt;/em&gt; did. If &lt;em&gt;Tears&lt;/em&gt; is &lt;em&gt;The Lord of the Rings&lt;/em&gt; (specifically the Fellowship), &lt;em&gt;Blood&lt;/em&gt; is closer to &lt;em&gt;A Song of Ice and Fire&lt;/em&gt; (Game of Thrones). (In the ensemble-drama sense.)&lt;/p&gt;
&lt;p&gt;The characters&apos; varied charm and the world-building solidified across both works reach a kind of essence here, and there are scenes that genuinely make you marvel. There are still sentences that hit you in the gut, and a few that feel a little juvenile, maybe because the book has aged. (Or I have.) I&apos;ll still finish it and write up the rest in next quarter&apos;s post.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The one who falls from the horse is the one who rode it. The defeated general is the one who went to war. The drowned Lekon is the one who went into the water…. Every loser is someone who kept winning right up to the moment of defeat. Life is a long journey toward losing. Life isn&apos;t to be spent on winning; it&apos;s to be spent on losing. &lt;em&gt;(translated by author)&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr /&gt;
&lt;h2&gt;&lt;em&gt;Battlegrounds, Onto a New Front: The Second Krafton Way&lt;/em&gt;, by Lee Gi-mun&lt;/h2&gt;
&lt;p&gt;My first impression after reading the original &lt;em&gt;Krafton Way&lt;/em&gt; was: &quot;they got really lucky with how PUBG came to be.&quot; How Bluehole (Krafton&apos;s earlier name) was founded, the trial and error that led them all the way to PUBG: the story itself was riveting, but I felt I had no real lesson to take from it.&lt;/p&gt;
&lt;p&gt;This second book was different. They&apos;d hit a global home run with PUBG, and now the book is full of the messy, real work of turning that one-time success into a sustainable organization, company, and service. Unlike the first book, much of this one left a deep impression on me.&lt;/p&gt;
&lt;p&gt;They had to walk through a grueling, drawn-out process to turn a globally successful service into something that lasts, not a one-off jackpot. Reading along, I felt the strain with them, and at the same time I found myself envious of a team this aligned, pouring their lives into the work to push the service forward.&lt;/p&gt;
&lt;p&gt;The dialogues at the center, between Chang Byung-gyu and Kim Chang-han, are sharp to the point of &quot;is this even okay?&quot; Watching leaders of a company at this level keep clashing this hard, studying the org, trying to talk to their people (one-sided as it sometimes is), I kept asking myself: did I ever put that much effort into the orgs I was part of? It triggered a lot of reflection.&lt;/p&gt;
&lt;p&gt;I&apos;d guess this book was Chang Byung-gyu&apos;s idea. It&apos;s worth recommending because beyond the behind-the-scenes of building and shipping PUBG, you get a lot of the work of building org culture (and the difficulty of it), and the agonizing behind various business calls, conveyed without much editing.&lt;/p&gt;
&lt;p&gt;I hope Krafton doesn&apos;t end up as a PUBG one-hit wonder, and that they keep growing through diverse IP and game titles. (Since Krafton holds the &lt;em&gt;Bird That Drinks Tears&lt;/em&gt; media IP, as I mentioned, I&apos;m hoping for a Korean Witcher.)&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Within PUBG, no one had seen Kim Chang-han stripped down to the mad-dog persona. Kim Chang-han didn&apos;t carry the substance of his negotiations with Bluehole&apos;s leadership, or the friction they caused, into PUBG. What he looked like outside PUBG, no one inside knew. &lt;em&gt;(translated by author)&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;A person with ownership over something owns the delivery of the message too. Let&apos;s not say &quot;I sent it; they didn&apos;t receive it.&quot; If we start passing responsibility around like that, we don&apos;t get to an answer. Before debating whether the means are efficient, the person with ownership, whether of communication or of work, has to take responsibility until the message lands. &lt;em&gt;(translated by author)&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;Korean developers got into development as a job. Passionate people are the ones who should be diving in, and I&apos;m not sure how many of those we have in Korea. &lt;em&gt;(translated by author)&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;When the entire dev team stops developing and starts playing the game they made, that&apos;s the launch date. &lt;em&gt;(translated by author)&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr /&gt;
&lt;h2&gt;&lt;em&gt;Lost and Founder&lt;/em&gt;, by Rand Fishkin&lt;/h2&gt;
&lt;p&gt;Some VC trashed it on Threads as &quot;garbage&quot; and mocked the author&apos;s &quot;weak&quot; mental state for wallowing in self-pity, which made me want to read it immediately. The author founded the SEO startup &lt;a href=&quot;https://moz.com/&quot;&gt;Moz&lt;/a&gt;, and the book covers his very naive early days in business, his regret about not exiting after VC funding turned things around, and a critique of the growth-hacking and J-curve gospel preached in the Valley (and the broader startup scene). It tells the unedited version of the story you don&apos;t get from Facebook or the small handful of breakout success stories.&lt;/p&gt;
&lt;p&gt;These days, thanks to AI, there&apos;s a growing belief that you don&apos;t necessarily need VC funding, and bootstrapping is part of the conversation. But back when I started in the early 2010s, every startup&apos;s top mission was to take VC money and become a unicorn. Mine was no exception, even though not every business can be a unicorn.&lt;/p&gt;
&lt;p&gt;There are companies that use VC funding to run on losses while chasing explosive growth, and there are companies that build cash flow in a niche market and grow steadily. If you&apos;re thinking about a startup, this book is worth reading at least once. It&apos;ll help you weigh what you actually want, what kind of business growth fits you. His mental anguish, and his stage-by-stage regrets and advice, meant a lot to me.&lt;/p&gt;
&lt;p&gt;If I&apos;d read this ten years ago, would I still be running my own business today?&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The moment you believe you can hide the truth, the brake that stops you from doing bad things is broken. &lt;em&gt;(translated by author)&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;A founder&apos;s traits get embedded in the organization almost permanently, while employees&apos; traits change over time. Part of the reason is that founders stay much longer and exert influence for much longer. But even when the team turns over, what remains is an indelible imprint left by the founder&apos;s biases, the business structures they built, the way they hired, the way they delegated, how they allocated resources, their passions, and their blind spots. I see this pattern repeat in companies of every size, industry, and configuration. The founder (and CEO) governs not just the personality and culture of the org, but also the foundational strengths and weaknesses that shape the organization&apos;s trajectory for years or decades. &lt;em&gt;(translated by author)&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr /&gt;
&lt;h2&gt;&lt;em&gt;The Score Takes Care of Itself&lt;/em&gt;, by Bill Walsh&lt;/h2&gt;
&lt;p&gt;This one is a pretty famous leadership book in the US. So many people recommended it that I read it in the original English, carefully underlining as I went, with an AI agent making the process much easier.&lt;/p&gt;
&lt;p&gt;The full review lives in &lt;a href=&quot;/posts/2026-04-13-score-takes-care-of-itself/&quot;&gt;this post&lt;/a&gt;.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;&lt;em&gt;Formula One&lt;/em&gt;, by Joshua Robinson and Jonathan Clegg&lt;/h2&gt;
&lt;p&gt;This is one of the books I bought and finished fastest. Inside you get the F1 industry titans I&apos;d only heard about (Enzo Ferrari, Bernie Ecclestone), the championship drivers from Niki Lauda, Ayrton Senna, Michael Schumacher, through Lewis Hamilton and Max Verstappen, plus the major chapters of team history and the stories behind them. Built from years of reporting by Wall Street Journal writers, the book reads as both a tribute to F1&apos;s past and a mix of affection and worry about today&apos;s American-market-centric era.&lt;/p&gt;
&lt;p&gt;I came into F1 through Netflix&apos;s &lt;em&gt;Drive to Survive&lt;/em&gt; season 1, watched on and off for a while, and only started catching every race last year. The book retraced the older F1 history for me and unpacked even the famous events I already knew with great backstage detail. Last year I followed every race closely and traded paddock news and driver memes with Ellie, so this year&apos;s new &lt;em&gt;Drive to Survive&lt;/em&gt; season felt thin to me. (Documentaries always aim at total newcomers.) That disappointment is part of why I wrote a &lt;a href=&quot;/posts/2026-01-09-f1-leadership-james-vowles/&quot;&gt;separate post on F1 leadership&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Then, after the first three races this year, the war in Iran caused two April races to be cancelled. That doubled the disappointment. This book filled the gap. After Liberty Media bought the sport and the Netflix doc dropped, F1 entered a completely new era. A race is, at heart, a game to decide first place. But racing has become a status sport, and the fact that drama from mid-pack and back-of-the-grid teams gets consumed alongside the title fight is genuinely interesting. As the book points out, there are now people who call themselves F1 fans without watching a single race. (I was one of them up to two years ago.)&lt;/p&gt;
&lt;p&gt;If you&apos;re curious about F1, watch the Netflix doc; if you came in through the doc and want to know F1&apos;s older history, this book is exactly right. The Miami Grand Prix is two weeks away, so go buy it now and read it.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In that sense, F1 had become a kind of &quot;post-sport.&quot; We&apos;re now in a world where you can call yourself a die-hard Formula 1 fan without watching a single race. These are full-on supporters who pour their lives, attachment, and money into F1. &lt;em&gt;(translated by author)&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr /&gt;
&lt;h2&gt;&lt;em&gt;Principles&lt;/em&gt;, by Ray Dalio&lt;/h2&gt;
&lt;p&gt;This is the famous book by investor Ray Dalio. I read about 60% and put it down. Part 1 was a fascinating secondhand tour of his life, but parts 2 and 3 are quite literally a list of &quot;principles,&quot; which threw me a bit.&lt;/p&gt;
&lt;p&gt;These days a self-help book written this way would probably get torched. But it came out seven or eight years ago when plenty of people around me recommended and raved about it, so I&apos;d had it for a long time, sitting like a homework assignment. Reading it now, the rapid-fire list of principles in parts 2 and 3 felt off. I don&apos;t lean toward this kind of structure, and I&apos;m allergic to the self-help genre to begin with. Setting aside the fact that he&apos;s a successful investor, I&apos;m not sure the book deserves the praise it gets.&lt;/p&gt;
&lt;p&gt;That said, his principles aren&apos;t unreasonable. They&apos;re decent advice. The problem is that for any of these messages to actually land in my own life, they need to leave an impression, and they didn&apos;t.&lt;/p&gt;
&lt;p&gt;The book is very thick. If you&apos;re just curious about who Ray Dalio is, part 1 alone is enough; his story carries it. Imagine if they&apos;d split it into three volumes and sold part 1 separately. Some readers call it the book of their life, so trying it once and forming my own opinion was at least worth something.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;What you do after failing matters most. Successful people grow by leaning into their strengths and shoring up their weaknesses; people who fail can&apos;t bring themselves to do that. &lt;em&gt;(translated by author)&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr /&gt;
&lt;h2&gt;Closing&lt;/h2&gt;
&lt;div style=&quot;display: grid; grid-template-columns: repeat(auto-fit, minmax(250px, 1fr)); gap: 1rem; margin: 1.5rem 0;&quot;&gt;
&lt;figure style=&quot;margin: 0;&quot;&gt;
&lt;img src=&quot;/assets/2026-q1-books.jpg&quot; alt=&quot;Onyx Palma 2&quot; style=&quot;width: 100%; height: auto; margin: 0;&quot; /&gt;
&lt;figcaption style=&quot;text-align: center; font-size: 0.875rem; opacity: 0.7; margin-top: 0.5rem;&quot;&gt;Onyx Palma 2 e-reader&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;figure style=&quot;margin: 0;&quot;&gt;
&lt;img src=&quot;/assets/2026-q1-todait.png&quot; alt=&quot;Todait reading plan&quot; style=&quot;width: 100%; height: auto; margin: 0;&quot; /&gt;
&lt;figcaption style=&quot;text-align: center; font-size: 0.875rem; opacity: 0.7; margin-top: 0.5rem;&quot;&gt;My Q1 reading plan, managed in Todait&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;/div&gt;
&lt;p&gt;For the actual reading I mostly used Todait, which I&apos;m building (again) right now. Even with three books in flight, the per-book pacing balances itself, so I worked through them one chunk at a time. Reading multiple books at once turned out to be more chaotic than expected. If work gets busy and I miss a few days, the backlog stacks up that much, and I&apos;m thinking I should cap it at two books in flight.&lt;/p&gt;
&lt;p&gt;I started reading in earnest in February. Roughly one book per week (and we&apos;re already in mid-April). Reading a lot is good, but I want to keep stacking reviews and leave at least rough notes for thinking. Some of these books I read a while ago, and not having taken notes at the time made it harder to write from memory.&lt;/p&gt;
&lt;p&gt;Half were physical books, and for the other half I bought the Onyx Palma 2 (a phone-sized e-reader) so I can read anywhere. Lately I leave my smartphone outside the bedroom and bring only the e-reader and whichever book I&apos;m in. It&apos;s helped with sleep and with reading both.&lt;/p&gt;
&lt;p&gt;I still have plenty of unread books stacked up. To keep the reading habit going, I&apos;m leaving a quarterly note. See you in the next quarter.&lt;/p&gt;
</content:encoded><category>reading</category><category>essay</category><category>retrospective</category><author>Tony Cho (https://flowkater.io)</author></item><item><title>Reading &apos;The Score Takes Care of Itself&apos;: On Leadership</title><link>https://flowkater.io/en/posts/2026-04-13-score-takes-care-of-itself/</link><guid isPermaLink="true">https://flowkater.io/en/posts/2026-04-13-score-takes-care-of-itself/</guid><description>Notes on Bill Walsh&apos;s leadership book &apos;The Score Takes Care of Itself.&apos; The good-talent-bad-attitude equation, his declaration that teaching is the very definition of leadership, and the upward leadership chapter. The three-time Super Bowl winning coach&apos;s principles, read against my own failures.</description><pubDate>Mon, 13 Apr 2026 13:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;img src=&quot;/assets/bill-walsh-cover.jpg&quot; alt=&quot;The Score Takes Care of Itself, by Bill Walsh&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;Opening&lt;/h2&gt;
&lt;p&gt;Whenever leadership comes up, people ask what success I&apos;ve had to speak this deeply about it. The question is genuinely embarrassing, because I haven&apos;t had any success in leadership. I was always the loner in whatever organization I joined, and from the days I ran my own startup to the most recent team, I have never once felt like I &apos;truly&apos; belonged. When I was younger that was arrogance and self-delusion (&quot;I&apos;m different&quot;). More recently it&apos;s been my own laziness; I gave up trying to understand other people. And on top of that, I watched with my own eyes as the standards and principles I&apos;d set, and the consistency I&apos;d tried to keep, collapsed underneath me, and my sandcastle leadership went with them. (I don&apos;t want to plead force majeure. I picked that team in the end.)&lt;/p&gt;
&lt;p&gt;The small comfort is that out of every person I&apos;ve worked with across my career (myself included, every boss above me, every successor below me) not one of them practiced anything I&apos;d call real leadership. (Not one.) By my standards, most of them were closer to the worst end of the scale, and the funny part is I think I was at least a notch better than them. I still think so. (They&apos;d disagree, of course. This is just my take.)&lt;/p&gt;
&lt;p&gt;A leader needs to delegate to the right people on the right things. But naive delegation without trust is just laissez-faire dressed up as empowerment, and the &apos;fake&apos; psychological safety you create by comforting people who failed is nothing more than my own small selfishness, wanting them not to be uncomfortable around me. The opposite is just as bad. When you can&apos;t trust them and you pull every load yourself, the team never finds their own role, never grows or stretches in the right ways, and slips into the assumption that as long as they clock the hours, the leader will cover the rest. The team&apos;s win is no longer their concern. They just want the company to keep paying salaries. So the question isn&apos;t whether to delegate. Every team is different, every person is different, and calibrating the dose is genuinely hard work. There&apos;s no leadership law that holds everywhere with the same shape.&lt;/p&gt;
&lt;p&gt;I used to read everything on this. What makes a good team, why &apos;psychological safety&apos; matters, why a winning team matters (which I wrote about three months ago in &quot;&lt;a href=&quot;/posts/2026-01-25-no-victory-no-future/&quot;&gt;Organizations that don&apos;t win have no future&lt;/a&gt;&quot;). At some point I realized how lazy and convenient it is to expect a book to hand you the answer. So I put all those leadership books down. Reading does sometimes give us unexpected lessons and experiences, but on the topic of leadership specifically, I&apos;d dare to say this:&lt;/p&gt;
&lt;p&gt;Close the book. Spend that time on a 1:1 with the people on your team right now. Then ask yourself whether what you&apos;re doing is actually moving the team toward winning.&lt;/p&gt;
&lt;p&gt;So the leadership fragments I share with people come from many places, but they almost always trace back to the same wounds: losing the details, falling out of alignment with the leadership above me, missing the small shifts in my teammates, failing to define what winning looks like. It&apos;s less success than the residue of a long string of bone-deep failures I felt with my whole body during sleepless nights.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;So this post isn&apos;t a summary of Bill Walsh&apos;s principles. It&apos;s a record of the places I&apos;m still failing.&lt;/strong&gt; It&apos;s closer to a notebook of the failures his book held up like a mirror.&lt;/p&gt;
&lt;p&gt;Bill Walsh (1931–2007) is a football legend, and the protagonist of the cliché sports drama (a coach takes over a hopeless team, builds it into a champion, and keeps it there). He was the man who turned the dismal 49ers of the early &apos;80s into a dynasty, and one of the great leaders of that era. Three Super Bowl wins in &apos;84, &apos;88, &apos;89, and famous for being the first to bring the &apos;West Coast Offense (WCO)&apos; to a team. As an NFL outsider, what I read is that football history splits into before and after WCO. (I have never properly watched a football game. You really don&apos;t need to know the rules to read this book.)&lt;/p&gt;
&lt;p&gt;I don&apos;t know NFL well, but Walsh&apos;s leadership book is famous enough that I&apos;d kept it on my shelf for a long time, telling myself I&apos;d get to it. With no team members I have to do 1:1s with anymore, this felt like the right moment to look back at my own failures through it. &lt;strong&gt;Maybe the next time the chance comes around, I&apos;ll be a little better.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;What makes this book really good is the gap between expectation and reality. You&apos;d assume an all-time-great American football coach wrote a book of his triumphs, war stories, and the leadership rules they produced. But the title (&lt;em&gt;The Score Takes Care of Itself&lt;/em&gt;) names the leadership principle Walsh truly chased, and also, in the end, the principle he himself failed to keep. &lt;strong&gt;The kick of this book is in the second half.&lt;/strong&gt; In the first half you underline his leadership principles one by one. In the second half the book changes character entirely. I&apos;ll get into how it changes in the last section of the body.&lt;/p&gt;
&lt;p&gt;He&apos;s a leader from a different era, a different country, a different field, and his success isn&apos;t comparable to anything I&apos;ve done. But the fatigue, loneliness, exhaustion, and the open record of his mistakes and failures behind that success made the principles in the first half land harder than they would have otherwise. So this review ended up as the record of one person who keeps tripping over those principles, not as a tidy summary of them.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;The score takes care of itself? Winning is still the metric&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;The Score Takes Care of Itself.&lt;/em&gt; When I first saw the title I was on guard. About three months ago I&apos;d written a post called &quot;Organizations that don&apos;t win have no future.&quot; I quoted &quot;Winning solves everything,&quot; took it as my north star, and wrote pretty firmly that a leader who can&apos;t define winning for the organization eventually collapses. Then here was Bill Walsh standing in what looked like the opposite spot. &lt;strong&gt;Don&apos;t chase wins. The score takes care of itself.&lt;/strong&gt; Coming from the legendary coach who lifted three Lombardi trophies, the odds my view was wrong felt much higher.&lt;/p&gt;
&lt;p&gt;To get straight to the conclusion, no, I wasn&apos;t wrong. Walsh and I were looking at the same point from opposite sides. That came into focus near the last page.&lt;/p&gt;
&lt;p&gt;In the chapter &quot;The Prime Directive Was Not Victory,&quot; Walsh says it like this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;What it was, instead, was a comprehensive standard and plan for installing a level of proficiency — competency — at which our production level would become higher in all areas, both on and off the field, than that of our opponent. Beyond that, I believed the score would take care of itself.&quot;&lt;/p&gt;
&lt;p&gt;— &quot;The Prime Directive Was Not Victory&quot; chapter&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And in the same chapter he gets a little more honest:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;I directed our focus less to the prize of victory than to the process of improving — obsessing, perhaps, about the quality of our execution and the content of our thinking; that is, our actions and attitude. I knew if I did that, winning would take care of itself, and when it didn&apos;t I would seek ways to raise our Standard of Performance. At least that was my plan.&quot;&lt;/p&gt;
&lt;p&gt;— same chapter&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;What stopped me was that last line: &quot;At least that was my plan.&quot; Walsh did not put winning aside. He chased winning by &lt;strong&gt;not openly chasing it&lt;/strong&gt;. He obsessed over standards and process, and when that obsession wasn&apos;t working, he says he&apos;d find a way to &lt;strong&gt;raise the Standard of Performance&lt;/strong&gt;. In other words, his process was still being graded by win-or-lose feedback.&lt;/p&gt;
&lt;p&gt;So here&apos;s how I sorted it: &lt;strong&gt;Putting winning first does not mean ignoring process.&lt;/strong&gt; It&apos;s the opposite. The principles inside your process have to be evaluable as &apos;winning.&apos; A standard with no evaluation is a hobby. A process with no evaluation is self-comfort. Walsh could obsess over proficiency the way he did because the football world stamps a clear number on a scoreboard every Sunday. With wins and losses arriving every week, he couldn&apos;t escape winning even &lt;strong&gt;without naming it out loud&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Business doesn&apos;t work like that.&lt;/p&gt;
&lt;p&gt;Aligning a whole business team around &apos;winning&apos; is, honestly, a much harder job than football. We don&apos;t get a scoreboard every week. Our &apos;win&apos; might be a quarterly metric, a product metric, or a strategy that won&apos;t be judged for three years. So if you let people focus only on process, no one ends up asking whether that process points anywhere. Each person works hard, but the directions diverge. (That was the &quot;organization that hasn&apos;t defined winning&quot; I wrote about three months ago.)&lt;/p&gt;
&lt;p&gt;Which means in a business org, it still matters that &apos;winning&apos; actually gets evaluated. Even when you obsess over process the way Walsh did, &lt;strong&gt;the leader has to take responsibility for aligning&lt;/strong&gt; what that process points toward. Saying &quot;the score will take care of itself&quot; inside an org where that alignment doesn&apos;t exist looks, to me, like dodging the responsibility.&lt;/p&gt;
&lt;p&gt;I think Walsh&apos;s line should be read this way: &lt;strong&gt;Don&apos;t give up the daily standards that point toward winning, not give up winning itself.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;In the chapter &quot;How I Avoid Becoming a Victim of Myself,&quot; he repeats the same idea from another angle.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;The key to performing under pressure at the highest level, regardless of circumstance, is preparation in the context of your standards of performance and a thorough organizational embrace of the actions and attitudes contained in your leadership philosophy.&quot;&lt;/p&gt;
&lt;p&gt;— &quot;How I Avoid Becoming a Victim of Myself&quot; chapter&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I stopped at &quot;&lt;strong&gt;An organization that can&apos;t even define winning falls apart.&lt;/strong&gt;&quot; Walsh had already defined winning, and his book shows how to live each day on top of that definition. Reading about &quot;standards of performance&quot; and &quot;organizational embrace,&quot; I lost count of the times I&apos;d failed to embed those standards in the orgs I&apos;d led.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Good talent, bad attitude: I should have cut sooner&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;A person with good talent walks in carrying an attitude that doesn&apos;t match the team&apos;s standard.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Walsh tosses the line into a parenthetical.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;Whatever your specific job description, it is essential to our team that you do it at the highest possible level in all of its various aspects, both mental and physical (i.e., good talent + bad attitude = bad talent).&quot; — &quot;Standard of Performance&quot; chapter&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The parenthetical was louder. It&apos;s one of the lines that made me put the book down for a long time.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Honestly, I learned this much later than Walsh did.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I worked with several people who kept building real depth in their craft but were terrible at collaboration and communication. There was a stretch when I got swept up in their talent and convinced myself their role still mattered. &lt;em&gt;If this person leaves, this part collapses. There&apos;s no immediate replacement. People might change with time.&lt;/em&gt; I kept feeding myself those excuses. I let the standard blur, I shrugged my shoulders, I went around explaining to the rest of the team that &quot;that&apos;s just how they are.&quot;&lt;/p&gt;
&lt;p&gt;Now I&apos;m sure. If a person doesn&apos;t carry the kind of attitude that lines up on the same side as the team&apos;s standard, the right move is to cut them as soon as possible.&lt;/p&gt;
&lt;p&gt;Walsh is brutal on this point.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;Individuals who fell short of the standard in various ways were usually quietly removed, and those who challenged my authority did so at their own peril.&quot; — &quot;The Prime Directive Was Not Victory&quot; chapter&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&quot;Quietly removed.&quot; No drama, no slamming the door, no public showdown. Just quietly, but firmly, out. The view from someone next to Bill makes it sharper.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;Bill was smart enough and willful enough that even a talented person who contributed to a negative organizational culture — who wasn&apos;t a team player — would be let go.&quot; — &quot;Problem Solver&quot; chapter&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Letting go wasn&apos;t itself the choice. The choice was &lt;strong&gt;not keeping them around for the sake of talent&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The most striking line was elsewhere. In &quot;Seek Character. Beware Characters,&quot; Bill turns the lens on himself.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;I tolerated this longer than I should have because of his talent. But his play, in that state, was a long way short of what it could have been if his head had been in the right place.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Even Bill did this. The man with three Super Bowl rings and a Hall of Fame jacket carried someone &quot;longer than he should have&quot; because of talent. And he wrote it down as regret. In one line. I read that sentence over and over. I, too, carried things I shouldn&apos;t have because of talent, and the team paid the price for it.&lt;/p&gt;
&lt;p&gt;He hammers the point one more time in &quot;Big Ego.&quot;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;The damage that the arrogant egotist inflicts on an organization is always greater than the benefits he provides.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;Always.&lt;/strong&gt; Coming from a writer who doesn&apos;t reach for that word easily, the sentence sits heavier. Not &apos;sometimes the harm wins.&apos; Not &apos;on rare occasions the benefit loses.&apos; Always the harm wins.&lt;/p&gt;
&lt;p&gt;What I&apos;d actually been trapped by wasn&apos;t the talent. It was the fear of losing the talent. The anxiety of &quot;this person can&apos;t leave&quot; kept eroding my standards. The cost of that erosion fell on the rest of the team. While the talented-but-toxic person stays, the team&apos;s &quot;attitude standard&quot; is set by them. The people with the right attitude start to wonder if they&apos;re being too rigid. That&apos;s the scene I saw too late.&lt;/p&gt;
&lt;p&gt;Cut sooner. That&apos;s where I land now.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Teaching is the definition of leadership, and I have no patience&lt;/h2&gt;
&lt;p&gt;In one of the most central spots in the book, Walsh nails it down.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;Leadership, at its best, is exactly that: teaching skills, attitudes, and goals (yes, goals are taught and defined) to individuals who are part of your organization. Most of life — running a family, educating a child, running a company or sales team, coaching an athlete — requires good teaching.&quot; — &quot;Teaching Defines Your Leadership&quot; chapter&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Not a leadership &lt;strong&gt;technique&lt;/strong&gt;. The &lt;strong&gt;definition&lt;/strong&gt; of leadership is teaching. Not strategy, not vision-setting, not decision-making. Teaching.&lt;/p&gt;
&lt;p&gt;He calls teaching the team&apos;s top priority.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;A Super Bowl championship (or attainment of the number-one ranking in the marketplace, hitting a meaningful quarterly production goal, or signing a big contract) occurs because the entire team not only does its individual jobs, but also recognizes that those jobs contribute to the overall success.&quot; — &quot;The Top Priority Is Teaching&quot; chapter&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;And this organizational recognition that &apos;success belongs to everybody&apos; is what the leader teaches.&quot; — same chapter&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;Failure also belongs to everyone. If you or someone on your team &apos;drops the ball,&apos; everyone is responsible.&quot; — same chapter&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;He sums up his life like this near the end.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;Looking back, the lesson I would draw out is this: if you don&apos;t love it, don&apos;t do it. I loved it. Teaching others how to dig deep so they could realize their full potential, how to be great.&quot; — &quot;The Thrill of Teaching&quot; chapter&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Don&apos;t do it if you don&apos;t love it.&lt;/p&gt;
&lt;p&gt;For a long time I believed I &lt;strong&gt;loved&lt;/strong&gt; teaching. When juniors or new hires joined the team, I organized study groups to pull them along, taught the material myself when I had to. I think I put real effort into onboarding, code reviews, 1:1s. I always wanted to be a good coach, a good teacher.&lt;/p&gt;
&lt;p&gt;The problem was &lt;strong&gt;speed&lt;/strong&gt;. Each person needed a different amount of time before what I taught showed up in their work, and I didn&apos;t tolerate the gap well enough. When the same problem turned up in the second review after I&apos;d already explained it, my temper rose. By the third round my tone showed it. By the fourth I escaped into &quot;it&apos;s faster if I just do it myself.&quot;&lt;/p&gt;
&lt;p&gt;I wanted to be a good teacher, but I was less of one than I thought. Loving teaching and putting enough patience into teaching are completely different things. I couldn&apos;t tell those two apart for a long time.&lt;/p&gt;
&lt;p&gt;But the deeper sting wasn&apos;t on the junior side. The second failure was with the senior people.&lt;/p&gt;
&lt;p&gt;I decided I had nothing to teach the seniors. They work independently, I should protect their autonomy, they need to be self-driven to grow. Those judgments piled on top of each other and I effectively &lt;strong&gt;abandoned&lt;/strong&gt; them. For a long time I called that &lt;strong&gt;autonomy&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Teaching didn&apos;t have to stay in the technical lane. Direction of work, organizational principles, decision criteria, communication posture: all of these were within reach for seniors too. If anything, the more senior they were, the more I should have shown up. I papered over it with &quot;they&apos;ll figure it out.&quot;&lt;/p&gt;
&lt;p&gt;In some ways that was fine. They built a lot on their own. But the moment I realized &lt;strong&gt;decisions that shouldn&apos;t have been made without me had already been made&lt;/strong&gt;, and that I&apos;d shrunk my own ground from inside, it was already too late. Abandonment wasn&apos;t autonomy. It was just me dodging the work of teaching.&lt;/p&gt;
&lt;p&gt;Walsh writes:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;Whether it&apos;s a 350-pound tackle, an employee, or a child, we must do all we can to encourage, support, and inspire. But ultimately — finally — people must do it for themselves.&quot; — &quot;The Bubba Diet (Willpower Cannot Be Transplanted)&quot; chapter&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The line looks like it lets me off the hook, but it does the opposite. Walsh says &quot;willpower cannot be transplanted&quot; only &lt;strong&gt;after&lt;/strong&gt; establishing that you must do everything you can to encourage, support, and inspire. The &quot;ultimately, themselves&quot; part only kicks in after you&apos;ve done that everything. I skipped the prerequisite and pulled the conclusion forward. As an alibi.&lt;/p&gt;
&lt;p&gt;He hammers it once more.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;Teaching and training others how to do their jobs.&quot; — &quot;Unleash Mentors&quot; chapter&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That, he says, is what a leader should ask of mentors. Which means the starting point has to be the leader being a teacher first.&lt;/p&gt;
&lt;p&gt;What about me now?&lt;/p&gt;
&lt;p&gt;In mentoring with George, in collaborating with Ellie, my patience is still short. The thoughts and ideas in my head are already several steps ahead, and I see myself getting frustrated when the other person can&apos;t keep pace. After explaining once I want to wave it off with &quot;okay, done.&quot; When the person asks again, I can hear my tone get a little sharp.&lt;/p&gt;
&lt;p&gt;Maybe it&apos;s because I&apos;m building product hands-on again. The person whose hands are moving has a fast head, and a fast head finds it easy to ignore the pace of the person next to them. But that&apos;s a reason. It can&apos;t be an excuse.&lt;/p&gt;
&lt;p&gt;I see places where I&apos;ve actually regressed compared to when I was managing as a full-time leader.&lt;/p&gt;
&lt;p&gt;I thought stepping out of the leader role would make me a better person, but in some ways the opposite happened. There was a coat of patience that the manager position forced onto my shoulders. Once I took it off, what got exposed was that I&apos;m not as decent without it as I&apos;d hoped.&lt;/p&gt;
&lt;p&gt;So lately I think about it this way. Before blaming the other person, I have to think about how I can deliver it better, sharpen it. Teaching isn&apos;t completed when the other person receives it. It&apos;s mine &lt;strong&gt;until I&apos;ve delivered it well enough&lt;/strong&gt;. I&apos;m only now starting to grasp that.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;The inner voice: what my black comedy planted&lt;/h2&gt;
&lt;p&gt;The most &lt;strong&gt;chilling&lt;/strong&gt; line I&apos;ve ever read in a leadership book is this one.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;The genuine inspiration, expertise, and execution that employees bring to the actual work is more often a result of what an inner voice is saying than what an outer voice shouts. Not the leader&apos;s pep talk.
What that inner voice says is something the leader decides. The leader, at least the good leader, teaches the team how to talk to itself. The effective leader has a deep impact on what that inner voice is going to say.&quot; — &quot;The Inner Voice vs. the Outer Voice&quot; chapter&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Not the pep talk. The &lt;strong&gt;inner voice&lt;/strong&gt;. Not what gets shouted from the outside, but what your teammate says to themselves when they&apos;re sitting alone. That&apos;s what the leader plants. On the prior page he wrote this too.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;Leadership is expertise. It is not rhetoric, it is not pep talks. People follow those who have credibility and expertise — knowledge of the job — and who demonstrate an understanding of human nature.&quot; — same chapter&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And on how he handled his own language:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;When I criticized someone or gave feedback, I didn&apos;t take a defeatist tone. I kept the focus on the moment, and didn&apos;t drag in days or weeks of bad play to construct an image.&quot; — &quot;The Leverage of Language&quot; chapter&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Even when criticizing, he didn&apos;t pull the past forward. He stayed in the present. Walsh knew his words rode on his teammates&apos; shoulders for days, weeks. He kindly wrote down the reason on the next page.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;If you&apos;re seen as a constantly negative person who only ever points and criticizes, the people around you will simply tune you out. Your ability to teach, influence, and drive improvement shrinks until it disappears.&quot; — same chapter&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I couldn&apos;t lift my head reading this part.&lt;/p&gt;
&lt;p&gt;In the early days, when our team collaborated with other teams, I&apos;d publicly criticize the other team&apos;s not-particularly-startup-shaped responses and sloppy communication. When leadership made a decision I didn&apos;t agree with or didn&apos;t understand, I&apos;d openly bash it in front of my team. I hoped it&apos;d read as humor and as solidarity with the team, but that wasn&apos;t always the case in the end. The moment I noticed my own tone reflected back in my teammates&apos; behavior, I tried to stop, but it was a little late.&lt;/p&gt;
&lt;p&gt;Especially since my (self-proclaimed) black comedy was rooted in something hopeful: staring straight at the brutally frustrating reality and still believing we could do something inside it. That intent didn&apos;t really land. (Maybe the fact that it didn&apos;t land is itself the perfect black comedy.)&lt;/p&gt;
&lt;p&gt;To be honest, I do think there was real &lt;strong&gt;conviction&lt;/strong&gt; somewhere inside me at the time. The hope that we could build a better organization, that even on top of this obvious reality we could try something different. But what came out of my mouth wasn&apos;t shaped like conviction.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;What I&apos;d actually been releasing wasn&apos;t conviction. It was cynicism. That hit me bone-deep.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;And cynicism is a far more contagious language than hope. By the time I tried to stop saying and doing those things, the words had already taken root in my teammates&apos; mouths, in our Slack, in our meeting rooms. What was left in the space I tried to clear wasn&apos;t my intent. It was my tone. Walsh&apos;s line about the leader&apos;s words becoming the team&apos;s inner voice landed exactly here, and it hurt.&lt;/p&gt;
&lt;p&gt;There&apos;s another memory layered on top of this. A story about the bottom 20 percent.&lt;/p&gt;
&lt;p&gt;There was a chronic complainer (a real pro at it) on a team with a long history, and I never pushed back hard enough to stop the behavior. By the time I sat down for a 1:1 to try to win them over, it was already too late. That&apos;s when it hit me. Especially when you&apos;re higher up, you become the one who has to convince people, so you start looking for the positive over the negative, and at some point you&apos;ve already started thinking that way yourself. But the more you hesitate to listen to what people are actually saying, the more that gap (my optimism vs. their pessimism) widens, until you reach a point of no return.&lt;/p&gt;
&lt;p&gt;Walsh names the mechanism precisely.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;For reasons I never fully figured out, the complaints of the bottom 20 percent often overpower the positive enthusiasm of the other 80 percent. I always assumed it should be the opposite, but it&apos;s not. The whiners seem to wield disproportionate influence.&quot; — &quot;The Bottom 20 Percent May Determine Your Success&quot; chapter&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I avoided that person&apos;s face back then. I knew how heavy the air got in meetings when they spoke, and I told myself &quot;it&apos;s just that one person, the rest are fine.&quot; I was scared that if I started taking in the negative signal, my own optimism would shake. While I postponed it, their cynicism was quietly becoming the team&apos;s shared language. Walsh wrote this too.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;A leader who ignores this part of the organization — the &apos;bottom 20 percent&apos; that holds special or supporting roles — is asking for trouble. When these people start feeling redundant, their grievances can spread through the whole organization like cancer through the body.&quot; — same chapter&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;Spreads like cancer.&lt;/strong&gt; Reading the analogy, I saw the scene of my own public criticism a moment earlier and the face of that complainer overlap onto the same picture. The cynicism I sprayed from above and the cynicism I left untouched from below were running through the same vein.&lt;/p&gt;
&lt;p&gt;A leader&apos;s language doesn&apos;t move in one direction. It falls from the top down and becomes the inner voice, and if you don&apos;t face the discontent rising from the bottom in time, it spreads back up in the same color. Sitting between those two flows are the &lt;strong&gt;few weeks or months you hesitated to listen&lt;/strong&gt;. The gap widens by the exact amount of time you avoided listening, and the organization quietly buckles.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Upward leadership: why Bill spent a chapter on the owner&lt;/h2&gt;
&lt;p&gt;Open most leadership books and they&apos;re about &lt;strong&gt;leadership pointing downward&lt;/strong&gt;. How to lead the team, how to teach them, how to motivate them. It fills entire bookstore aisles.&lt;/p&gt;
&lt;p&gt;But &lt;strong&gt;upward leadership&lt;/strong&gt; is something I&apos;ve never properly learned anywhere. Bosses, boards, investors, owners. No one explains how to handle the people sitting above your head. As an individual contributor it&apos;s relatively simple, since there&apos;s only one boss. The moment you become a middle manager, that relationship becomes half of your job. A half that&apos;s sometimes heavier than performance.&lt;/p&gt;
&lt;p&gt;So I was a bit surprised when I got to the chapter where Walsh devotes a section to owner Edward J. DeBartolo Jr. A legendary coach, in a post-retirement memoir, pours out criticism of the man who gave him his shot, to a degree that makes you wonder if he&apos;s allowed to say all this.&lt;/p&gt;
&lt;p&gt;What&apos;s even more striking is that at the end of the long stretch of criticism, Walsh comes back around and reminds you of &lt;strong&gt;the gratitude for the chance Eddie gave him&lt;/strong&gt;. The same Eddie who handed Walsh the leadership stage was also the cause of that leadership slowly breaking down. One person built and chipped at Walsh&apos;s career. He puts that duality right in the middle of the book, no covering up.&lt;/p&gt;
&lt;p&gt;Walsh recalls his earliest days as head coach in the chapter &quot;Autonomy and Authority.&quot;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;Equally important, he made it clear to everybody in the organization that I was the boss, and he wasn&apos;t going to undercut my authority. Without this authority and support my task would have been virtually impossible, given how dire the situation was.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Authority and support. Without those two words, the work of a middle manager is &quot;virtually impossible,&quot; he says flatly. As you move toward the back of the book, you see what happens when that authority and support are slowly pulled back, and that&apos;s exactly what the chapter on Eddie is.&lt;/p&gt;
&lt;p&gt;The title Walsh gives this situation is bleakly fitting: &lt;strong&gt;Ride It Out Until Help Comes; Hold On to Your Boss.&lt;/strong&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;You have to keep the people above you, who want immediate results, from acting rashly, while at the same time working with the people below so they don&apos;t quit or rebel.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The whole life of a middle manager fits in that one sentence. The top is impatient, the bottom is exhausted. You stand in the middle and have to &lt;strong&gt;hold both sides&lt;/strong&gt; at the same time. Let go for a moment and one side falls.&lt;/p&gt;
&lt;p&gt;His prescription is oddly tactical.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;I wanted the owner (and his advisors) to understand that I was making the maximum effort and paying attention to every small detail of the family&apos;s huge financial investment.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;He was a really great boss to work with in the early years when I was head coach and general manager. To some extent that was, I think, because my constant effort to keep him fully in the loop gave him a sense of relief.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And the famous line.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;Read or unread, dump documented information on your boss — projections, evaluations, progress reports, status updates. Then request regular meetings. Make it clear, in a very professional way, that the boss should understand you&apos;re doing everything you can and that it&apos;s all documented — in fact it&apos;s right there in the thick folder in their hand.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The advice from a coach who lifted three Lombardis turns out to be the practical tip of &lt;strong&gt;flooding your boss with information so they feel relieved&lt;/strong&gt;. Read or unread, doesn&apos;t matter. The sense of a thick folder in hand calms the upward axis. At first it was practical to the point of being funny. By the end I couldn&apos;t laugh. That sentence is closer to a survival manual pulled from the most exhausting season Walsh lived through.&lt;/p&gt;
&lt;p&gt;He adds one more line. &lt;strong&gt;Keep your eye on the ball.&lt;/strong&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;While placating the people who can determine your fate during a losing stretch or a turnaround — bosses, boards, shareholders — you must, at the same time, stay absolutely focused on what really matters.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Soothe the top. At the same time, stay absolutely focused on what really matters. Two commands sit calmly inside one sentence. Walsh says it&apos;s possible, but he also seems to know how cruel the demand is. A whole chapter is the record of that cruelty.&lt;/p&gt;
&lt;p&gt;I overlaid the companies I&apos;ve passed through onto the relationship between Walsh and the owner. The personalities differed, but the structure was similar. Upward leadership is at least as important as downward leadership, if not more. I&apos;ve felt that to the bone, more than anything else. And ironically, the people who opened the door to my leadership and the people who became the reason that leadership broke were, in some sense, sitting in the same spot.&lt;/p&gt;
&lt;p&gt;I can&apos;t put my specific experiences here. I don&apos;t think I should. But I&apos;d guess that almost everyone working as a middle manager is wrestling with some version of the same dilemma somewhere. Absorbing the impatience of the top, absorbing the fatigue of the bottom, while their own exhaustion stays invisible to anyone.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Did I crash through, or did I give up?&lt;/h2&gt;
&lt;p&gt;In the second half of the book Walsh leaves this line.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;Looking back, I came to the conclusion that there are times you have to stand up for yourself even if the result is that you get fired. As the fact that I didn&apos;t do it myself proves, it&apos;s easier said than done.&quot; — &quot;Zero Points for Winning&quot; chapter&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This is Bill Walsh. The man who lifted three Super Bowls. Even Bill confesses, &quot;I did not stand up for myself.&quot;&lt;/p&gt;
&lt;p&gt;He also wrote this.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;Everyone has opinions. Leaders are paid to make decisions. The difference between offering an opinion and making a decision is the difference between working for a leader and being one.&quot; — &quot;The Common Denominator of Leadership: Force of Will&quot; chapter&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;If I was going to come apart, I wanted to come apart for the right reasons. (…) The hardest thing — the unforgivable thing — is refusing to admit your way was the wrong way, and failing when changing course was the only path to victory.&quot; — &quot;Be Wrong for the Right Reason&quot; chapter&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Place these two next to each other and the chill sets in. &lt;strong&gt;Decisions, not opinions. If you fall, fall for the right reason.&lt;/strong&gt; Did I actually live that way?&lt;/p&gt;
&lt;p&gt;To be honest, no.&lt;/p&gt;
&lt;p&gt;I genuinely believed certain approaches weren&apos;t the way to win, but in moments where I didn&apos;t want to take responsibility (or didn&apos;t even want to be the one accountable) I always compromised. I had opinions, I could even see the answer, and I stepped back because I didn&apos;t want to sit in the chair of decision. That wasn&apos;t yielding. That was running.&lt;/p&gt;
&lt;p&gt;I don&apos;t want the cheap comfort of &quot;well, that&apos;s office life.&quot; Because it&apos;s a lie. Those compromises eventually become wounds you carry. Those scenes don&apos;t get erased; they pile up quietly and one day come back as the question, &quot;Was I really a leader then?&quot;&lt;/p&gt;
&lt;p&gt;In the same room where Bill Walsh confesses that even he couldn&apos;t stand up for himself, the only thing I can do is throw the same question back at myself, not make excuses.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&quot;Did I really push back with everything I had? Or did I just give up?&quot;&lt;/strong&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Everything has its cost&lt;/h2&gt;
&lt;p&gt;In the first half I kept underlining Walsh&apos;s principles. Standards, teaching, attitude, the inner voice, upward leadership. There was so much to learn that at some point I started wondering, &quot;How did this one man have all of it?&quot; Then in the second half the underlines changed character. I wasn&apos;t stopping at his moments of victory anymore. I was stopping at his moments of collapse.&lt;/p&gt;
&lt;p&gt;Walsh warns first.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;The reason consecutive champions are rare at the top of competition is that a certain measure of success brings with it a kind of disorientation we&apos;re not prepared for.&quot; — &quot;Winning Is Harder to Handle Than Losing&quot; chapter&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And one chapter later he adds:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;When things are going best is when you have the chance to be your strongest, most demanding, and most effective as a leader. A strong wind is at your back, but to keep that wind from knocking you over, you have to understand the dangers winning brings.&quot; — &quot;Why Repeat Championships Are Hard&quot; chapter&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The dangers winning brings. That phrase stuck with me. We usually only think about the dangers of failure, but Walsh flips it. The truly dangerous time is when things are going well. And this warning isn&apos;t just strategic advice. It was something he was saying to himself, which becomes more obvious as you read on.&lt;/p&gt;
&lt;p&gt;There&apos;s a chapter where he opens up his interior. The title is &quot;The Perfection of the Puzzle.&quot;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;I didn&apos;t want to lose by 40 points. I&apos;d prefer losing by 39. When we won by 20, I&apos;d wake up in the middle of the night thinking hard about how we could have scored 21.&quot; — same chapter&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;A man who, on a night the team won by 20, wakes up to think about how they could have scored 21. That&apos;s Bill Walsh. Even in a victory, replaying through the night the reason they didn&apos;t score one more point.&lt;/p&gt;
&lt;p&gt;He keeps writing.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;Did I miscalculate or ignore information I could have seen there? Why and where did our execution break down? Of our decisions, of my decisions, which were wrong or completely off? Endlessly, endlessly, endlessly. What I was pursuing, I think, was perfection.&quot; — same chapter&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Endlessly, endlessly, endlessly. You can see how he lived in the way he repeats that word three times. And he diagnoses the root of his own perfectionism.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;All of this was less about scoring more points or losing by fewer than it was about how I had come to perceive the entire process of leadership and the effort to succeed. To me, it was a puzzle to solve, pieces to find and place, solutions to work out.&quot; — same chapter&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The engine of his greatness was also his blade. He knew it best himself.&lt;/p&gt;
&lt;p&gt;And then the heaviest chapter in the book arrives. &quot;Zero Points for Winning.&quot;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;I gave myself zero points for winning.&lt;/p&gt;
&lt;p&gt;Winning was nothing more than postponing the pain of losing. I&apos;d quickly turn my attention to the next game, and the next, and each one offered nothing but a chance to push back the dread that comes with losing — without ever removing the dread itself.&lt;/p&gt;
&lt;p&gt;When this happens, any defeat or mistake or setback becomes deeply unsettling, even destructive — because you&apos;ve attached your self-image to the outcome of competition. Winning becomes harmful for the same reason. You&apos;re letting winning start to define your sense of worth, your feelings about yourself.&quot; — same chapter&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I closed the book for a moment in front of this passage.&lt;/p&gt;
&lt;p&gt;A man who lifted three Lombardi trophies confesses he gave himself zero points for winning. Those trophies weren&apos;t joy; they were painkillers that briefly delayed the dread of the next loss. And after the confession, one line: &quot;Because you&apos;ve attached your self-image to the outcome of competition.&quot;&lt;/p&gt;
&lt;p&gt;I&apos;d done that too. There was a time I dragged outcomes into my self-worth. When metrics went up I felt like a decent person; when they cracked I felt worthless. I didn&apos;t know back then that, lived that way, even winning is harmful.&lt;/p&gt;
&lt;p&gt;Walsh also wrote about how the bill comes due.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;The volatility of the environment and the emotional drain can completely deplete you, and you live with it constantly. It can leave you very vulnerable, very weak.&quot; — same chapter&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And he leaves himself a warning.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;Avoid the destructive temptation of equating your team&apos;s win-loss record with your own self-worth.&quot; — same chapter&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;You only feel the weight of this line after the confessions before it. This isn&apos;t theory. It&apos;s the last warning Walsh leaves himself, and the people walking the same road he walked.&lt;/p&gt;
&lt;p&gt;He looks honestly at where his perfectionism began.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;In the early days I was the same. I firmly believed that if I did my job well, the score would take care of itself. When it didn&apos;t, I worked harder to improve coaching and raise the team&apos;s standard of performance. This was one of the reasons I drove myself so relentlessly.&quot; — same chapter&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Even his philosophy of &quot;the score takes care of itself,&quot; flipped over, was the engine that drove him relentlessly into himself. At this point I accepted that this book is less a leadership textbook than one person&apos;s confession.&lt;/p&gt;
&lt;p&gt;The book&apos;s final chapter is written by his son, Craig Walsh. A chapter about his father. The title is &quot;THE WALSH WAY: A Complex Man. A Simple Goal.&quot;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;He was a perfectionist, and perfection, he believed, was only achievable when his ideas and decisions were fully realized — not filtered through other people, who, in his view, would inevitably misunderstand and misapply them. He had to be the one in charge.&quot; — same chapter&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;A son&apos;s view of his father. Not warm, not cold, just as is. The son sums up in the shortest sentence why the great father couldn&apos;t let himself go to the very end.&lt;/p&gt;
&lt;p&gt;And the book&apos;s final paragraph.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;My father is gone, but the leadership lessons he earned through blood and sweat remain. Maybe more meaningful now than they&apos;ve ever been. I know he would want something he shared in this book to be of value to you in your own challenges as a leader. That would mean he could once again do the thing he loved and did so well. Teaching others how to be as great as they can possibly be.&quot; — last paragraph of the same chapter&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I sat with the book closed for a while at this last paragraph.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;It felt like watching one human&apos;s great and intensely personal history.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A book that started as a leadership book and ended as a life. A man who woke at midnight to think about scoring one more point on a puzzle he was solving. A man who gave himself zero points for winning. A man who, never having reconciled with himself, wrote it all down in a book and left. And a son who closes the book in the last chapter, carrying his father&apos;s lessons forward.&lt;/p&gt;
&lt;p&gt;There was relief. Even a man whose success I can&apos;t compare myself to was like this. The huge shadow and the huge cost made my small failures feel a little less alone. So you went through that too, I thought.&lt;/p&gt;
&lt;p&gt;There&apos;s also a cool warning in it. Considering the cost Walsh paid, what would I get from walking the same road, having not even brushed against his level of greatness? As long as I tie my self-worth to outcomes, winning chews me up and losing breaks me down. There are no exceptions. If even Bill Walsh wasn&apos;t an exception, I sure won&apos;t be.&lt;/p&gt;
&lt;p&gt;So Walsh restates the message he always pointed toward, even though he himself never fully reached it.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&quot;The score takes care of itself.&quot;&lt;/strong&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Closing: beyond leadership, toward being a slightly better person&lt;/h2&gt;
&lt;p&gt;A few days passed after I closed the book.&lt;/p&gt;
&lt;p&gt;The first thought was the same one I&apos;d written in the opening. &quot;Maybe the next time the chance comes, I&apos;ll be a little better.&quot; Next time, I&apos;ll put Walsh&apos;s underlines to better use. That kind of expectation.&lt;/p&gt;
&lt;p&gt;But as I sat with this review and pulled myself apart section by section, that sentence started to feel a little suspicious.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Next time. Is that really what matters?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I was imagining a &quot;next leadership opportunity&quot; and quietly suspending the version of me that exists right now. Next time I&apos;ll do it properly. Next time I&apos;ll be the patient mentor. Next time I&apos;ll be a leader aligned both upward and downward. The vagueness of that &quot;next&quot; was building an alibi for today&apos;s laziness.&lt;/p&gt;
&lt;p&gt;And the place I stopped longest in Walsh&apos;s book wasn&apos;t the strategy or the principles. It was this passage from &quot;Quick Results Come Slowly.&quot;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;I believe this is true in your work as well. The effort at the start is part of a continuous effort, and your standard of performance is part of a continuous standard. Today&apos;s effort becomes tomorrow&apos;s result. The quality of that effort becomes the quality of the work. One day connects to the next, and the months connect to the years that follow.&quot;&lt;/p&gt;
&lt;p&gt;— &quot;Quick Results Come Slowly: The Score Takes Care of Itself&quot; chapter&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;One day connects to the next.&lt;/strong&gt; A separate &quot;next chance&quot; doesn&apos;t arrive. How I speak today, how I listen, how patient I am, how I push back: that exact thing rides forward into tomorrow. On the same page Walsh closes with this.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;Your own standard of performance becomes who and what you are. You and your organization achieve greatness.&quot; — same chapter&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This sentence shook me most on this read. &quot;Your own standard of performance becomes who and what you are.&quot; It&apos;s the last line of a leadership book, but once you&apos;ve read it, it doesn&apos;t only sound like a leadership line. It reads as: the standard I allow myself today becomes who I am.&lt;/p&gt;
&lt;p&gt;So what I have to fix isn&apos;t &quot;the next-chance me.&quot; It&apos;s the me right now.&lt;/p&gt;
&lt;p&gt;Beyond leadership, it shows up in a single line of feedback I trade with Ellie while building product, in mentoring sessions with George, in a brief conversation with someone I happen to run into at a café: the small, stubborn effort to keep becoming a slightly better person. Not some grand restoration of leadership, just that level of diligence.&lt;/p&gt;
&lt;p&gt;Honestly, this is something I&apos;ve already written about. Tucked away in a corner of the blog I once wrote a piece called &quot;&lt;a href=&quot;/posts/2024-02-be-curious/&quot;&gt;Be curious, not judgmental&lt;/a&gt;.&quot; It was a note to myself: don&apos;t judge people quickly, hold curiosity first and look once more. I&apos;d long forgotten that note. The recent me has been judging first quite a bit. Getting frustrated. Reaching the conclusion before anyone else.&lt;/p&gt;
&lt;p&gt;I think I need to take it up again. With more room, &lt;strong&gt;kind but rigorous&lt;/strong&gt;, a life that keeps reaching for a little better. When Walsh writes elsewhere that &quot;You Must Have a Hard Edge,&quot; he doesn&apos;t mean coldness. Kindness and rigor aren&apos;t opposites. I was simply short on both.&lt;/p&gt;
&lt;p&gt;If I can become that person, the score will probably take care of itself.&lt;/p&gt;
&lt;p&gt;Walsh&apos;s original line was about teams and organizations. As I closed the book I decided to take it home in a single-person size. &lt;strong&gt;If today&apos;s standards decide who I am, tomorrow&apos;s score will follow on its own, even when I&apos;m not watching.&lt;/strong&gt; At least, that&apos;s what I want to try believing.&lt;/p&gt;
</content:encoded><category>reading</category><category>essay</category><category>leadership</category><author>Tony Cho (https://flowkater.io)</author></item><item><title>The Moment You Spin Up an AX Team, You&apos;ve Already Lost</title><link>https://flowkater.io/en/posts/2026-04-08-ax-team-paradox/</link><guid isPermaLink="true">https://flowkater.io/en/posts/2026-04-08-ax-team-paradox/</guid><description>The paradox of setting up an AX task force. MIT NANDA found the successful 5% weren&apos;t led by central AI labs but by line managers on the ground. What I learned from going from consultant to CTO. The first principle of organizational AX isn&apos;t tools — it&apos;s people and the organization.</description><pubDate>Wed, 08 Apr 2026 13:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;The Moment You Spin Up an AX Team, You&apos;ve Already Lost&lt;/h1&gt;
&lt;h2&gt;That Uneasy Sense of Déjà Vu&lt;/h2&gt;
&lt;p&gt;The moment a company decides to &quot;do AX&quot; by spinning up a dedicated AX task force, it has already failed.&lt;/p&gt;
&lt;p&gt;That may sound harsh. But if you think about it for a second, haven&apos;t we all seen this movie before? The DX task force. The Cloud Migration Division. The Big Data Innovation Team. &quot;We&apos;ll stand up a team and roll this out company-wide.&quot; How many times have we watched this? It was never going to work.&lt;/p&gt;
&lt;p&gt;In &lt;a href=&quot;https://flowkater.io/posts/2026-03-15-ax-organization-transformation/&quot;&gt;part one&lt;/a&gt; I wrote that &quot;installing Claude Code at your company doesn&apos;t get you AX.&quot; The diagnosis was that adoption and transformation are two different things. There was plenty of response, and the question that came back kept landing in the same place.&lt;/p&gt;
&lt;p&gt;&quot;So what should we actually do?&quot;&lt;/p&gt;
&lt;p&gt;I&apos;m trying to answer that question this time.&lt;/p&gt;
&lt;p&gt;According to &lt;a href=&quot;https://fortune.com/2025/08/18/mit-report-95-percent-generative-ai-pilots-at-companies-failing-cfo/&quot;&gt;MIT NANDA&apos;s research&lt;/a&gt;, &lt;strong&gt;95%&lt;/strong&gt; of enterprise GenAI pilots fail. Ninety-five percent. Near total wipeout. And yet the successful 5% shared a clear pattern. They weren&apos;t organizations led by a central AI lab. They were &lt;strong&gt;organizations where line managers on the ground drove adoption&lt;/strong&gt;. The ones that survived weren&apos;t the ones that built a separate AI unit. They were the ones where the existing operational teams moved on their own.&lt;/p&gt;
&lt;p&gt;If I&apos;m being honest, I&apos;ve seen this many times, and I&apos;ve built one of these myself. The task force. The transformation division. That was hypocrisy on my part. It didn&apos;t work then. It won&apos;t work now.&lt;/p&gt;
&lt;p&gt;Why doesn&apos;t it work? And what the hell are we supposed to do instead?&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;We&apos;re Supposed to Flatten Layers. We&apos;re Stacking Them.&lt;/h2&gt;
&lt;p&gt;The essence of AX is reducing layers. AI absorbs the middle of the work, shortening the distance between decision and execution. Creating an AX task force does the opposite. It adds one more layer on top of the existing ones.&lt;/p&gt;
&lt;p&gt;The most dramatic case of this paradox is &lt;a href=&quot;https://www.adsoftheworld.com/campaigns/designers-lead-ai-follows&quot;&gt;Coca-Cola&apos;s Project Fizzion&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Coca-Cola, a 139-year-old consumer goods company, went all-in on an AI-driven transformation. They co-developed Project Fizzion with Adobe and simultaneously pushed hard on an AI-generated holiday ad experiment. One of the oldest brands in the world declared it would lead the AI era.&lt;/p&gt;
&lt;p&gt;What came back from the field first wasn&apos;t applause. It was pushback. The AI holiday ad drew &lt;a href=&quot;https://www.theverge.com/news/812559/coca-cola-ai-holiday-christmas-commercial-2025&quot;&gt;heavy backlash&lt;/a&gt;. CEO James Quincey stepped down, saying in his own words that the company needed &quot;someone with the energy to pursue a completely new transformation.&quot; Roughly 75 positions at the Atlanta headquarters were on the chopping block in the first round of restructuring.&lt;/p&gt;
&lt;p&gt;The point here isn&apos;t that one ad failed. It&apos;s that a 139-year-old company pushed AX like a single project, and leadership plus the whole organization paid the price together.&lt;/p&gt;
&lt;p&gt;By coincidence, around the same time, &lt;a href=&quot;https://www.theregister.com/2025/08/22/commonwealth_ban_chatbot_fail_rehiring/&quot;&gt;Australia&apos;s largest bank, Commonwealth Bank&lt;/a&gt;, hit the same kind of wall. This case was even starker. They laid off 45 customer service staff on the assumption that an AI voice bot would cut call volume. In practice, call volume went up. Managers had to start taking calls. The bank publicly admitted the misjudgment and rehired all 45. They&apos;d also rolled out GitHub Copilot, got mixed results, and eventually pulled it. The union called it a full reversal. It took a month.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://defensescoop.com/2025/08/15/feinberg-cdao-realignment-shakeup-dod-ai-enterprise/&quot;&gt;The Pentagon&apos;s CDAO&lt;/a&gt; was folded into R&amp;amp;E (Research and Engineering). The AI function that had been elevated as a standalone org was pulled back inside the research-and-engineering structure.&lt;/p&gt;
&lt;p&gt;Coca-Cola, Commonwealth Bank, and the Pentagon are in different industries and sizes. What they have in common is that standalone AI units or central task forces either didn&apos;t work as hoped, or had to be redone.&lt;/p&gt;
&lt;p&gt;Funny thing: &lt;a href=&quot;https://fortune.com/2026/03/27/why-cfo-not-chief-ai-officer-secret-getting-real-value-ai/&quot;&gt;Fortune reported in March 2026&lt;/a&gt; that &lt;strong&gt;76%&lt;/strong&gt; of AI projects led by CFOs delivered &quot;great value.&quot; And yet only &lt;strong&gt;2%&lt;/strong&gt; of companies had given the CFO an AI role.&lt;/p&gt;
&lt;p&gt;Put those two numbers side by side and the paradox shows up. When the person who knows the reality of P&amp;amp;L and cost leads AI, the results land. But most organizations create a new title (CAIO) and hand it to a separate org.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://www.bloomberg.com/news/articles/2025-11-10/intel-head-of-ai-katti-leaves-for-hardware-role-at-openai&quot;&gt;Intel&apos;s CAIO left for OpenAI after seven months.&lt;/a&gt; Seven months. His business cards hadn&apos;t even worn out yet.&lt;/p&gt;
&lt;p&gt;And this isn&apos;t just a problem inside AI orgs. It&apos;s telling that big-company CEO turnovers like &lt;a href=&quot;https://www.cnbc.com/2026/03/26/coca-cola-james-quincey-walmart-doug-mcmillon-artificial-intelligence-step-down.html&quot;&gt;Coca-Cola&apos;s James Quincey and Walmart&apos;s Doug McMillon&lt;/a&gt; are happening right as AI transformation enters its critical phase. AX isn&apos;t a project where you bring in one tool. It shakes leadership and the entire org design. &lt;strong&gt;The moment you create a separate org, the rest of the organization treats AX as someone else&apos;s problem.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;That&apos;s the core of the AX task-force paradox. You&apos;re piling a new layer onto work that requires fewer layers, so it&apos;s structurally bound to fail.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;The Limits of Tool-Talk: This Isn&apos;t an Efficiency Problem. It&apos;s an Identity Problem.&lt;/h2&gt;
&lt;p&gt;A lot of today&apos;s AX discourse stops at demoing tools like Claude Code or Copilot. Here&apos;s what&apos;s possible, isn&apos;t it cool, you can do AX too. The people writing those posts include startup founders, solo developers, and consultants.&lt;/p&gt;
&lt;p&gt;I get it. They need to survive too. Tool demos look good. They get strong reactions.&lt;/p&gt;
&lt;p&gt;But corporate reality is different. It&apos;s conservative and slow. And if you, as the person in charge of AX, catch yourself asking, &quot;why aren&apos;t they using this great thing?&quot;, your AX is basically over.&lt;/p&gt;
&lt;p&gt;The problem isn&apos;t tools. It&apos;s people. More precisely, it&apos;s a question of &lt;strong&gt;how each person understands their own job&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;People treat their role as their identity.&lt;/p&gt;
&lt;p&gt;&quot;I&apos;m a product planner.&quot;&lt;/p&gt;
&lt;p&gt;&quot;I&apos;m a marketer.&quot;&lt;/p&gt;
&lt;p&gt;&quot;I&apos;m a backend engineer.&quot;&lt;/p&gt;
&lt;p&gt;What AX asks for is the dismantling of that identity. A significant chunk of my job gets taken over by AI, and I have to play a different role than before. This isn&apos;t an efficiency problem. It&apos;s an &lt;strong&gt;identity problem&lt;/strong&gt;. You don&apos;t solve that by giving someone a new tool.&lt;/p&gt;
&lt;p&gt;Take a marketer with 15 years of experience. That person has built expertise in writing weekly reports, polishing campaign copy, and summarizing results. One day they watch AI produce a week&apos;s worth of reports in three hours. What&apos;s the feeling that comes up? &quot;Cool&quot;? No. For most people, a different feeling comes first.&lt;/p&gt;
&lt;p&gt;&quot;Then what am I?&quot;&lt;/p&gt;
&lt;p&gt;Backend engineers go through something similar. When an engineer watches AI plausibly replicate patterns they&apos;ve spent years learning, they&apos;re not just looking at a productivity tool. They&apos;re re-examining the boundary of their own expertise. What moves a person at this point isn&apos;t a better demo. It&apos;s a structure that lets them sit with the anxiety and move into a new role.&lt;/p&gt;
&lt;p&gt;That&apos;s why AX is harder at the transformation step than at the adoption step. Adoption is a matter of buying a license. Transformation requires rebuilding a person&apos;s identity.&lt;/p&gt;
&lt;p&gt;This isn&apos;t a problem for one company. Demos are everywhere, but almost no company tells its people what to do on the Monday after. &quot;Sixty percent of your job will be automated&quot; is the part companies are willing to say. &quot;Here&apos;s what you&apos;re now accountable for in the remaining forty percent&quot; is the part almost no one gets to. The first sentence is a tech story. The second is a human story. &lt;strong&gt;AX built around tools is doomed to fail.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;We aren&apos;t failing at AX because we don&apos;t know the tools. We&apos;re failing because we don&apos;t know the people and the org.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;People Who Don&apos;t Understand an Organization Can&apos;t Change One&lt;/h2&gt;
&lt;p&gt;Steve Jobs, in his 1992 MIT talk, criticized consultants. His point was that people who don&apos;t own the outcome and don&apos;t own the execution produce only a tiny sliver of the value. It was a fundamental distrust of trying to change an organization from the outside.&lt;/p&gt;
&lt;p&gt;I agree with that almost 100 percent. My situation was a little unusual, though. I was with them every day, functioning like I was already half-inside the organization. Most consulting doesn&apos;t work that way. So mine was closer to an exception.&lt;/p&gt;
&lt;p&gt;This is the first time I&apos;m telling this story publicly.&lt;/p&gt;
&lt;p&gt;I started as a consultant. A company asked me to audit their engineering organization. Four evenings a week, three hours at a time, I&apos;d go to the office. After clocking out of my own job, I&apos;d clock in at another company. It was called consulting, but really it was three hours every evening reading the codebase, joining meetings, and talking to the people.&lt;/p&gt;
&lt;p&gt;The two engineers I worked with in those first sessions were effectively forced into overtime because of my consulting. With those two, night after night, we read code, debated architecture, and talked through the organization&apos;s problems.&lt;/p&gt;
&lt;p&gt;Over the course of a month, I got to understand the organization deeply. The state of the codebase, where technical debt sat, the bottlenecks between teams, people&apos;s skills and frustrations and ambitions. A month sounds short. But four nights a week, three hours a night, for a month is almost fifty hours.&lt;/p&gt;
&lt;p&gt;Along the way, I found myself wanting more. The scope was past anything a consulting engagement could finish. From the outside, by advice alone, some things were never going to change.&lt;/p&gt;
&lt;p&gt;It wasn&apos;t that I understood the backend architecture or the business deeply enough to make that call.&lt;/p&gt;
&lt;p&gt;What mattered to me was who I&apos;d be working with. The tech stack was made of technologies I&apos;d never used. Node.js (I&apos;d gone from Ruby and Python into a heavy Go phase, and had never handled Node.js in production) and Kafka, which was the consulting engagement&apos;s main project, were both new to me. On technical grounds alone, there was no reason to join.&lt;/p&gt;
&lt;p&gt;But after a month of meeting the same people every evening (those two engineers, and the other members of their team), I felt convinced that together we could change something.&lt;/p&gt;
&lt;p&gt;Tech, you can learn. Domain, you can dig into. But if you don&apos;t have the people to work with, nothing happens.&lt;/p&gt;
&lt;p&gt;So I joined as CTO.&lt;/p&gt;
&lt;p&gt;What I took away from that experience is one thing. &lt;strong&gt;To change an organization, you first have to understand it. And to understand it, you have to spend time.&lt;/strong&gt; Throwing a framework over the wall from outside never works. Only after putting in three hours a night, touching the code, the people, and the culture directly, could I see what was broken and where to start.&lt;/p&gt;
&lt;p&gt;Even that understanding turned out to be nowhere near enough. Once I went full-time, problems that had been invisible during the consulting phase piled up like a mountain. From the org structure to server operations, for the first few months I was fighting several fires at once and almost never left on time.&lt;/p&gt;
&lt;p&gt;AX is the same. Handing people an AI tool and saying &quot;there, go do AX&quot; doesn&apos;t land. You have to carve a small piece off a new business unit or an existing team and experiment. And that experiment isn&apos;t an experiment with the tool. It&apos;s an experiment with how work gets done. The person leading that experiment has to know the organization&apos;s people.&lt;/p&gt;
&lt;p&gt;Giving advice is easy. Owning the outcome of that advice all the way through is a completely different thing.&lt;/p&gt;
&lt;p&gt;That&apos;s why I keep circling back to people whenever I talk about AX.&lt;/p&gt;
&lt;p&gt;Reading Jack Dorsey&apos;s piece &lt;a href=&quot;https://block.xyz/inside/from-hierarchy-to-intelligence&quot;&gt;&quot;From Hierarchy to Intelligence,&quot;&lt;/a&gt; one line stuck with me. &lt;strong&gt;Hierarchy was originally not a structure for managing people but a structure for routing information.&lt;/strong&gt; From Roman armies to railroad companies to modern enterprises, organizations have always solved the same problem. Who knows what, who gets that information next, and who makes the decision? The layers we&apos;ve accepted as given (team lead, director, department head, PM, middle manager) exist because they paid the cost of moving information around.&lt;/p&gt;
&lt;p&gt;Seen through that lens, the nature of what&apos;s happening now looks different. AI doesn&apos;t just do work faster. &lt;strong&gt;The human middle layer whose job was to collect, summarize, tidy, and pass along information is itself being shaken.&lt;/strong&gt; Before, the organization only worked if a person reported up, coordinated sideways, and cascaded down. Now AI takes part of that. It reads documents, drafts, organizes context, summarizes data. At that moment hierarchy stops being strictly necessary. In some cases, it becomes the thing that slows you down.&lt;/p&gt;
&lt;p&gt;Most companies stop here. They attach an AI copilot to the existing org. Documents get written faster, meeting notes get tidied faster, reports come out faster. They leave the structure as it is and run it slightly better. There&apos;s value in that. But Jack Dorsey&apos;s question goes deeper. &lt;strong&gt;Do you want to run the company more efficiently, or do you want to redesign how the company runs?&lt;/strong&gt; The first is automation. The second is org redesign. The distinction I keep making in this post is exactly that.&lt;/p&gt;
&lt;p&gt;Reframed this way, everything I&apos;ve said so far ties together on one line. Setting up an AX task force is adding a layer at the exact moment you need fewer layers. The reason people feel anxious is that the information-passing and coordination role they used to hold is shifting. The reason I spent a month reading the organization before joining as CTO was the same. An organizational problem is less about individual skill and more about &lt;strong&gt;how information flows and who owns what&lt;/strong&gt;. If you can&apos;t read that flow, you can&apos;t change the org.&lt;/p&gt;
&lt;p&gt;So the next question follows naturally. Does an AI-era organization need more layers, or does it need fewer handoffs and clearer ownership? From where I sit, the answer is already clear. As people shift from being couriers to being decision-makers, teams get smaller, and ownership has to carry all the way to the end. At that point, AX becomes a story about End-to-End.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;AX Ends Up as End-to-End&lt;/h2&gt;
&lt;p&gt;In &lt;a href=&quot;https://flowkater.io/posts/2026-03-15-ax-organization-transformation/&quot;&gt;part one&lt;/a&gt; I broke the organization down into five axes, and I named the most important one as the &lt;strong&gt;End-to-End Ownership Team&lt;/strong&gt;. A structure where a single team owns Discovery (what to build), Delivery (how to build), and Distribution (how to sell) from end to end.&lt;/p&gt;
&lt;p&gt;For readers who skipped part one, one more sentence. This isn&apos;t an argument for a do-everything solo company. It&apos;s an argument for a structure where ownership doesn&apos;t break in the middle. A structure where you can always tell who defined the problem, who built it, and who saw it through.&lt;/p&gt;
&lt;p&gt;What AI changed is simple. &lt;strong&gt;One person&apos;s coverage.&lt;/strong&gt; AI helps with the product spec, the code, and the data analysis. Work that used to require three teams ping-ponging can now be handled by a much smaller team. The need for division of labor itself has dropped.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://www.evanreiser.com/blog/transformation/the-projection-problem/&quot;&gt;What Abnormal Security&apos;s CEO Evan Reiser calls &quot;The Projection Problem&quot;&lt;/a&gt; captures this well. Ideas are high-dimensional; language is low-dimensional. Every time the expert explains it to the PM, the PM writes the spec, and the engineer implements it, every handoff is lossy compression. Everyone looks at the same shadow and says &quot;we&apos;re aligned,&quot; but they might be imagining different products. To fix that, Reiser put a CISO with 20 years of experience directly in the product owner seat and built a structure where AI interviews what&apos;s in his head. Expert to AI. One handoff. The cleanest explanation of why End-to-End matters.&lt;/p&gt;
&lt;p&gt;So the direction of AX naturally leads to End-to-End. Smaller teams. Shorter handoffs. Clearer ownership.&lt;/p&gt;
&lt;p&gt;This might sound like an empty slogan. But organizations are already moving this way.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://news.microsoft.com/source/features/digital-transformation/the-only-way-how-copilot-is-helping-propel-an-evolution-at-lumen-technologies/&quot;&gt;Lumen Technologies&lt;/a&gt; took a more fundamental approach. A 30-year-old legacy telco. What CEO Kate Johnson (formerly at Microsoft) did wasn&apos;t deploy a tool. She redefined the company&apos;s identity. From &quot;legacy telco&quot; to &quot;the backbone of the AI economy.&quot;&lt;/p&gt;
&lt;p&gt;After piloting Microsoft 365 Copilot (a work AI assistant, different from GitHub Copilot) with the sales team, she rolled it out to the whole company, and used a &quot;champion program&quot; to assign AI evangelists in each department. The numbers are striking. Customer research that used to take a salesperson &lt;strong&gt;4 hours dropped to 15 minutes&lt;/strong&gt;. Across a 3,000-plus sales org, they saved an average of 4 hours per week, which over 12 months translated to &lt;strong&gt;$50M in revenue value&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://www.cnbc.com/2026/02/24/jpm-ceo-jamie-dimon-ai-reshaping-workforce-redeployment.html&quot;&gt;JPMorgan Chase&lt;/a&gt; went deeper. The company treats AI not as a subtopic inside the tech department but as a core executive agenda. In practice, JPMorgan has seated &lt;a href=&quot;https://www.jpmorganchase.com/about/leadership/teresa-heitsenrether&quot;&gt;CDAO (Chief Data &amp;amp; Analytics Officer) Teresa Heitsenrether&lt;/a&gt; on the Operating Committee. AI isn&apos;t cordoned off into a separate org. It&apos;s &lt;strong&gt;at the core decision-making table&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://tech.walmart.com/content/walmart-global-tech/en_us/blog/post/all-in-on-agents.html&quot;&gt;Walmart&lt;/a&gt; is moving in a similar direction. The company is consolidating fragmented AI capabilities into four super agents for customers, employees, partners, and developers. Then–Walmart U.S. CEO John Furner said at &lt;a href=&quot;https://fortune.com/2025/09/09/walmart-ai-jobs-brainstorm-tech/&quot;&gt;Fortune Brainstorm Tech&lt;/a&gt;, &quot;Headcount in two to five years will be roughly the same as today.&quot; The meaning is simple. AI will raise productivity, but they won&apos;t turn that directly into headcount reduction. They&apos;re going to run a bigger business with the same-size org and change what people actually do. That&apos;s closer to a declaration than a prediction.&lt;/p&gt;
&lt;p&gt;Lumen (telecom), JPMorgan (finance), and Walmart (retail). What they have in common is that they did not build a separate AX team. They&apos;re &lt;strong&gt;directly changing the roles and structure of the existing organization&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The Maker vs. Closer distinction I drew in part one lives in the same context. The point wasn&apos;t to add more people who only produce output. It was to add more people who carry work through to completion. MyRealTrip CEO Donggun Lee&apos;s pivot (&lt;a href=&quot;https://www.threads.com/@keon21/post/DWaPi0TkoHK?xmt=AQF02PLI9eOJz6Pr2R3k5wbTzIJ5yY5FKdwvrKF-rpmsFg&quot;&gt;&quot;grow the number of people who can sell directly&quot; over &quot;make it better&quot;&lt;/a&gt;) lives on the same line. It&apos;s a declaration that the wall between engineering and marketing, between product and sales, has to come down.&lt;/p&gt;
&lt;p&gt;Before, getting past that wall meant filing a request with another team. You had to wait for assets, explain yourself, and justify the ask. That&apos;s different now. An engineer can draft ad copy with AI, classify customer feedback, and even do basic data analysis on their own.&lt;/p&gt;
&lt;p&gt;To take it one step further: this isn&apos;t an argument for making engineers do sales. It&apos;s an argument for letting engineers hear the customer&apos;s voice directly, fold that feedback into the next sprint, and see the business impact of what they shipped in actual numbers. Same for marketers. Since AI can draft a product brief, marketers get more time for judgment calls about product direction. The energy that used to go into &quot;when is the engineering team going to get to this?&quot; can now go into &quot;what does the customer actually need?&quot;&lt;/p&gt;
&lt;p&gt;This isn&apos;t saying everyone should do everything. It&apos;s saying that the wall you used to book a conference room to get past is lower now.&lt;/p&gt;
&lt;p&gt;At that point the Maker becomes a Closer. The distinction I drew in part one is turning from an abstract concept into a real state. Someone who only produced output becomes someone who carries a result to the end. That&apos;s what the individual-level transformation inside an AX organization actually looks like.&lt;/p&gt;
&lt;p&gt;Where AX works, you find &lt;strong&gt;small teams organized around problems&lt;/strong&gt;, not around job titles. Teams that own the work end to end. Teams with minimal handoffs. Teams where one group handles everything from discovery to distribution. The goal isn&apos;t a separate &quot;AX task force.&quot; It&apos;s a structure where each existing team becomes AX on its own.&lt;/p&gt;
&lt;p&gt;Don&apos;t push the tool. Let the need pull the tool in.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;When Layers Disappear, Roles Shift&lt;/h2&gt;
&lt;p&gt;Same organizations, but this time we look at how individual roles change rather than at structure.&lt;/p&gt;
&lt;p&gt;Lots of people see AX as IT&apos;s job. The engineering team uses AI, designers write code, PMs run agents. Those changes matter.&lt;/p&gt;
&lt;p&gt;But real AX happens outside of IT, in the field.&lt;/p&gt;
&lt;p&gt;A recent &lt;a href=&quot;https://www.youtube.com/watch?v=9qXc-THAvc0&quot;&gt;example from the OpenAI Codex team&lt;/a&gt; shows this clearly. On that team, designers write code themselves, PMs work as builder-owners instead of messengers, and engineers focus on supervising the system rather than writing code line by line. You could say &quot;well, that&apos;s an AI company&apos;s engineering team.&quot; But the principle is general.&lt;/p&gt;
&lt;p&gt;&quot;Designers become builders. PMs become owners. Engineers become supervisors.&quot; That principle doesn&apos;t only apply to an AI company&apos;s engineering team. It&apos;s a universal pattern for an era in which role boundaries are being redefined, across every industry.&lt;/p&gt;
&lt;p&gt;Look at retail. A store operations lead at Walmart isn&apos;t just someone who files requests to headquarters anymore. They&apos;re handling inventory, promotions, and customer signals directly in a much shorter loop. What the AI system created isn&apos;t a fancy dashboard. It expanded the radius of judgment on the ground. Target too. &lt;a href=&quot;https://corporate.target.com/press/release/2025/05/target-corporation-announces-multi-year-enterprise-acceleration-office&quot;&gt;The Enterprise Acceleration Office&lt;/a&gt; exists to raise enterprise-wide speed and agility. The goal is to shorten the distance between decision and execution in the stores.&lt;/p&gt;
&lt;p&gt;Finance is even more dramatic. &lt;a href=&quot;https://fortune.com/2026/03/17/inside-bank-of-americas-build-once-ai-strategy/&quot;&gt;Bank of America&apos;s Erica&lt;/a&gt; has handled over &lt;strong&gt;3.2 billion&lt;/strong&gt; cumulative interactions, and more than &lt;strong&gt;98% of users find what they need&lt;/strong&gt;. The real point isn&apos;t the number. What Bank of America actually did was build Erica for consumers first, then reuse the same engine for employees, wealth management, and corporate banking. &quot;Build Once, Reuse.&quot; It&apos;s a strategy of spreading one engine across the whole organization.&lt;/p&gt;
&lt;p&gt;What did that change on the ground? Advisory, underwriting, and operations stopped being three teams passing documents around. They got closer to the customer&apos;s problem and saw it through. With the AI handling repetitive queries, advisors got room to go deeper into complex financial problems. Underwriters gained the same kind of space. Time that went into document verification now goes into actual risk judgment.&lt;/p&gt;
&lt;p&gt;The Bank of America CIO&apos;s line is worth sitting with. &lt;strong&gt;&quot;The discipline of slowing down early produces much faster acceleration over the long run.&quot;&lt;/strong&gt; Organizations racing to deploy tools should probably hear that one first.&lt;/p&gt;
&lt;p&gt;Telecom tells the same story. At Lumen Technologies, salespeople cut four-hour research down to fifteen minutes and spent the rest of the time in deeper customer conversations. Before, gathering pre-meeting material ate up half a day. Now, fifteen minutes is enough to pull together industry trends, prior contracts, and likely needs for the customer. The salesperson&apos;s role shifted from &quot;material collector&quot; to &quot;customer problem-solver.&quot;&lt;/p&gt;
&lt;p&gt;Manufacturing is no exception. The UAE&apos;s &lt;a href=&quot;https://media.ega.ae/ega-designated-industry-40-global-lighthouse-by-world-economic-forum-in-uae-and-world-aluminium-industry-first/&quot;&gt;Emirates Global Aluminium (EGA)&lt;/a&gt; built a cross-functional execution model under its CDO (Chief Digital Officer) and retrained more than 3,000 employees. In 2025 it was named a WEF (World Economic Forum) Industry 4.0 Global Lighthouse. A first for the aluminum industry.&lt;/p&gt;
&lt;p&gt;A common pattern shows up here.&lt;/p&gt;
&lt;p&gt;In retail, in finance, and in telecom, jobs don&apos;t disappear. &lt;strong&gt;The center of gravity of the job moves.&lt;/strong&gt; From reporting to judgment. From passing things along to executing. From collecting to interpreting.&lt;/p&gt;
&lt;p&gt;The point isn&apos;t to teach everyone to code.&lt;/p&gt;
&lt;p&gt;The point is &lt;strong&gt;letting each person keep their expertise while gaining a wider radius of judgment and execution&lt;/strong&gt;. Someone who only reported now also executes one step further. Someone who only executed now also judges one step further. Someone who only judged now also carries the result through to the end.&lt;/p&gt;
&lt;p&gt;This isn&apos;t proprietary to tech companies. It&apos;s an operating model that every existing organization has to translate into its own language.&lt;/p&gt;
&lt;p&gt;A good AX team&apos;s role converges here, too. Not promoting tools, but making these role transitions actually happen.&lt;/p&gt;
&lt;p&gt;What Bank of America shows is that the transition doesn&apos;t happen overnight. After launching Erica in 2018, it took seven years to extend it to employees and wealth management. But once the plumbing is in, it gets reused again and again. &lt;strong&gt;Better to lay it slowly and use it for a long time than lay it fast and watch it collapse.&lt;/strong&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;More Tools Is Not the Same as Winning&lt;/h2&gt;
&lt;p&gt;A lot of organizations trip over this next part.&lt;/p&gt;
&lt;p&gt;Once a company declares it&apos;s doing AX, people start showing up with their own little tools. Non-engineers build small automations with Claude Code. They wire up agents to summarize reports, classify customer inquiries, and tidy meeting notes. Some even ship apps. At first it&apos;s heartwarming. The feeling is, &quot;Our organization is finally changing.&quot;&lt;/p&gt;
&lt;p&gt;That scene is genuinely moving. A non-engineer attaches an agent to their own work and makes something, however small. From the org&apos;s point of view, it&apos;s a good signal. At least it shows people aren&apos;t psychologically closed off to the change.&lt;/p&gt;
&lt;p&gt;But is that really AX?&lt;/p&gt;
&lt;p&gt;Has AX actually taken hold in the organization?&lt;/p&gt;
&lt;p&gt;Does personal work automation genuinely contribute to the organization&apos;s performance?&lt;/p&gt;
&lt;p&gt;Usually, no.&lt;/p&gt;
&lt;p&gt;Because in most cases, the tool reduces &lt;strong&gt;friction at the individual level&lt;/strong&gt; without touching &lt;strong&gt;the organization&apos;s bottleneck&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;A marketer can build a tool that generates their weekly report faster. A CS rep can hook up a bot that drafts replies faster. A finance person can write a script that makes reconciliation spreadsheets less painful. All fine. All productive.&lt;/p&gt;
&lt;p&gt;But that&apos;s not how a company wins.&lt;/p&gt;
&lt;p&gt;A company wins not when individuals finish their own work a bit faster, but &lt;strong&gt;when the whole organization moves faster in the same direction&lt;/strong&gt;. If personal automation doesn&apos;t connect to organizational wins, it&apos;s just another tool on the pile.&lt;/p&gt;
&lt;p&gt;I&apos;d call this local optimization at the individual level. The friction on your desk is lower. Your calendar has a little more room. But cross-team approvals are still stuck, priority conflicts are still unresolved, and the lack of goal alignment is untouched. The organization still slows down at the same places.&lt;/p&gt;
&lt;p&gt;There&apos;s a worse version. Once everyone starts making their own tools, the org gets more fragmented, not less. Even within one marketing team, A uses her own GPT, B uses a Claude project, C wires together Zapier and Notion. Engineering has its own agent workflows. On the surface, everyone looks innovative. But there&apos;s no shared language, shared goal, or shared operating principle.&lt;/p&gt;
&lt;p&gt;The tools multiply. The organization fits together worse.&lt;/p&gt;
&lt;p&gt;That&apos;s not AX. That&apos;s &quot;every person for themselves,&quot; but more sophisticated.&lt;/p&gt;
&lt;p&gt;At this point I find myself pulled back to a previous post, &lt;a href=&quot;https://flowkater.io/posts/2026-01-25-no-victory-no-future/&quot;&gt;An Organization That Can&apos;t Win Has No Future&lt;/a&gt;. The core of that piece was simple. &lt;strong&gt;If an organization can&apos;t define what winning means, the way it works loses meaning.&lt;/strong&gt; Effort without direction can be comforting, but it won&apos;t become results.&lt;/p&gt;
&lt;p&gt;Same for AX.&lt;/p&gt;
&lt;p&gt;Changing how you work without aligning the organization&apos;s goals is just adding one more tool to the stack.&lt;/p&gt;
&lt;p&gt;The problem starts here. AX discourse too easily mistakes &quot;tool utilization rate&quot; for performance. How many people use agents internally, how many automations have been built, how many demo days have been held. Those are activity metrics. They are not outcome metrics.&lt;/p&gt;
&lt;p&gt;The real questions are these.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Did time-to-customer go down?&lt;/li&gt;
&lt;li&gt;Are we making better decisions faster?&lt;/li&gt;
&lt;li&gt;Are we closer to the organization&apos;s winning conditions?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If the answer is unclear, the organization&apos;s AX is still at the toy stage.&lt;/p&gt;
&lt;p&gt;Here&apos;s the distinction.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Activity metric (looks good)&lt;/th&gt;
&lt;th&gt;Outcome metric (actually matters)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Internal AI user count&lt;/td&gt;
&lt;td&gt;Customer lead-time reduction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Number of automations built&lt;/td&gt;
&lt;td&gt;Cross-team handoffs eliminated&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Demo days held&lt;/td&gt;
&lt;td&gt;Decision latency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prompt-share count&lt;/td&gt;
&lt;td&gt;Business impact on the org&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The left column is great for internal slide decks. Measuring real AX means looking at the right column.&lt;/p&gt;
&lt;p&gt;I&apos;d argue the leader has to get colder, not warmer, the moment employees start building their own tools. Don&apos;t just applaud. Check where those tools are reducing friction, whether they expose a common bottleneck, and whether they contribute to the organization&apos;s core value flow. An organization that cleanly removes one shared bottleneck is much closer to AX than an organization with twenty scattered personal tools.&lt;/p&gt;
&lt;p&gt;Personal automation can be a &lt;strong&gt;signal&lt;/strong&gt;. It&apos;s not &lt;strong&gt;evidence&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Signals are welcome. Evidence is different.&lt;/p&gt;
&lt;p&gt;The evidence of AX is always the same. Shorter processing time, fewer handoffs, clearer ownership, and most of all, &lt;strong&gt;a contribution to the whole organization&apos;s business performance&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Miss that, and the org collapses the same way it always did, even in the AI era. Just with more tools.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;So You&apos;ve Already Set Up an AX Team. Now What.&lt;/h2&gt;
&lt;p&gt;Reality is reality. Plenty of organizations have already stood up an AX team. Leadership made the call, a team got assembled, and a mission was handed down. How, then, do you at least raise the odds of it working?&lt;/p&gt;
&lt;p&gt;Think back to a previous organization of mine. When we introduced a new process, the most effective move wasn&apos;t importing an outside expert. It was pulling in the person on each team who had the most complaints and who also knew the work best.&lt;/p&gt;
&lt;p&gt;Those aces knew the real bottlenecks. People with the most complaints know best what&apos;s wrong. People who know the work know what actually has to change for things to move.&lt;/p&gt;
&lt;p&gt;An AX team has to start from there.&lt;/p&gt;
&lt;p&gt;What an AX team needs isn&apos;t an AI expert. You need the marketer who knows where marketing&apos;s bottleneck is, the engineer who knows the real problems in the dev pipeline, the CS lead who knows where things break at the customer touchpoint. Those people come together and redesign how work is done in their own context, using AI as a tool, not the other way around.&lt;/p&gt;
&lt;p&gt;The reason this matters is that an AX team that doesn&apos;t know the field almost always turns into an &quot;AI PR team.&quot; They run internal seminars, introduce tools, and translate success stories. It looks legit. But the actual bottlenecks on the ground stay right where they were.&lt;/p&gt;
&lt;p&gt;Order is everything. Most organizations run this order backwards. First they install the tool, then they think about where to use it. Nothing ends up taking root anywhere.&lt;/p&gt;
&lt;p&gt;The mission also gets set wrong. &quot;Drive AI adoption&quot; isn&apos;t a mission. It&apos;s a slogan. The mission has to be a concrete business problem. Cut new customer onboarding time by 50%. Double customer inquiry resolution speed. Bring monthly report generation from 8 hours down to 2. That kind of thing. Then the team looks at processes, not tools.&lt;/p&gt;
&lt;p&gt;It was the same in a previous org. When we moved from an abstract mission like &quot;improve software quality&quot; to a concrete one (&quot;count issues and reduce them every week&quot;), the team finally moved. We counted issues, tidied them, prioritized them, and focused on bringing the number down. This wasn&apos;t only a developer&apos;s job. PMs, QA, and designers all lined up on the same issue pipeline, which is why it worked. AI didn&apos;t solve it. Looking at the same number and moving in the same direction did.&lt;/p&gt;
&lt;p&gt;The more concrete the mission, the more the team looks at boring bottlenecks over flashy demos. Most of the time, real impact comes out of those boring bottlenecks.&lt;/p&gt;
&lt;p&gt;There&apos;s another trap here. Many organizations frame the mission as &quot;reach 80% AI tool adoption.&quot; The number is concrete. The direction is wrong. Utilization rate is an activity metric. It says nothing about which bottleneck gets fixed or which value flow it serves.&lt;/p&gt;
&lt;p&gt;A proper mission looks like this. &quot;Cut first-response time on customer inquiries from 24 hours to 4. AI is an assist; final accountability sits with the dedicated team.&quot; This mission doesn&apos;t force tool usage. It nails down the result and the ownership structure. The tool is just a means.&lt;/p&gt;
&lt;p&gt;People don&apos;t move on mission alone. They need to see the reason their role is changing and the path forward.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://cloudwars.com/innovation-leadership/servicenow-ai-agents-will-boost-productivity-20-this-year-50-next-year-says-ceo-bill-mcdermott/&quot;&gt;ServiceNow assessed all 28,000 of its employees.&lt;/a&gt; What matters isn&apos;t the assessment itself. They had to redraw who could move into which role after AI, who needed what kind of training, and who was currently stuck in repetitive work. People aren&apos;t afraid of AI itself. They&apos;re afraid of being useless in an AI era. That&apos;s why redrawing the role map comes first. You have to know exactly what someone can do now before you can design what they&apos;ll learn next.&lt;/p&gt;
&lt;p&gt;The AX team&apos;s job is not tool evangelism. It&apos;s laying the runway that lets people learn, switch roles, and run experiments in the field.&lt;/p&gt;
&lt;p&gt;Execution authority doesn&apos;t have to be grand. You don&apos;t need sweeping org redesign power. You just need the authority to run tiny experiments. Skip one approval step. Try a different way to classify customer inquiries for two weeks. Put one specific report through an AI assist. If it doesn&apos;t work, roll back. Without that much authority, AX lives and dies as a deck. With even that much, the organization actually learns something.&lt;/p&gt;
&lt;p&gt;No experimentation authority, no innovation. That was the lesson.&lt;/p&gt;
&lt;p&gt;AX teams are the same. &quot;Do it this way&quot; has to become &quot;we&apos;ll try it ourselves first.&quot; And then you have to show the result. In numbers. Successful experiments get handed off to the field. Failed experiments get logged so the next team makes fewer mistakes. What matters is that the result doesn&apos;t stay inside the AX team. It becomes organizational learning.&lt;/p&gt;
&lt;p&gt;In the end, an AX team&apos;s success depends on its goal being its own disappearance. Once each field organization can run its own AX, a separate AX team is redundant.&lt;/p&gt;
&lt;p&gt;A team whose reason for existing is to stop existing.&lt;/p&gt;
&lt;p&gt;It&apos;s paradoxical, but it&apos;s the only design that works.&lt;/p&gt;
&lt;p&gt;Walmart redesigned the org. Bank of America laid the plumbing. Lumen redefined identity. Commonwealth Bank reversed course.&lt;/p&gt;
&lt;p&gt;The results are already in.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Closing&lt;/h2&gt;
&lt;p&gt;The first principle of AX isn&apos;t tools. It&apos;s the organization and its people.&lt;/p&gt;
&lt;p&gt;Understanding AI matters. Knowing Claude Code matters. But before that, you have to understand the people in the organization that&apos;s going to use the tool. Why they work the way they do. What drains them. What has to change for them to actually move.&lt;/p&gt;
&lt;p&gt;At the end of understanding people lies direction. If the organization can&apos;t define a direction, no matter how great the tool, people run toward different places. &lt;a href=&quot;https://flowkater.io/posts/2026-01-25-no-victory-no-future/&quot;&gt;As I wrote before&lt;/a&gt;, effort without direction can be comforting, but it won&apos;t become results.&lt;/p&gt;
&lt;p&gt;The more an organization wants to be good at AX, the more sharply it has to ask itself: what are we trying to win here?&lt;/p&gt;
&lt;p&gt;I invested three hours every evening to understand that organization. That was all I could do. Not a framework, not a toolkit. Just sitting in the office at night reading code and talking to people. Nothing glamorous.&lt;/p&gt;
&lt;p&gt;But that was the real thing. &lt;strong&gt;People who know tools stay with tools. People who know people change organizations.&lt;/strong&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;If you want to build a ship, don&apos;t drum up the men to gather wood, divide the work and give orders. Instead, teach them to yearn for the vast and endless sea.&quot;&lt;/p&gt;
&lt;p&gt;Antoine de Saint-Exupéry&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr /&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;MIT NANDA, &lt;a href=&quot;https://fortune.com/2025/08/18/mit-report-95-percent-generative-ai-pilots-at-companies-failing-cfo/&quot;&gt;&quot;95% of GenAI Pilots Failing: What the Successful 5% Do Differently&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Fortune, &lt;a href=&quot;https://fortune.com/2026/03/27/why-cfo-not-chief-ai-officer-secret-getting-real-value-ai/&quot;&gt;&quot;Why CFOs, not CAIOs, Are the Secret to AI Value&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;CNBC, &lt;a href=&quot;https://www.cnbc.com/2026/03/26/coca-cola-james-quincey-walmart-doug-mcmillon-artificial-intelligence-step-down.html&quot;&gt;&quot;Coca-Cola, Walmart CEOs Step Down amid AI Shift&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Ads of the World, &lt;a href=&quot;https://www.adsoftheworld.com/campaigns/designers-lead-ai-follows&quot;&gt;&quot;Coca-Cola: Designers Lead. AI follows.&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;The Verge, &lt;a href=&quot;https://www.theverge.com/news/812559/coca-cola-ai-holiday-christmas-commercial-2025&quot;&gt;&quot;Coca-Cola&apos;s new AI holiday ad is a sloppy eyesore&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;CBS Atlanta, &lt;a href=&quot;https://www.cbsnews.com/atlanta/news/coca-cola-plans-to-cut-about-75-jobs-at-its-atlanta-headquarters-in-early-2026/&quot;&gt;&quot;Coca-Cola plans to cut about 75 jobs at its Atlanta headquarters in early 2026&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;The Register, &lt;a href=&quot;https://www.theregister.com/2025/08/22/commonwealth_ban_chatbot_fail_rehiring/&quot;&gt;&quot;Commonwealth Bank AI Chatbot Fail: 45 Rehired&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Walmart Tech, &lt;a href=&quot;https://tech.walmart.com/content/walmart-global-tech/en_us/blog/post/all-in-on-agents.html&quot;&gt;&quot;All in on Agents&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Fortune, &lt;a href=&quot;https://fortune.com/2025/09/09/walmart-ai-jobs-brainstorm-tech/&quot;&gt;&quot;Why Walmart&apos;s CEO says AI won&apos;t lead to lower headcount&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Microsoft, &lt;a href=&quot;https://news.microsoft.com/source/features/digital-transformation/the-only-way-how-copilot-is-helping-propel-an-evolution-at-lumen-technologies/&quot;&gt;&quot;How Copilot Is Helping Propel an Evolution at Lumen Technologies&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Fortune, &lt;a href=&quot;https://fortune.com/2026/03/17/inside-bank-of-americas-build-once-ai-strategy/&quot;&gt;&quot;Inside Bank of America&apos;s Build Once AI Strategy&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Bloomberg, &lt;a href=&quot;https://www.bloomberg.com/news/articles/2025-11-10/intel-head-of-ai-katti-leaves-for-hardware-role-at-openai&quot;&gt;&quot;Intel CAIO Leaves for OpenAI after 7 Months&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Defense Scoop, &lt;a href=&quot;https://defensescoop.com/2025/08/15/feinberg-cdao-realignment-shakeup-dod-ai-enterprise/&quot;&gt;&quot;Feinberg orders major shakeup in Pentagon&apos;s AI enterprise&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;CloudWars, &lt;a href=&quot;https://cloudwars.com/innovation-leadership/servicenow-ai-agents-will-boost-productivity-20-this-year-50-next-year-says-ceo-bill-mcdermott/&quot;&gt;&quot;ServiceNow: 28,000 Employees Assessed for AI Skills&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;SalesforceBen, &lt;a href=&quot;https://www.salesforceben.com/salesforce-lays-off-nearly-1000-employees-in-early-2026-cuts/&quot;&gt;&quot;Salesforce Lays Off Nearly 1,000 Employees in Early 2026&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;CNBC, &lt;a href=&quot;https://www.cnbc.com/2026/02/24/jpm-ceo-jamie-dimon-ai-reshaping-workforce-redeployment.html&quot;&gt;&quot;JPMorgan CEO Jamie Dimon: AI Is Reshaping Workforce&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;JPMorgan Chase, &lt;a href=&quot;https://www.jpmorganchase.com/about/leadership/teresa-heitsenrether&quot;&gt;&quot;Teresa Heitsenrether&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;EGA, &lt;a href=&quot;https://media.ega.ae/ega-designated-industry-40-global-lighthouse-by-world-economic-forum-in-uae-and-world-aluminium-industry-first/&quot;&gt;&quot;EGA Designated Industry 4.0 Global Lighthouse by WEF&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Evan Reiser, &lt;a href=&quot;https://www.evanreiser.com/blog/transformation/the-projection-problem/&quot;&gt;&quot;The Projection Problem&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;flowkater.io, &lt;a href=&quot;https://flowkater.io/posts/2026-03-15-ax-organization-transformation/&quot;&gt;&quot;Installing Claude Code at Your Company Doesn&apos;t Get You AX&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Target, &lt;a href=&quot;https://corporate.target.com/press/release/2025/05/target-corporation-announces-multi-year-enterprise-acceleration-office&quot;&gt;&quot;Enterprise Acceleration Office&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Block, &lt;a href=&quot;https://block.xyz/inside/from-hierarchy-to-intelligence&quot;&gt;&quot;From Hierarchy to Intelligence&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;YouTube / Peter Yang, &lt;a href=&quot;https://www.youtube.com/watch?v=9qXc-THAvc0&quot;&gt;&quot;How OpenAI&apos;s Codex Team Builds with Codex&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;flowkater.io, &lt;a href=&quot;https://flowkater.io/posts/2026-01-25-no-victory-no-future/&quot;&gt;&quot;An Organization That Can&apos;t Win Has No Future&quot;&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
</content:encoded><category>AI</category><category>AX</category><category>org-transformation</category><category>essay</category><category>leadership</category><author>Tony Cho (https://flowkater.io)</author></item><item><title>When Package Install Becomes a Hack: Why Zero Dependency Matters</title><link>https://flowkater.io/en/posts/2026-04-04-package-install-hacking-zero-dependency/</link><guid isPermaLink="true">https://flowkater.io/en/posts/2026-04-04-package-install-hacking-zero-dependency/</guid><description>In March 2026, the litellm and axios supply-chain attacks hit back-to-back. I trace ten years of dependencies turning from convenience to trust to risk, and use code to show why zero-dependency design is survival, not minimalism.</description><pubDate>Sat, 04 Apr 2026 02:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;Opening&lt;/h2&gt;
&lt;p&gt;Around January 2022, the lead maintainer of &lt;code&gt;colors.js&lt;/code&gt; and &lt;code&gt;faker.js&lt;/code&gt; (Marak Squires) went public with a complaint: large companies were using his open source for free without paying him a cent. As a form of protest, he intentionally broke his own libraries. Both packages were used in thousands of projects, including at a number of large companies, and &lt;code&gt;colors.js&lt;/code&gt; was being downloaded more than 20 million times per week. A lot of builds and services went down because of it.&lt;/p&gt;
&lt;p&gt;There&apos;s a representative package manager for nearly every language. Ruby has gem, Python has pip, JavaScript has npm, Rust has cargo. I&apos;ve shipped services fast on top of these open-source ecosystems, pulled all-nighters chasing dependency conflicts, and at times even monkey-patched open-source libraries for my own use. Most of the time, I was grateful. These libraries let me build quickly, and I was happily inheriting the work of all the Stack Overflow veterans who had suffered before me. But after importing one open-source library after another, at some point I started wondering whether coding was really anything more than LEGO assembly. (Even before the AI era, there was always a recurring jab: &quot;If you&apos;re just gluing open source together, is that really development?&quot;)&lt;/p&gt;
&lt;p&gt;In that context, the colors.js incident hit me hard. The line &quot;If you&apos;re just gluing open source together, is that really development?&quot; stopped being a cheap jab and started feeling like an existential question about my own output. What am I actually building? Am I a coder, or an engineer? The old software adage &quot;don&apos;t reinvent the wheel&quot; started to feel less obvious. &lt;em&gt;Mostly true. But really?&lt;/em&gt; (My drift toward Go over Python or Node.js, and Swift over Flutter on mobile clients, isn&apos;t unrelated to this. Going fully without package dependencies is hard, but heading in the direction of fewer is something I started to value.)&lt;/p&gt;
&lt;p&gt;As it happened, two more incidents hit back-to-back. One was the litellm (PyPI) maintainer-account takeover on March 24, 2026. The other was the axios (npm) incident on March 31, which distributed an OS-specific malicious RAT (remote-access trojan).&lt;/p&gt;
&lt;p&gt;Both landed close to home for me. litellm shows up in many open-source agent plugins (for Claude Code, Codex, and similar tools), and axios was the library I always reached for when working in React.&lt;/p&gt;
&lt;p&gt;Since the LLM/AI era took off, a lot of Korean developers have been releasing open source one after another, and some of those libraries are genuinely good. I haven&apos;t built anything that qualifies yet, but the litellm and axios incidents pushed me to write down what I think about open-source design, and why zero dependency matters.&lt;/p&gt;
&lt;p&gt;&quot;Don&apos;t reinvent the wheel&quot; is fine advice. The question is, how far can I trust the wheel?&lt;/p&gt;
&lt;h2&gt;March 2026: Two Incidents in litellm and axios&lt;/h2&gt;
&lt;h3&gt;litellm: The day an LLM integration library became a credential-stealing tool&lt;/h3&gt;
&lt;p&gt;litellm is a Python library that unifies more than 100 LLM providers (OpenAI, Anthropic, Bedrock, Vertex AI, and more) behind a single interface. If you run an LLM-based service, chances are you&apos;ve typed &lt;code&gt;pip install litellm&lt;/code&gt; at some point. PyPI shows millions of monthly downloads. It&apos;s one of the core packages in the AI infrastructure stack.&lt;/p&gt;
&lt;p&gt;On March 24, 2026, litellm&apos;s PyPI maintainer account was hijacked. The attacker pushed a poisoned version through the normal release process. From the outside, it looked like a routine update. Anyone who ran &lt;code&gt;pip install --upgrade litellm&lt;/code&gt; got the malicious package with no warning.&lt;/p&gt;
&lt;p&gt;The technical structure was clever. The malicious code abused Python&apos;s &lt;code&gt;.pth&lt;/code&gt; file mechanism. A &lt;code&gt;.pth&lt;/code&gt; file is a path-configuration file that Python runs automatically at startup, and the attacker injected code into one so that the payload would fire the moment Python started, even without &lt;code&gt;import litellm&lt;/code&gt;. Install the package, write zero lines of import code, and the next time Python runs, you&apos;re infected.&lt;/p&gt;
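&lt;p&gt;To make the mechanism concrete, here is a benign sketch (the file and variable names are illustrative stand-ins for the payload). &lt;code&gt;site.addsitedir&lt;/code&gt; processes &lt;code&gt;.pth&lt;/code&gt; files the same way the interpreter processes site-packages at startup, and any line beginning with &lt;code&gt;import&lt;/code&gt; is executed rather than treated as a search path:&lt;/p&gt;

```python
import os
import pathlib
import site
import tempfile

# Write a .pth file into a temporary directory. Lines beginning with
# "import" are executed by site.py, not interpreted as search paths.
# That is the hook the attacker abused.
tmp = pathlib.Path(tempfile.mkdtemp())
(tmp / "demo.pth").write_text("import os; os.environ['PTH_DEMO'] = 'executed'\n")

# Process the directory the same way site-packages is processed at startup.
site.addsitedir(str(tmp))

# The line ran even though we never imported anything from that directory.
print(os.environ["PTH_DEMO"])
```

&lt;p&gt;The point of the sketch is only this: installation alone is enough to get code running on every interpreter start, with no &lt;code&gt;import&lt;/code&gt; statement from the victim.&lt;/p&gt;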
&lt;p&gt;According to &lt;a href=&quot;https://www.sonatype.com/blog/compromised-litellm-pypi-package-delivers-multi-stage-credential-stealer&quot;&gt;Sonatype&apos;s analysis&lt;/a&gt;, the payload ran in stages. Stage one collected system environment information. Stage two scraped API keys, cloud credentials, and database connection strings from environment variables and shipped them to an external server. If you run an LLM service, your environment variables are usually a treasure chest: OpenAI API keys, AWS credentials, database passwords. The attacker knew exactly where to look.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://www.kaspersky.com/blog/critical-supply-chain-attack-trivy-litellm-checkmarx-teampcp/55510/&quot;&gt;Kaspersky&apos;s report&lt;/a&gt; pointed out that this wasn&apos;t litellm-only. It was a coordinated supply-chain campaign that also targeted Trivy, Checkmarx, and other security-tool packages. Security tools as the security hole. You can&apos;t make this up.&lt;/p&gt;
&lt;p&gt;The most striking part is the &lt;strong&gt;transitive dependency&lt;/strong&gt; path. Organizations that installed litellm directly weren&apos;t the only victims. Anyone who installed something that depended on litellm (certain Cursor MCP plugins, some agent frameworks) got litellm pulled in alongside, and the infection path opened. According to &lt;a href=&quot;https://www.herodevs.com/blog-posts/the-litellm-supply-chain-attack-what-happened-why-it-matters-and-what-to-do-next&quot;&gt;HeroDevs&apos; analysis&lt;/a&gt;, a large number of vulnerable projects didn&apos;t even use litellm directly.&lt;/p&gt;
&lt;p&gt;I had a close call myself. I&apos;d recently installed a Claude Code agent skill called ouroboros, and litellm was somewhere in its dependency tree. Luckily I never actually ran the skill, so litellm never activated in my Python environment. But if I&apos;d run it even once, the API keys in my environment variables could have leaked. It made my stomach drop.&lt;/p&gt;
&lt;h3&gt;axios: The day an HTTP client became a RAT delivery tool&lt;/h3&gt;
&lt;p&gt;axios is the most widely used HTTP client library in the JavaScript and TypeScript ecosystem. &lt;strong&gt;Over 100 million weekly downloads on npm.&lt;/strong&gt; If you&apos;re working in React, Vue, or Node.js and you need to call an API, your fingers practically type &lt;code&gt;npm install axios&lt;/code&gt; on their own. I reached for it in every single React project.&lt;/p&gt;
&lt;p&gt;On March 31, 2026, the axios npm package was compromised. According to &lt;a href=&quot;https://www.elastic.co/security-labs/axios-one-rat-to-rule-them-all&quot;&gt;Elastic Security Labs&apos; detailed writeup&lt;/a&gt;, the attacker used the &lt;code&gt;postinstall&lt;/code&gt; script in &lt;code&gt;package.json&lt;/code&gt;. That hook runs automatically the moment npm installs a package, and the attacker dropped a malicious script into it. Run &lt;code&gt;npm install&lt;/code&gt;, and before you&apos;ve imported a single line of code, the attacker&apos;s code is running.&lt;/p&gt;
&lt;p&gt;The cleverness was in shipping different payloads per OS. On Windows, it pulled an executable via PowerShell. On macOS, it combined curl and bash to fetch a binary. Linux had its own payload. &lt;a href=&quot;https://www.huntress.com/blog/supply-chain-compromise-axios-npm-package&quot;&gt;Huntress&apos; report&lt;/a&gt; confirmed that the payload was a &lt;strong&gt;RAT (Remote Access Trojan)&lt;/strong&gt;. A backdoor that lets an attacker connect to your machine remotely, read files, capture keystrokes, install more malware.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://www.sophos.com/en-us/blog/axios-npm-package-compromised-to-deploy-malware&quot;&gt;Sophos&apos; analysis&lt;/a&gt; listed the RAT&apos;s capabilities: system information collection, filesystem traversal, keylogging, screen capture, downloading and executing additional payloads. Effectively, full remote control of an infected developer machine.&lt;/p&gt;
&lt;p&gt;This incident made it painfully clear why npm&apos;s &lt;code&gt;postinstall&lt;/code&gt; hook is dangerous. npm runs &lt;code&gt;preinstall&lt;/code&gt;, &lt;code&gt;install&lt;/code&gt;, and &lt;code&gt;postinstall&lt;/code&gt; scripts in order during package installation, and most developers have no idea this is happening. Type &lt;code&gt;npm install axios&lt;/code&gt; and all you see is the install progress bar. Almost nobody checks what scripts ran behind it. Even &lt;a href=&quot;https://docs.npmjs.com/cli/v10/using-npm/scripts#best-practices&quot;&gt;the official npm docs&lt;/a&gt; recommend minimizing &lt;code&gt;postinstall&lt;/code&gt; use, but countless packages still rely on it.&lt;/p&gt;
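&lt;p&gt;There is at least a partial defense, and it&apos;s built into npm itself: lifecycle scripts can be switched off. Whether that fits your workflow depends on how many of your packages legitimately need &lt;code&gt;postinstall&lt;/code&gt;, but the switch is worth knowing about.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# One-off: install without running preinstall/install/postinstall scripts
npm install --ignore-scripts

# Or project-wide, in .npmrc:
ignore-scripts=true
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With this on, a package&apos;s code runs only when you actually import it, which closes the &quot;infected before a single import&quot; path, though packages with native build steps may need scripts re-enabled case by case.&lt;/p&gt;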
&lt;p&gt;Look at the two incidents side by side and the pattern emerges. Both compromised a maintainer account or release pipeline. Both infected you just by installing the package, with zero lines of code executed. And both targeted core packages that sit deep inside millions of projects&apos; dependency trees.&lt;/p&gt;
&lt;p&gt;litellm hit the spine of AI infrastructure. axios hit the spine of web development. One week apart.&lt;/p&gt;
&lt;h2&gt;A Ten-Year Pattern: From Disruption to Damage to Infiltration&lt;/h2&gt;
&lt;p&gt;Open-source dependency incidents didn&apos;t suddenly start in 2026. There&apos;s a ten-year arc, and it&apos;s been moving in one direction.&lt;/p&gt;
&lt;h3&gt;2016, left-pad: the disruption era&lt;/h3&gt;
&lt;p&gt;In March 2016, a developer named Azer Koçulu deleted his &lt;code&gt;left-pad&lt;/code&gt; package from npm. It was an 11-line function. Pad a string on the left with spaces or a specific character. That was it. When those 11 lines vanished, builds broke across thousands of projects, including React and Babel.&lt;/p&gt;
&lt;p&gt;David Haney asked at the time on his &lt;a href=&quot;https://www.davidhaney.io/npm-left-pad-have-we-forgotten-how-to-program/&quot;&gt;blog&lt;/a&gt;, &quot;Have we forgotten how to program?&quot; The fact that we&apos;d reach for a package instead of writing 11 lines of string padding ourselves was funny and a little sad at the same time.&lt;/p&gt;
&lt;p&gt;The essence of left-pad was &lt;strong&gt;disruption&lt;/strong&gt;. When the dependency disappears, the build breaks. That&apos;s all there was to it. No malice, no intent to harm. It started with &quot;I&apos;m deleting my own package, what&apos;s the problem?&quot; and the damage stopped at failed builds and delayed deploys. npm tightened its unpublish policy afterward, people thought about dependencies for a minute, and then forgot.&lt;/p&gt;
&lt;h3&gt;2022, colors.js: the damage era&lt;/h3&gt;
&lt;p&gt;Six years later, Marak Squires injected an infinite loop into his own &lt;code&gt;colors.js&lt;/code&gt; and &lt;code&gt;faker.js&lt;/code&gt;. This time it wasn&apos;t a mistake. It was deliberate sabotage. The motive was anger that big companies were making money off his open source while none of it came back to him. Every application that imported colors.js stalled while spamming &quot;LIBERTY LIBERTY LIBERTY&quot; to the console.&lt;/p&gt;
&lt;p&gt;The shift from disruption to damage had begun. left-pad was a failure of absence. colors.js was code that was actively present and doing harm. The problem wasn&apos;t the missing package. The problem was the package that was there. The vector flipped.&lt;/p&gt;
&lt;p&gt;Even so, colors.js still had a &quot;human face.&quot; The attacker was the original author, not an anonymous hacker. The motive (whether you agreed with it or not) was a comprehensible grievance. The damage was service disruption, not data exfiltration or credential theft.&lt;/p&gt;
&lt;h3&gt;2026, litellm and axios: the infiltration era&lt;/h3&gt;
&lt;p&gt;The two incidents in March 2026 crossed into different territory. Not a maintainer&apos;s personal protest, but the planned infiltration of an organized attacker. Auto-execution via &lt;code&gt;.pth&lt;/code&gt;, the &lt;code&gt;postinstall&lt;/code&gt; hook, OS-specific RATs. The technical sophistication is on another level. So is the goal. Not stopping the service, but stealing credentials, planting backdoors, and securing persistent access.&lt;/p&gt;
&lt;p&gt;Here&apos;s the ten-year arc in one table.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Year&lt;/th&gt;
&lt;th&gt;Incident&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Motive&lt;/th&gt;
&lt;th&gt;Damage scope&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2016&lt;/td&gt;
&lt;td&gt;left-pad&lt;/td&gt;
&lt;td&gt;Disruption&lt;/td&gt;
&lt;td&gt;Personal decision&lt;/td&gt;
&lt;td&gt;Build failures&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2022&lt;/td&gt;
&lt;td&gt;colors.js&lt;/td&gt;
&lt;td&gt;Damage&lt;/td&gt;
&lt;td&gt;Protest&lt;/td&gt;
&lt;td&gt;Service outages&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2026&lt;/td&gt;
&lt;td&gt;litellm/axios&lt;/td&gt;
&lt;td&gt;Infiltration&lt;/td&gt;
&lt;td&gt;Organized attack&lt;/td&gt;
&lt;td&gt;Credential theft, remote control&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Russ Cox warned about this in 2019 in &lt;a href=&quot;https://research.swtch.com/deps&quot;&gt;Our Software Dependency Problem&lt;/a&gt;: &quot;Software dependencies are part of the software supply chain, and supply-chain security is determined by the weakest link.&quot; It sounded like a theoretical warning back then. Seven years on, the warning is reality.&lt;/p&gt;
&lt;p&gt;I used to fear a dependency breaking on me. Now I fear a dependency breaking me.&lt;/p&gt;
&lt;h2&gt;My Tools: Superpowers and Zero Dependency&lt;/h2&gt;
&lt;h3&gt;4-1. Why I picked Superpowers&lt;/h3&gt;
&lt;p&gt;I introduced this in a &lt;a href=&quot;https://flowkater.io/posts/2026-02-08-superpowers-introduction/&quot;&gt;previous post&lt;/a&gt;, but Superpowers is a skill framework that gives AI coding agents like Claude Code or Codex a systematic development workflow. It enforces the flow of question → design → plan → TDD → code review automatically, without any commands. I adopted it because it bundles together the interview command, TDD skill, and code-review skill I&apos;d been building separately into a single structure.&lt;/p&gt;
&lt;p&gt;Honestly, &quot;zero dependency&quot; wasn&apos;t on the list of reasons I picked Superpowers. The workflow matched how I work, the skill structure was intuitive, and the brainstorm → plan → execute flow lined up with the development process I aim for. After the litellm/axios incidents I went back into the Superpowers code, and there it was: a concrete answer to the question &quot;how should you design good open source?&quot;&lt;/p&gt;
&lt;h3&gt;4-2. Proof in code: 354 lines of server.cjs&lt;/h3&gt;
&lt;p&gt;Superpowers ships with a local web server used during brainstorming sessions. It serves HTML to a browser, sends real-time updates over WebSocket, and watches files for changes. Anyone who&apos;s done web work knows the usual stack for this kind of feature: Express (HTTP server), ws (WebSocket), chokidar (file watcher). And installing those three drags hundreds of sub-packages into &lt;code&gt;node_modules&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;That&apos;s exactly how this server started out. The &lt;a href=&quot;https://github.com/obra/superpowers&quot;&gt;v5.0.2 release notes&lt;/a&gt; record what happened next.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Zero-Dependency Brainstorm Server&lt;/strong&gt;
Removed all vendored node_modules — server.js is now fully self-contained.
Replaced Express/Chokidar/WebSocket dependencies with zero-dependency Node.js server using built-in &lt;code&gt;http&lt;/code&gt;, &lt;code&gt;fs&lt;/code&gt;, and &lt;code&gt;crypto&lt;/code&gt; modules.
Removed ~1,200 lines of vendored &lt;code&gt;node_modules/&lt;/code&gt;, &lt;code&gt;package.json&lt;/code&gt;, and &lt;code&gt;package-lock.json&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Express, chokidar, and ws plus their transitive dependencies amounted to about 1,200 lines of vendored code, all stripped out. The whole server was rewritten using only Node.js built-ins. The result is a single file, &lt;code&gt;server.cjs&lt;/code&gt;, &lt;strong&gt;354 lines long&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The Superpowers &lt;code&gt;package.json&lt;/code&gt; makes the choice obvious.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;{
  &quot;name&quot;: &quot;superpowers&quot;,
  &quot;version&quot;: &quot;5.0.7&quot;,
  &quot;type&quot;: &quot;module&quot;,
  &quot;main&quot;: &quot;.opencode/plugins/superpowers.js&quot;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;No &lt;code&gt;dependencies&lt;/code&gt; field. No &lt;code&gt;devDependencies&lt;/code&gt;. As an npm package, the external dependency count is literally zero. Run &lt;code&gt;npm install&lt;/code&gt; on this project and not a single external package is added to &lt;code&gt;node_modules&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Now let&apos;s look at how those 354 lines actually break down.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1) WebSocket protocol: implementing RFC 6455 directly&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Instead of using &lt;code&gt;ws&lt;/code&gt;, this code implements the WebSocket protocol (RFC 6455) by hand. The core of WebSocket is frame encoding and decoding. Wrapping the data the server sends into binary frames, and unmasking the masked frames the client sends.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;const OPCODES = { TEXT: 0x01, CLOSE: 0x08, PING: 0x09, PONG: 0x0a };
const WS_MAGIC = &quot;258EAFA5-E914-47DA-95CA-C5AB0DC85B11&quot;;

function computeAcceptKey(clientKey) {
  return crypto
    .createHash(&quot;sha1&quot;)
    .update(clientKey + WS_MAGIC)
    .digest(&quot;base64&quot;);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is the WebSocket handshake. Take the &lt;code&gt;Sec-WebSocket-Key&lt;/code&gt; the client sends, append the magic string, run SHA-1, and produce &lt;code&gt;Sec-WebSocket-Accept&lt;/code&gt;. Exactly what RFC 6455 Section 4.2.2 specifies. That&apos;s what the &lt;code&gt;ws&lt;/code&gt; library does internally. Written by hand, it&apos;s five lines.&lt;/p&gt;
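&lt;p&gt;Those five lines can be checked against the worked example in RFC 6455 itself: Section 1.3 gives a sample client key and the accept value it must produce. A quick sanity check, assuming nothing beyond Node&apos;s built-in &lt;code&gt;crypto&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;const crypto = require(&quot;crypto&quot;);
const WS_MAGIC = &quot;258EAFA5-E914-47DA-95CA-C5AB0DC85B11&quot;;

// Sample client key from RFC 6455 Section 1.3.
const clientKey = &quot;dGhlIHNhbXBsZSBub25jZQ==&quot;;
const accept = crypto
  .createHash(&quot;sha1&quot;)
  .update(clientKey + WS_MAGIC)
  .digest(&quot;base64&quot;);

console.log(accept); // s3pPLMBiTxaQ9kYGzzhZRbK+xOo= (the value printed in the RFC)
&lt;/code&gt;&lt;/pre&gt;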
&lt;p&gt;Frame encoding is the same story.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;function encodeFrame(opcode, payload) {
  const fin = 0x80;
  const len = payload.length;
  let header;

  if (len &amp;lt; 126) {
    header = Buffer.alloc(2);
    header[0] = fin | opcode;
    header[1] = len;
  } else if (len &amp;lt; 65536) {
    header = Buffer.alloc(4);
    header[0] = fin | opcode;
    header[1] = 126;
    header.writeUInt16BE(len, 2);
  } else {
    header = Buffer.alloc(10);
    header[0] = fin | opcode;
    header[1] = 127;
    header.writeBigUInt64BE(BigInt(len), 2);
  }

  return Buffer.concat([header, payload]);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The logic picks a 2-byte, 4-byte, or 10-byte header depending on payload length. The whole core of the WebSocket frame format fits in this one branch. Under 126, the length is packed directly into the 7-bit field. From 126 through 65535, byte two holds the marker 126 and a 16-bit extended length follows. Above that, the marker 127 and a 64-bit extended length.&lt;/p&gt;
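&lt;p&gt;To make the layout concrete, here is a short unmasked server-to-client TEXT frame decoded by hand, with the byte meanings per RFC 6455 Section 5.2 (a sketch for illustration, separate from the server code above):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;// &quot;hi&quot; as a TEXT frame: FIN|TEXT = 0x81, 7-bit length = 0x02, then the payload.
const frame = Buffer.from([0x81, 0x02, 0x68, 0x69]);

const fin = (frame[0] &amp;amp; 0x80) !== 0; // true: final fragment
const opcode = frame[0] &amp;amp; 0x0f;      // 0x01: TEXT
const len = frame[1] &amp;amp; 0x7f;         // 2 payload bytes
const payload = frame.slice(2, 2 + len).toString(&quot;utf-8&quot;);

console.log(fin, opcode, len, payload); // true 1 2 hi
&lt;/code&gt;&lt;/pre&gt;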
&lt;p&gt;Decoding is the reverse. Client frames must be masked (RFC 6455 requires it), so a 4-byte mask key is XOR&apos;d back through the payload.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;function decodeFrame(buffer) {
  if (buffer.length &amp;lt; 2) return null;
  const secondByte = buffer[1];
  const opcode = buffer[0] &amp;amp; 0x0f;
  const masked = (secondByte &amp;amp; 0x80) !== 0;
  let payloadLen = secondByte &amp;amp; 0x7f;
  let offset = 2;

  if (!masked) throw new Error(&quot;Client frames must be masked&quot;);
  // ... length parsing, mask key extraction ...

  const mask = buffer.slice(maskOffset, dataOffset);
  const data = Buffer.alloc(payloadLen);
  for (let i = 0; i &amp;lt; payloadLen; i++) {
    data[i] = buffer[dataOffset + i] ^ mask[i % 4];
  }

  return { opcode, payload: data, bytesConsumed: totalLen };
}
&lt;/code&gt;&lt;/pre&gt;
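&lt;p&gt;The XOR step is easy to sanity-check, because RFC 6455 Section 5.7 lists a complete masked example: the text &quot;Hello&quot; with a known mask key. Applying the same XOR twice restores the original, so one small function serves for both masking and unmasking:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;function xorMask(buf, mask) {
  const out = Buffer.alloc(buf.length);
  for (let i = 0; i &amp;lt; buf.length; i++) out[i] = buf[i] ^ mask[i % 4];
  return out;
}

// Mask key and masked payload from the RFC&apos;s single-frame masked text example.
const mask = Buffer.from([0x37, 0xfa, 0x21, 0x3d]);
const maskedHello = Buffer.from([0x7f, 0x9f, 0x4d, 0x51, 0x58]);

console.log(xorMask(maskedHello, mask).toString(&quot;utf-8&quot;)); // Hello
&lt;/code&gt;&lt;/pre&gt;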
&lt;p&gt;The &lt;code&gt;ws&lt;/code&gt; library also gives you permessage-deflate compression, fragmentation, subprotocol negotiation, and more. Does the Superpowers brainstorm server need any of that? Does sending an HTML reload message over localhost need compression? It does not. So the implementation only covers what&apos;s actually needed. TEXT, CLOSE, PING, PONG. Four opcodes is enough.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2) HTTP server: no Express&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Express is gone, replaced by Node.js&apos;s built-in &lt;code&gt;http&lt;/code&gt; module. There are exactly three routed paths, so a middleware stack adds nothing.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;function handleRequest(req, res) {
  touchActivity();
  if (req.method === &quot;GET&quot; &amp;amp;&amp;amp; req.url === &quot;/&quot;) {
    const screenFile = getNewestScreen();
    let html = screenFile
      ? (raw =&amp;gt; (isFullDocument(raw) ? raw : wrapInFrame(raw)))(
          fs.readFileSync(screenFile, &quot;utf-8&quot;)
        )
      : WAITING_PAGE;

    if (html.includes(&quot;&amp;lt;/body&amp;gt;&quot;)) {
      html = html.replace(&quot;&amp;lt;/body&amp;gt;&quot;, helperInjection + &quot;\n&amp;lt;/body&amp;gt;&quot;);
    } else {
      html += helperInjection;
    }

    res.writeHead(200, { &quot;Content-Type&quot;: &quot;text/html; charset=utf-8&quot; });
    res.end(html);
  } else if (req.method === &quot;GET&quot; &amp;amp;&amp;amp; req.url.startsWith(&quot;/files/&quot;)) {
    // static file serving
    const fileName = req.url.slice(7);
    const filePath = path.join(CONTENT_DIR, path.basename(fileName));
    if (!fs.existsSync(filePath)) {
      res.writeHead(404);
      res.end(&quot;Not found&quot;);
      return;
    }
    const ext = path.extname(filePath).toLowerCase();
    const contentType = MIME_TYPES[ext] || &quot;application/octet-stream&quot;;
    res.writeHead(200, { &quot;Content-Type&quot;: contentType });
    res.end(fs.readFileSync(filePath));
  } else {
    res.writeHead(404);
    res.end(&quot;Not found&quot;);
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Serve the most recent HTML on &lt;code&gt;/&lt;/code&gt;. Serve static resources on &lt;code&gt;/files/*&lt;/code&gt;. Everything else, 404. That&apos;s all there is. What &lt;code&gt;app.get()&lt;/code&gt;, &lt;code&gt;app.use()&lt;/code&gt;, and &lt;code&gt;express.static()&lt;/code&gt; do under the hood is exactly this. Does a server with three routes really need a framework&apos;s middleware chain?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;3) File watching: no chokidar&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Node.js&apos;s &lt;code&gt;fs.watch()&lt;/code&gt; is famous for behaving differently per platform. On macOS, the &lt;code&gt;rename&lt;/code&gt; event fires for both new file creation and overwriting an existing file. That&apos;s why most projects reach for &lt;code&gt;chokidar&lt;/code&gt;, which gives you cross-platform compatibility, recursive watching, stable event debouncing, and so on.&lt;/p&gt;
&lt;p&gt;But the Superpowers server has exactly one watch target. Changes to &lt;code&gt;.html&lt;/code&gt; files in the &lt;code&gt;CONTENT_DIR&lt;/code&gt; folder. One directory, one extension. In that case, attaching your own debounce timer to &lt;code&gt;fs.watch()&lt;/code&gt; is plenty.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;const debounceTimers = new Map();

const watcher = fs.watch(CONTENT_DIR, (eventType, filename) =&amp;gt; {
  if (!filename || !filename.endsWith(&quot;.html&quot;)) return;

  if (debounceTimers.has(filename)) clearTimeout(debounceTimers.get(filename));

  debounceTimers.set(
    filename,
    setTimeout(() =&amp;gt; {
      debounceTimers.delete(filename);
      const filePath = path.join(CONTENT_DIR, filename);
      if (!fs.existsSync(filePath)) return;
      touchActivity();

      if (!knownFiles.has(filename)) {
        knownFiles.add(filename);
        // reset event file when a new screen is added
        const eventsFile = path.join(STATE_DIR, &quot;events&quot;);
        if (fs.existsSync(eventsFile)) fs.unlinkSync(eventsFile);
      }

      broadcast({ type: &quot;reload&quot; });
    }, 100)
  );
});
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A &lt;code&gt;Map&lt;/code&gt; tracks per-file debounce timers, and a &lt;code&gt;Set&lt;/code&gt; tracks known files. The macOS &lt;code&gt;rename&lt;/code&gt; double-fire problem dissolves naturally with a 100ms debounce. Recursive watching, glob patterns, symlink tracking. chokidar gives you those, and none of them are needed here.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;4) WebSocket upgrade handshake&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The HTTP-to-WebSocket protocol upgrade is also handled by hand.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;function handleUpgrade(req, socket) {
  const key = req.headers[&quot;sec-websocket-key&quot;];
  if (!key) {
    socket.destroy();
    return;
  }

  const accept = computeAcceptKey(key);
  socket.write(
    &quot;HTTP/1.1 101 Switching Protocols\r\n&quot; +
      &quot;Upgrade: websocket\r\n&quot; +
      &quot;Connection: Upgrade\r\n&quot; +
      &quot;Sec-WebSocket-Accept: &quot; +
      accept +
      &quot;\r\n\r\n&quot;
  );
  // ... frame-based communication after this
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A four-line handshake that writes an HTTP 101 directly to the socket. That&apos;s what &lt;code&gt;ws&lt;/code&gt;&apos;s &lt;code&gt;WebSocket.Server&lt;/code&gt; does inside.&lt;/p&gt;
&lt;h3&gt;4-3. Cost and value: 1,200 lines down to 354&lt;/h3&gt;
&lt;p&gt;The previous version&apos;s vendored &lt;code&gt;node_modules&lt;/code&gt; was about 1,200 lines. Express&apos;s middleware chain, chokidar&apos;s cross-platform abstraction, ws&apos;s full WebSocket spec implementation, all included. After the refactor, 354 lines. The 846 lines that disappeared weren&apos;t useless code. They were &quot;code for features this project doesn&apos;t need.&quot;&lt;/p&gt;
&lt;p&gt;Rob Pike put it this way in &lt;a href=&quot;https://go-proverbs.github.io/&quot;&gt;Go Proverbs&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;A little copying is better than a little dependency.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The first time I read that, my reaction was &quot;yeah, in theory, but realistically...&quot; After looking at server.cjs, I can put a number on what &quot;a little copying&quot; actually means. WebSocket handshake, 5 lines. Frame encoding, 20 lines. Frame decoding, 30 lines. HTTP routing, 25 lines. File watching, 15 lines. Total, about 100 lines of &quot;stuff the library used to do.&quot; The other 254 lines are business logic (screen management, event handling, lifecycle management), code you&apos;d write yourself regardless of which library you picked.&lt;/p&gt;
&lt;p&gt;100 lines of &quot;copying&quot; replaced 1,200 lines of dependency. And those 100 lines are readable. If you&apos;ve read RFC 6455, you can verify this WebSocket implementation yourself. That&apos;s a different level of transparency from chasing through Express&apos;s internals to confirm the middleware chain is correct.&lt;/p&gt;
&lt;p&gt;The value of zero dependency isn&apos;t only security. It&apos;s &lt;strong&gt;readability&lt;/strong&gt;. 354 lines is something one person can sit down and read end to end. You can understand every behavior the code has. You know exactly what code runs in your project. Throw in 1,200 lines of vendored dependencies and you start leaning on the assumption that &quot;most of this is fine, no need to look.&quot; That assumption shatters when litellm or axios happens.&lt;/p&gt;
&lt;h2&gt;What Good Open-Source Design Looks Like: Code-Level Patterns&lt;/h2&gt;
&lt;p&gt;Superpowers&apos; server.cjs is one example. There are more zero-dependency projects than you might think, and they share a common set of code-level patterns. I see three layers.&lt;/p&gt;
&lt;h3&gt;Layer 1: Keep the core value outside the code&lt;/h3&gt;
&lt;p&gt;The core value of good open source isn&apos;t the code itself. It&apos;s the &lt;strong&gt;mental model&lt;/strong&gt; the code implements. Superpowers&apos; core value isn&apos;t the 354 lines of server.cjs. It&apos;s the workflow mental model: question → design → plan → TDD → review. That model lives in markdown skill files. The code is just a helper that supports it.&lt;/p&gt;
&lt;p&gt;In a structure like that, the dependency footprint naturally shrinks. If the core value doesn&apos;t live in executing code, the executable code only needs to do the minimum supporting work. Keeping complex, dependency-heavy features out of the core value is the first design principle.&lt;/p&gt;
&lt;h3&gt;Layer 2: Keep the executable code small&lt;/h3&gt;
&lt;p&gt;When code does need to run, narrow the scope of what it does. server.cjs covers exactly one scope: a local brainstorming server. It&apos;s not trying to be a general-purpose web framework. It only implements what this single use case needs. Narrow scope means fewer features. Fewer features means a higher chance you can build it without external libraries.&lt;/p&gt;
&lt;h3&gt;Layer 3: Separate install from execution&lt;/h3&gt;
&lt;p&gt;The scariest part of the litellm incident was the auto-execution via &lt;code&gt;.pth&lt;/code&gt;. Code runs at install time. axios&apos;s &lt;code&gt;postinstall&lt;/code&gt; hook is the same shape. Good design draws a clear line between &quot;install&quot; and &quot;execute.&quot; Installing a package should mean copying files to disk, nothing more. No code should run automatically. Code should run only when the user explicitly imports it or runs a command.&lt;/p&gt;
&lt;h3&gt;Why zero dependency is possible&lt;/h3&gt;
&lt;p&gt;Let&apos;s see how those three layers play out in real projects, and what technical strategies make zero dependency possible.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;zod&lt;/strong&gt;: TypeScript schema validation library. Over 30 million weekly downloads on npm. Zero dependencies. zod can run dependency-free because the TypeScript type system itself is enough to express validation logic. Runtime checks are built from plain JavaScript conditionals and type guards. No external parser engine, no code generator. Schema definition and runtime checking both ride on TypeScript&apos;s type inference. The library&apos;s job is &quot;take a JavaScript value, run conditionals on it, narrow the type if it matches,&quot; and there&apos;s no part of that which needs outside help.&lt;/p&gt;
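&lt;p&gt;The runtime half of that really is just conditionals. A hypothetical validator in the same spirit (an illustration of the pattern, not zod&apos;s actual API):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;// Hypothetical: check a plain value and return it, or throw with a reason.
function parseUser(value) {
  if (typeof value !== &quot;object&quot; || value === null) throw new TypeError(&quot;expected an object&quot;);
  if (typeof value.name !== &quot;string&quot;) throw new TypeError(&quot;name must be a string&quot;);
  if (!Number.isInteger(value.age) || value.age &amp;lt; 0) throw new TypeError(&quot;age must be a non-negative integer&quot;);
  return value; // in TypeScript, this is where the inferred type would narrow
}

console.log(parseUser({ name: &quot;tony&quot;, age: 38 })); // passes through unchanged
&lt;/code&gt;&lt;/pre&gt;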
&lt;p&gt;&lt;strong&gt;nanoid&lt;/strong&gt;: unique ID generator. Zero dependencies. One file, 130 bytes gzipped. nanoid&apos;s core is a single web standard API: &lt;code&gt;crypto.getRandomValues()&lt;/code&gt;. It&apos;s built into both browsers and Node.js, gives you cryptographically secure randomness, and nanoid just encodes the output as a URL-safe string. UUID libraries often need dependencies because they have to polyfill older environments. nanoid sidesteps that by supporting only modern runtimes.&lt;/p&gt;
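&lt;p&gt;The whole approach fits in a few lines. A sketch of the idea (not nanoid&apos;s actual source, which uses its own alphabet and byte pooling), using Node&apos;s built-in CSPRNG:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;const crypto = require(&quot;crypto&quot;);

// 64 URL-safe characters, so masking one random byte with 63 picks one uniformly.
const ALPHABET = &quot;ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_&quot;;

function tinyId(size = 21) {
  const bytes = crypto.randomBytes(size); // same guarantee as crypto.getRandomValues()
  let id = &quot;&quot;;
  for (let i = 0; i &amp;lt; size; i++) id += ALPHABET[bytes[i] &amp;amp; 63];
  return id;
}

console.log(tinyId()); // a 21-character URL-safe ID
&lt;/code&gt;&lt;/pre&gt;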
&lt;p&gt;&lt;strong&gt;picocolors&lt;/strong&gt;: terminal coloring library. The zero-dependency alternative to &lt;code&gt;chalk&lt;/code&gt;. One file, 0.1KB bundle size. Terminal coloring is fundamentally about wrapping strings with ANSI escape sequences (&lt;code&gt;\x1b[31m&lt;/code&gt; for red, &lt;code&gt;\x1b[0m&lt;/code&gt; for reset). chalk had over 10 dependencies for color-space conversion, 256-color support, Windows console compatibility, and other extras, but most CLI tools really only use red, green, yellow, and bold. picocolors decided to support only that &quot;most,&quot; and ended up with zero dependencies.&lt;/p&gt;
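&lt;p&gt;The mechanism is small enough to show whole. A sketch of the pattern (not picocolors&apos; actual source, which also detects whether the terminal supports color):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;// Wrap a string in an ANSI escape pair: one code switches the style on, one resets it.
const wrap = (open, close) =&amp;gt; (s) =&amp;gt; `\x1b[${open}m${s}\x1b[${close}m`;

const red = wrap(31, 39);  // 31 = red foreground, 39 = default foreground
const bold = wrap(1, 22);  // 1 = bold, 22 = normal intensity

console.log(red(&quot;error:&quot;), bold(&quot;something failed&quot;));
&lt;/code&gt;&lt;/pre&gt;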
&lt;p&gt;&lt;strong&gt;Hono&lt;/strong&gt;: web framework. Multi-runtime support across Node.js, Deno, Bun, Cloudflare Workers. Zero dependencies. Hono can stay 0-dep because it uses only the Web Standard API (Request, Response, URL, Headers). Express built its own abstraction layer over Node.js&apos;s &lt;code&gt;http&lt;/code&gt; module. Hono runs on the web standard interface that all runtimes already implement. The standard, not the library, papers over runtime differences.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;chi&lt;/strong&gt;: Go HTTP router. Zero dependencies. Go&apos;s standard library &lt;code&gt;net/http&lt;/code&gt; already provides a strong HTTP server, and chi just adds one routing tree (a radix tree) on top. Zero dependency is unusually common in the Go ecosystem partly because the standard library covers so much already, and partly because Rob Pike&apos;s &quot;a little copying is better than a little dependency&quot; runs through the community.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;sqlc&lt;/strong&gt;: a tool that compiles SQL queries into Go code. Zero runtime dependencies. ORMs generating queries and mapping results at runtime usually need reflection, code generation, and connection pooling, all of it complicated runtime machinery. sqlc flips the approach. It parses SQL at build time and generates type-safe Go code, so at runtime the standard &lt;code&gt;database/sql&lt;/code&gt; package is plenty. The complexity moved from runtime to build time.&lt;/p&gt;
&lt;p&gt;The shared traits across these projects:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Narrow scope.&lt;/strong&gt; They do one thing well. They don&apos;t try to cover every case.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Lean on platform built-ins.&lt;/strong&gt; Don&apos;t reimplement what the runtime already provides.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Don&apos;t confuse extras with the core.&lt;/strong&gt; The courage to leave out &quot;nice to have&quot; features.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;Vendoring and runtime evolution: structural change driven by tech&lt;/h3&gt;
&lt;p&gt;Google uses a &lt;strong&gt;vendoring&lt;/strong&gt; strategy in its large monorepos. They copy external dependency source code directly into their own repository and patch it independently afterward. The point is securing &quot;control&quot; over the dependency. Even if a package is tampered with on npm or PyPI, the vendored copy is unaffected.&lt;/p&gt;
&lt;p&gt;Bun and Deno are worth watching too. Bun bundles the bundler, transpiler, and package manager into the runtime itself. Work that used to need three external tools (webpack, babel, npm) and all their dependency trees collapses into a single runtime. Deno introduced a URL-based module system without npm from day one, and its native TypeScript support removed the need for tools like &lt;code&gt;ts-node&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The same pattern shows up in Node.js itself. Node.js 18 added a built-in &lt;code&gt;fetch()&lt;/code&gt;, which trimmed the need for an HTTP client package. Node.js 20 stabilized a built-in test runner, which trimmed the reason to install &lt;code&gt;jest&lt;/code&gt; or &lt;code&gt;mocha&lt;/code&gt; for simple tests. The bigger the runtime gets, the less need there is for external dependencies. And the fewer external dependencies, the smaller the attack surface. Regardless of any individual developer&apos;s intent, the technical infrastructure itself is evolving in the direction of fewer dependencies.&lt;/p&gt;
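&lt;p&gt;To see how far the built-ins have come, here&apos;s a complete test file that installs nothing (Node 20+; run it with &lt;code&gt;node --test&lt;/code&gt; or execute the file directly):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;// Both the runner and the assertion library ship with the runtime.
const test = require(&quot;node:test&quot;);
const assert = require(&quot;node:assert&quot;);

test(&quot;fetch is global since Node 18&quot;, () =&amp;gt; {
  assert.strictEqual(typeof fetch, &quot;function&quot;);
});

test(&quot;a plain assertion&quot;, () =&amp;gt; {
  assert.strictEqual([1, 2, 3].at(-1), 3);
});
&lt;/code&gt;&lt;/pre&gt;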
&lt;h2&gt;Developer Behavior Has to Change&lt;/h2&gt;
&lt;p&gt;Up to here it&apos;s been a story about design and technology. What good open source looks like, how runtimes are evolving. But no matter how many times the runtime adds &lt;code&gt;fetch()&lt;/code&gt; or how many projects pursue zero dependency, a developer&apos;s hand still types &lt;code&gt;npm install&lt;/code&gt;. Tech opens the door. Behavior actually has to walk through it.&lt;/p&gt;
&lt;h3&gt;The three layers of dependency cost&lt;/h3&gt;
&lt;p&gt;The cost of adding a dependency comes in three layers.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Layer 1, surface cost (the visible stuff).&lt;/strong&gt; Bundle size growth, longer install times, larger &lt;code&gt;node_modules&lt;/code&gt; folders. Most developers see this cost, and it&apos;s the most discussed. In practice, it&apos;s also the least important.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Layer 2, maintenance cost (what shows up over time).&lt;/strong&gt; Version conflicts, breaking changes, deprecated API migrations, security patches. These costs accumulate as a project runs for one, two, five years. The maintenance burden of a 10-dependency project versus a 100-dependency one isn&apos;t linear. It&apos;s exponential.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Layer 3, trust cost (invisible until something blows up).&lt;/strong&gt; Supply-chain attack exposure, maintainer departures, license changes, malicious-code injection. This is the layer litellm and axios revealed. The probability is low, but when it triggers, the damage outweighs layers 1 and 2 combined. And the more dependencies you have, the higher the probability climbs. Each dependency widens the attack surface.&lt;/p&gt;
&lt;h3&gt;When AI recommends dependencies&lt;/h3&gt;
&lt;p&gt;There&apos;s an interesting, slightly unsettling phenomenon. Ask an AI coding assistant to write code that makes an HTTP request, and a large fraction of the time you get code that imports &lt;code&gt;axios&lt;/code&gt;. Even though Node.js 18 and later ship with built-in &lt;code&gt;fetch()&lt;/code&gt;. The AI&apos;s training data is overwhelmingly axios, so the most &quot;probabilistically plausible&quot; suggestion is axios.&lt;/p&gt;
&lt;p&gt;Compared side by side:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;// Before: axios dependency
const { data } = await axios.get(&quot;https://api.example.com/data&quot;);

// After: built-in fetch (Node.js 18+)
const data = await fetch(&quot;https://api.example.com/data&quot;).then(r =&amp;gt; r.json());
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Is there a reason to install an external package for those two lines? If your project genuinely needs axios&apos;s interceptors, request cancellation, or automatic JSON conversion, fine. But for the majority case of a simple GET or POST, built-in &lt;code&gt;fetch()&lt;/code&gt; is plenty. The AI doesn&apos;t make that contextual judgment. It just emits the most frequent pattern from its training data.&lt;/p&gt;
&lt;p&gt;This creates a feedback loop where &quot;popular package equals good package&quot; gets stronger in the AI era. Heavy use leads to AI recommendations, AI recommendations lead to even heavier use, and heavier use raises the attack value. axios&apos;s 100M weekly downloads is also a signal to attackers: &quot;Hit here and you reach 100 million projects.&quot;&lt;/p&gt;
&lt;p&gt;That&apos;s why uncritically accepting AI-generated code is dangerous. AI doesn&apos;t check whether the package&apos;s maintainer has 2FA enabled. AI doesn&apos;t inspect what the package&apos;s postinstall hook actually does. AI just pattern-matches and emits the most common code, and the most common code is not guaranteed to be the safest code.&lt;/p&gt;
&lt;h3&gt;Checklist: six things to verify before adding a dependency&lt;/h3&gt;
&lt;p&gt;Here&apos;s a practical checklist. Before you add a new package to your project, run through at least these six items.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1) Is there a built-in alternative?&lt;/strong&gt;
Check whether you&apos;re using an external package for something Node.js&apos;s &lt;code&gt;fetch()&lt;/code&gt;, &lt;code&gt;crypto&lt;/code&gt;, &lt;code&gt;node:test&lt;/code&gt;, &lt;code&gt;path&lt;/code&gt;, or &lt;code&gt;url&lt;/code&gt; could already handle.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# List Node.js built-in modules
node -e &quot;console.log(require(&apos;module&apos;).builtinModules.join(&apos;\n&apos;))&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;2) How deep is the dependency tree?&lt;/strong&gt;
A single direct dependency can drag in 100 transitive ones. Inspect the tree before installing.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Inspect an npm package&apos;s dependency tree
npm view &amp;lt;package&amp;gt; dependencies
# Visualize the full tree
npm ls --all
# For Python
pip show &amp;lt;package&amp;gt; | grep Requires
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;3) Maintainer info and activity history?&lt;/strong&gt;
&lt;a href=&quot;https://scorecard.dev/&quot;&gt;OpenSSF Scorecard&lt;/a&gt; automatically grades the security practices of an open-source project. Maintainer 2FA usage, code-review practices, CI/CD security, branch protection rules, all scored.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Check security score with OpenSSF Scorecard
# https://scorecard.dev/ — paste your GitHub repo URL
# Or via CLI:
scorecard --repo=github.com/&amp;lt;owner&amp;gt;/&amp;lt;repo&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;4) Are there postinstall hooks or auto-run code?&lt;/strong&gt;
The core of the axios incident was the &lt;code&gt;postinstall&lt;/code&gt; hook. Inspect the package&apos;s &lt;code&gt;scripts&lt;/code&gt; field before installing. You can also disable lifecycle scripts entirely with &lt;code&gt;npm install --ignore-scripts&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Check a package&apos;s install scripts
npm view &amp;lt;package&amp;gt; scripts
# Find postinstall hooks among already-installed packages
find node_modules -name &quot;package.json&quot; -exec grep -l &quot;postinstall&quot; {} \;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;5) What&apos;s the &lt;a href=&quot;https://slsa.dev/&quot;&gt;SLSA&lt;/a&gt; level?&lt;/strong&gt;
Supply-chain Levels for Software Artifacts. A framework that indicates how verifiable a package&apos;s build process is. SLSA Level 3 or higher means the build process is recorded in a tamper-evident way, so even if an attacker compromises the build pipeline, you can trace it.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;6) How many lines if you implemented it yourself?&lt;/strong&gt;
left-pad was 11 lines. The Superpowers WebSocket handshake was 5 lines. If the cost of writing it yourself is low, it can be a better choice than adding a dependency. Remember Rob Pike&apos;s &quot;a little copying is better than a little dependency.&quot;&lt;/p&gt;
&lt;p&gt;This checklist isn&apos;t saying &quot;never use any dependency.&quot; Implementing your own crypto library is a fast path to a security disaster. Writing your own database driver is unrealistic. Asking &quot;do I really need this package, or am I adding it out of habit?&quot; alone reduces your attack surface.&lt;/p&gt;
&lt;h2&gt;Closing&lt;/h2&gt;
&lt;p&gt;&lt;code&gt;npm install&lt;/code&gt; was never a convenience command. It was a trust command.&lt;/p&gt;
&lt;p&gt;The moment I type &lt;code&gt;npm install axios&lt;/code&gt;, I&apos;m declaring that I trust axios&apos;s maintainer, the maintainers of every package axios depends on, the security posture of all those maintainers&apos; accounts, the integrity of the npm registry, and every single line of code the &lt;code&gt;postinstall&lt;/code&gt; hook will run on my system. &lt;code&gt;pip install litellm&lt;/code&gt; is the same. Most of us have been running this command dozens of times a day without registering the weight of what we&apos;re trusting.&lt;/p&gt;
&lt;p&gt;354 lines is something you can read. 1,200 lines is something you can only trust.&lt;/p&gt;
&lt;p&gt;What Superpowers&apos; server.cjs shows isn&apos;t minimalism or showing off. It&apos;s a decision: &quot;I&apos;ll only put code I can understand into my project.&quot; Of course not every project should write its own 354-line implementation. That&apos;s unrealistic. What is realistic is asking, every single time, &quot;do I really need this dependency, would it be hard to write myself, do I have grounds to trust this package?&quot;&lt;/p&gt;
&lt;p&gt;Few dependencies isn&apos;t minimalism. It&apos;s reducing what I have to trust.&lt;/p&gt;
&lt;p&gt;In the end, &quot;don&apos;t reinvent the wheel&quot; is still good advice. Building the wheel yourself is a waste of time most of the time. But the adage probably needs a line added to it.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Don&apos;t reinvent the wheel. But know what the wheel is doing in your car.&lt;/strong&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;litellm, &lt;a href=&quot;https://docs.litellm.ai/blog/security-update-march-2026&quot;&gt;Security Update — March 2026&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Sonatype, &lt;a href=&quot;https://www.sonatype.com/blog/compromised-litellm-pypi-package-delivers-multi-stage-credential-stealer&quot;&gt;Compromised litellm PyPI Package&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Kaspersky, &lt;a href=&quot;https://www.kaspersky.com/blog/critical-supply-chain-attack-trivy-litellm-checkmarx-teampcp/55510/&quot;&gt;Critical Supply Chain Attack&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;HeroDevs, &lt;a href=&quot;https://www.herodevs.com/blog-posts/the-litellm-supply-chain-attack-what-happened-why-it-matters-and-what-to-do-next&quot;&gt;The LiteLLM Supply Chain Attack&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Elastic Security Labs, &lt;a href=&quot;https://www.elastic.co/security-labs/axios-one-rat-to-rule-them-all&quot;&gt;Axios: One RAT to Rule Them All&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Huntress, &lt;a href=&quot;https://www.huntress.com/blog/supply-chain-compromise-axios-npm-package&quot;&gt;Supply Chain Compromise: Axios&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Sophos, &lt;a href=&quot;https://www.sophos.com/en-us/blog/axios-npm-package-compromised-to-deploy-malware&quot;&gt;Axios npm Package Compromised&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Russ Cox, &lt;a href=&quot;https://research.swtch.com/deps&quot;&gt;Our Software Dependency Problem&lt;/a&gt; (2019)&lt;/li&gt;
&lt;li&gt;Rob Pike, &lt;a href=&quot;https://go-proverbs.github.io/&quot;&gt;Go Proverbs&lt;/a&gt; (2015)&lt;/li&gt;
&lt;li&gt;David Haney, &lt;a href=&quot;https://www.davidhaney.io/npm-left-pad-have-we-forgotten-how-to-program/&quot;&gt;Have We Forgotten How to Program?&lt;/a&gt; (2016)&lt;/li&gt;
&lt;li&gt;Suckless, &lt;a href=&quot;https://suckless.org/philosophy/&quot;&gt;Philosophy&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Wikipedia, &lt;a href=&quot;https://en.wikipedia.org/wiki/Npm_left-pad_incident&quot;&gt;npm left-pad incident&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;SLSA, &lt;a href=&quot;https://slsa.dev/&quot;&gt;Supply-chain Levels for Software Artifacts&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;OpenSSF, &lt;a href=&quot;https://scorecard.dev/&quot;&gt;Scorecard&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;npm, &lt;a href=&quot;https://docs.npmjs.com/cli/v10/using-npm/scripts#best-practices&quot;&gt;Scripts Best Practices&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;GitHub, &lt;a href=&quot;https://github.com/obra/superpowers&quot;&gt;obra/superpowers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Flowkater.io, &lt;a href=&quot;https://flowkater.io/posts/2026-02-08-superpowers-introduction/&quot;&gt;Introducing Superpowers&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
</content:encoded><category>security</category><category>open-source</category><category>essay</category><category>supply-chain-attack</category><category>zero-dependency</category><author>Tony Cho (https://flowkater.io)</author></item><item><title>AI Native Engineer: Taste Built on Principles</title><link>https://flowkater.io/en/posts/2026-03-23-ai-native-engineer/</link><guid isPermaLink="true">https://flowkater.io/en/posts/2026-03-23-ai-native-engineer/</guid><description>Being good with AI tools doesn&apos;t make you an AI Native Engineer. Taste without principles is guesswork; principles without taste stay academic. After hitting the wall in iOS, getting blindsided as a Maker, and falling into the sorcerer&apos;s apprentice trap, I came to one conclusion: an AI-era engineer&apos;s identity isn&apos;t the toolkit, it&apos;s taste built on principles.</description><pubDate>Tue, 24 Mar 2026 15:20:00 GMT</pubDate><content:encoded>&lt;h1&gt;AI Native Engineer: Taste Built on Principles&lt;/h1&gt;
&lt;h2&gt;1. Opening&lt;/h2&gt;
&lt;p&gt;The AI-native era is here, and a lot of people are scared of it. The phrase &quot;developer collective depression&quot; doesn&apos;t feel like an exaggeration anymore. Fear of losing the job, even talk of a new Luddite movement. Should we go smash AI servers like the original Luddites did? Will the AI revolution leave every individual knowledge worker without work, with a handful of giants hoarding all knowledge production? I don&apos;t know.&lt;/p&gt;
&lt;p&gt;As I&apos;ve said in earlier posts, AI is essentially a mirror of the person using it. Strong people get stronger with it; lazy people get lazier. We&apos;ve reached the point where it&apos;s hard to imagine working without it, and yet what I see around me is that output quality still depends massively on the individual. Right now everyone&apos;s swept up in FOMO, racing to adopt new tools and reshare and repost them, but the people who actually ship results are still few.&lt;/p&gt;
&lt;p&gt;I&apos;ve worn a lot of hats (developer, lead, builder) and met a lot of people. From that experience, my own take is surprisingly optimistic. Anxiety is what shows up first, but this post is about what&apos;s beyond the anxiety. Companies will start hiring juniors again soon, and the companies that succeed in the AI era will hire too. I run more than twelve agents around the clock, every day, applying every skill and case study I can find to my own work. And yet when I stop everything and write a single thought-note on a blank page, that one note beats all of it. The agents can chew through my notes and produce a working implementation at speeds I couldn&apos;t have imagined before, but without &quot;my&quot; thought-note in the first place, what is there to start from?&lt;/p&gt;
&lt;p&gt;No matter how many times you square zero, it&apos;s still zero. An AI Native Engineer is someone who isn&apos;t zero.&lt;/p&gt;
&lt;p&gt;In &lt;a href=&quot;https://flowkater.io/posts/2026-03-01-agentic-engineering-9-skills/&quot;&gt;9 Skills of Agentic Engineering&lt;/a&gt; I covered the How, and in &lt;a href=&quot;https://flowkater.io/posts/2026-03-15-ax-organization-transformation/&quot;&gt;AX Organization Transformation&lt;/a&gt; I covered the Where. This time it&apos;s the Who: what kind of person is an AI Native Engineer? This one is the hardest to write, because in the end I have to look at myself.&lt;/p&gt;
&lt;p&gt;The How has plenty of answers already. &lt;a href=&quot;https://developers.openai.com/codex/guides/build-ai-native-engineering-team&quot;&gt;OpenAI&lt;/a&gt; laid out a Delegate-Review-Own model. &lt;a href=&quot;https://www.thoughtworks.com/perspectives/edition36-ai-first-software-engineering/article&quot;&gt;Mike Mason&lt;/a&gt; calls Context Engineering the next stage after prompt engineering. &lt;a href=&quot;https://x.com/karpathy/status/2026731645169185220&quot;&gt;Karpathy&lt;/a&gt; described how spinning up agents, handing them tasks, and reviewing them in parallel has already become an engineer&apos;s daily routine. &lt;a href=&quot;https://sourcegraph.com/blog/revenge-of-the-junior-developer&quot;&gt;Steve Yegge&lt;/a&gt; orchestrated 20-30 agents in parallel and produced a million lines in a year. There&apos;s no shortage of methodology for how to use AI.&lt;/p&gt;
&lt;p&gt;What&apos;s missing is the &lt;strong&gt;Who&lt;/strong&gt;. Handling tools well is a condition, not an identity. Knowing your knife doesn&apos;t make you a chef, and being good with AI doesn&apos;t make you an AI Native Engineer.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Taste without principles is guesswork.&lt;/strong&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;2. What Got Exposed: How This Differs from the Old Engineer&lt;/h2&gt;
&lt;p&gt;I&apos;ve been writing software for fifteen years. Most of that time was spent wrestling with tools. Memorizing language syntax, learning framework conventions, configuring build systems, building deployment pipelines. There&apos;s a line in the preface to Drew Hoskins&apos;s &lt;em&gt;The Product-Minded Engineer&lt;/em&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The tools and languages were so hard that learning and using them was a full-time job in itself.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Reading that, the last puzzle piece finally clicked into place.&lt;/p&gt;
&lt;p&gt;Good with Swift, you&apos;re an iOS developer. Good with React, you&apos;re a frontend developer. Good with Kubernetes, you&apos;re a DevOps engineer. The tool was the identity. &quot;What you built with&quot; mattered more than &quot;what you built.&quot; That was the era.&lt;/p&gt;
&lt;p&gt;AI has started doing that full-time job for us.&lt;/p&gt;
&lt;p&gt;Syntax? AI knows it. Frameworks? AI knows them. Build configs? AI handles them. Deployment pipelines? AI writes them. It&apos;s not perfect yet, of course. Complex legacy codebases still demand grunt work. But the direction is clear.&lt;/p&gt;
&lt;p&gt;And once that full-time job started disappearing, the things that should have mattered all along (the ones the tools were hiding) came into the foreground.&lt;/p&gt;
&lt;p&gt;It came down to user understanding, product thinking, business ownership.&lt;/p&gt;
&lt;p&gt;The data backs this up.&lt;/p&gt;
&lt;p&gt;According to the &lt;a href=&quot;https://dora.dev/research/2025/dora-report/&quot;&gt;DORA 2025 report&lt;/a&gt;, PR volume jumped 98% after AI tool adoption. Almost double. Software delivery performance, though? Flat. No change.&lt;/p&gt;
&lt;p&gt;The number stings a little, and &lt;a href=&quot;https://www.faros.ai/blog/key-takeaways-from-the-dora-report-2025&quot;&gt;Nicole Forsgren&lt;/a&gt; named exactly why. The coding &lt;strong&gt;inner loop&lt;/strong&gt; (writing code, testing, building locally) got faster. The &lt;strong&gt;outer loop&lt;/strong&gt; (review, approval, integration, deployment, security, feedback) is still the bottleneck. The real bottleneck was never coding. What we called &quot;coding productivity&quot; was a tiny slice of the full value chain.&lt;/p&gt;
&lt;p&gt;There&apos;s a colder data point too. Professor &lt;a href=&quot;https://youtu.be/WvdLd87OO9o&quot;&gt;Rem Koning at Harvard Business School&lt;/a&gt; ran an experiment giving ChatGPT to small-business founders in Kenya, and the group that was already underperforming saw their &lt;strong&gt;profits drop 10%&lt;/strong&gt; after using AI. They got plenty of advice, but they didn&apos;t have the judgment to tell good advice from bad. The group that already had judgment filtered out the bad advice and improved their numbers. Koning&apos;s takeaway: &lt;strong&gt;AI is not an equalizer. It&apos;s an amplifier.&lt;/strong&gt; Without earned insight from doing the work yourself, AI just leads you toward slop.&lt;/p&gt;
&lt;p&gt;The era of &quot;as long as you can code, you&apos;re fine&quot; hasn&apos;t ended. What&apos;s ended is the illusion that coding alone is enough. The easier the tools get, the less you can hide behind them.&lt;/p&gt;
&lt;p&gt;So what specifically has changed? Compared to the old-era engineer, three things stand out.&lt;/p&gt;
&lt;h3&gt;The expansion of responsibility&lt;/h3&gt;
&lt;p&gt;Back when I was leading a dev team, our scope of responsibility ended at delivery. Shipping accurately and quickly, that was our job. Sales and ops were always somebody else&apos;s. What I feel viscerally now is one thing: discovery matters more than delivery, and shipping one well-found thing beats shipping a hundred wrong ones. In a world where building got dramatically faster, an engineer who can&apos;t take ownership of discovery is a zero engineer.&lt;/p&gt;
&lt;p&gt;In the old days the PM asked &quot;why are we building this?&quot; on the engineer&apos;s behalf. Engineers got the spec and implemented it. That structure wasn&apos;t bad. It made sense. Tools were too hard. But now that AI has taken a big chunk of implementation, what is an engineer who doesn&apos;t know the &quot;why&quot; actually doing? Handing the spec to AI? That&apos;s a relayer, not an engineer.&lt;/p&gt;
&lt;p&gt;What does the user actually want? What&apos;s the business impact of this feature? Why is this priority set this way? The gap between engineers who get this context and engineers who don&apos;t is widening dramatically in the AI era. AI takes over the &quot;how,&quot; so an engineer who doesn&apos;t know the &quot;why&quot; has nothing left. An engineer who doesn&apos;t know the user just gets left behind.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://newsletter.pragmaticengineer.com/p/how-to-be-a-10x-engineer-interview&quot;&gt;&quot;Sam&quot;&lt;/a&gt; has been GitHub-inactive for five years and has zero social media presence. And former colleagues line up to hire him. One startup was ready to invent a new role just for him. The moment he understands a project, he breaks the whole thing down on a whiteboard. He frames delays as &quot;tradeoffs,&quot; not &quot;delays.&quot; When working with another team, he doesn&apos;t open with &quot;please do this for me,&quot; he opens with &quot;the customer is hitting this problem, can you walk me through how this system works?&quot; High-level decomposition plus customer-problem thinking, in one person. This isn&apos;t a new AI-era skill. It&apos;s what good engineers always did. Now there&apos;s nowhere left to hide if you don&apos;t.&lt;/p&gt;
&lt;h3&gt;Ten times faster learning&lt;/h3&gt;
&lt;p&gt;If AI generates ten times the code, the speed at which you read and judge that code has to be ten times faster too. To look at the 200 lines AI produced in 30 seconds and call out &quot;this is right, that&apos;s wrong, here&apos;s a perf issue,&quot; your foundation has to be solid.&lt;/p&gt;
&lt;p&gt;Basics never die. CS fundamentals, yes, and also a deep grasp of whatever tool you use, language or framework. Surface familiarity won&apos;t let you tell good AI output from bad. And if you can&apos;t tell, you stop using AI and start being dragged by it. That&apos;s not AI Native, that&apos;s AI Dependent.&lt;/p&gt;
&lt;p&gt;In the old era, going deep on one technology kept you fed for ten years. Good at Java? Java developer. Good at Golang? Backend developer. Depth was the moat. Not anymore. AI is collapsing the walls between languages. Someone who doesn&apos;t know iOS can ship an iOS app with AI&apos;s help (I&apos;m exhibit A). But that doesn&apos;t make them an &quot;iOS engineer.&quot; The gap between code that runs on the surface and code that holds up in production is still wide.&lt;/p&gt;
&lt;p&gt;So the way you learn has to change. Instead of building up brick by brick from syntax, you read what AI generated and reverse-engineer the principles. Dig into &quot;why does this code behave this way?&quot; and you naturally end up at deep understanding. AI&apos;s output becomes your textbook. You just need the eyes to read it.&lt;/p&gt;
&lt;h3&gt;The speed of judgment&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://www.faros.ai/blog/key-takeaways-from-the-dora-report-2025&quot;&gt;Forsgren&lt;/a&gt; pointed out that working with AI means &lt;strong&gt;reconstructing your mental model dozens of times in 30 minutes.&lt;/strong&gt; The old pace was one or two design decisions a day. Now it&apos;s dozens in half an hour.&lt;/p&gt;
&lt;p&gt;When AI offers three approaches, you decide which one fits. When AI says &quot;add a cache,&quot; you decide whether caching is actually the answer or whether the query itself is the problem. When AI proposes a refactor, you decide whether it&apos;s genuinely better design or just the same complexity wearing different clothes. All of this happens fast, back to back.&lt;/p&gt;
&lt;p&gt;Fast judgment comes from deep understanding. Someone who knows the principles can decide intuitively. &quot;This is O(n²), so it&apos;ll break under load.&quot; &quot;This structure has consistency problems in a distributed setting.&quot; &quot;This API design dumps too much responsibility on the client.&quot; Calls like these need to land in half a second, in this era.&lt;/p&gt;
&lt;p&gt;Before, if you didn&apos;t know, you could look it up. There was time. Now, to keep pace with the speed at which AI throws code at you, your gut has to fire before you can look anything up. That gut doesn&apos;t come out of nowhere. It comes from years of accumulated principles.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;The essence didn&apos;t change; it just got exposed plainly. Good engineers had these three things before too. The tools were just hard enough to hide them.&lt;/p&gt;
&lt;p&gt;I might have been hiding behind tools for fifteen years. Using frameworks well, building servers well, designing solid architectures: I believed that was my craft. Not entirely wrong, but I sometimes acted like that was the whole story. There were times a user told me something was painful, and I doubled down on &quot;technically it&apos;s correct.&quot;&lt;/p&gt;
&lt;p&gt;When AI started doing that toolwork for me, the person hiding behind the tools got exposed.&lt;/p&gt;
&lt;p&gt;(And it&apos;s an uncomfortable thing to sit with.)&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;3. The Maker&apos;s Backlash&lt;/h2&gt;
&lt;p&gt;In the &lt;a href=&quot;https://flowkater.io/posts/2026-03-15-ax-organization-transformation/&quot;&gt;earlier AX post&lt;/a&gt; I separated Maker from Closer. The Maker produces; the Closer brings outcomes home. Now let&apos;s go a level deeper, from the org to the individual.&lt;/p&gt;
&lt;p&gt;The classic Makers believed their KPIs were aligned with making things. Writing code, shipping features, opening PRs, closing sprints. The ones who coded until dawn, deployed on weekends, ran toward incidents. They were genuinely working hard.&lt;/p&gt;
&lt;p&gt;But in most IT orgs, there comes a moment when most of the Makers are judged not to be contributing directly to business KPIs. That&apos;s when the layoffs hit. Big tech let go of tens of thousands, and most of them were people who&apos;d been working hard.&lt;/p&gt;
&lt;p&gt;It&apos;s painful, but most of the time this isn&apos;t the individual&apos;s fault. It&apos;s the backlash that hits people who were hired without thought and who diligently did the work they were given.&lt;/p&gt;
&lt;p&gt;AI is accelerating that backlash. The Maker&apos;s work (writing code, implementing features) has been shown to be replaceable by AI in large part, and the wave is breaking faster than we expected.&lt;/p&gt;
&lt;p&gt;Some Makers, even in that environment, were probably already asking &quot;is this actually contributing to the org?&quot; The ones who checked the data on their own features, and were the first to say &quot;let&apos;s kill this&quot; when the metrics didn&apos;t show up. Either they left or they survived. Either way, the depth of their thinking made the next move clearer. (Not everyone gets that chance, of course.)&lt;/p&gt;
&lt;p&gt;It used to be that &quot;a Maker with a Closer mindset&quot; was a compliment. &quot;A developer who also gets the business. Impressive.&quot; Now? You can&apos;t survive without being a Closer. &lt;strong&gt;It&apos;s not praise anymore. It&apos;s a survival condition.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The faster I click-build with AI, the faster everyone else does too. The act of building itself is becoming a commodity. In an era where anyone can build, &quot;I&apos;m good at building&quot; doesn&apos;t differentiate. &lt;strong&gt;&lt;em&gt;Click&lt;/em&gt; and you have an app. That&apos;s not an edge anymore.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The love of building isn&apos;t the problem. The direction of that love has to shift.&lt;/p&gt;
&lt;p&gt;This isn&apos;t a knock on Makers. I&apos;m one. I love building. The rush of writing code, designing architecture, opening a clean PR. That part is still good.&lt;/p&gt;
&lt;p&gt;But love doesn&apos;t change reality. We&apos;re in an era where building alone doesn&apos;t complete value. Build, deliver, validate, iterate, and only then does it become value. Someone has to own the whole arc.&lt;/p&gt;
&lt;p&gt;This is the bitterest part for me. I had Maker pride too. I believed building well was valuable in itself. Reality is colder than that.&lt;/p&gt;
&lt;p&gt;That doesn&apos;t mean technology stopped mattering. The opposite, actually. Technology matters more. Just a different kind of technology.&lt;/p&gt;
&lt;p&gt;I learned this one in my body.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;4. The Sorcerer&apos;s Mistake: The Paradox Where Tech Matters More&lt;/h2&gt;
&lt;p&gt;Right now I&apos;m building a backend in Golang and a client in iOS.&lt;/p&gt;
&lt;p&gt;In Golang I sometimes get lost in code-logic bugs, but I rarely get blocked by not knowing the tech itself. I jump straight to the core logic, fix things fast, and grasp AI-generated code quickly.&lt;/p&gt;
&lt;p&gt;iOS is a different story. I started native iOS development for the first time this year, and I&apos;m leaning heavily on AI. When variables don&apos;t render right in WidgetKit, when layouts don&apos;t come out the way I want, my native iOS skills aren&apos;t strong enough, so I spent two or three days endlessly editing while neither AI nor I could actually fix the problem, stuck in an infinite loop. Most of it was layout issues that don&apos;t surface in code logs, or transitions that need to feel natural. Walking in without basics, from how Navigation Stack works to Liquid Glass, I got destroyed.&lt;/p&gt;
&lt;p&gt;AI throws together believable-looking output by vibe. On the happy path everything looks normal. Then you go back, the header breaks. The loading cuts out. The transition feels wrong. AI implemented everything, and somehow the result was indistinguishable from nothing being implemented at all. Every skill, every harness-engineering trick. The work was still piece by piece. I&apos;d talk through the problem with AI, reproduce it by hand, and when AI kept circling the same bug, I&apos;d try to give it the parent layout or app-wide context. Eventually one thought kept coming back: &quot;an iOS engineer would have done this in five minutes.&quot;&lt;/p&gt;
&lt;p&gt;WidgetKit, screen transitions, all of it: using technology I didn&apos;t understand cost me days. If I&apos;d been an iOS engineer, I&apos;m certain I would have fixed it in five minutes. Being able to do something doesn&apos;t mean you can do it well.&lt;/p&gt;
&lt;p&gt;A product engineer is an engineer who thinks about the end user. What AI produces is a starting point, and shaping it into something the end user is actually happy with comes down to the engineer. Principles and taste aren&apos;t opposed. The more you want to exercise taste, the more you&apos;ll find that without understanding the principles, you cannot produce real quality.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;This isn&apos;t only my experience.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://htmx.org/essays/yes-and/&quot;&gt;Carson Gross&lt;/a&gt; (HTMX creator, Montana State professor) calls it the &quot;Sorcerer&apos;s Apprentice Trap,&quot; and that was exactly my situation. The Disney Fantasia scene where the apprentice enchants the broom to fetch water and loses control. (My all-time favorite Mickey episode, though Ellie says she doesn&apos;t really know it. Anyone else here who watched the Disney cartoon hour on Sunday mornings?) The relationship between AI and coding looks just like that.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;If juniors don&apos;t know how to write code, they don&apos;t know how to read code. And if they can&apos;t read it, they get jerked around by the LLM.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That described my iOS problem precisely. Code is usually &lt;strong&gt;read&lt;/strong&gt; more often than it&apos;s written. So AI writes the code for you, and writing matters less? Reading matters more, then. You have to read code you didn&apos;t write, understand it, and judge it.&lt;/p&gt;
&lt;p&gt;What&apos;s scarier is the broken feedback loop. Normally, as code gets more complex, your body sends a signal first. Hands stop, head hurts, the &quot;this is too hard&quot; signal arrives. That signal forces the design simpler. With AI generation, that process disappears. AI hands over 200 lines or 2,000 lines without flinching. Complexity stacks invisibly, then explodes all at once.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;LLMs don&apos;t reduce essential complexity. They just generate accidental complexity easily.&lt;/strong&gt; That&apos;s the exact distinction Fred Brooks drew in 1986 in &quot;No Silver Bullet.&quot;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://stevekrouse.com/precision&quot;&gt;Steve Krouse&lt;/a&gt; (Val Town CEO) hit something similar from a different angle, and that overlaps my experience exactly.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Vibe coding gives you the illusion that your vibe is a precise abstraction.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It works at first. The demo is perfect. You win the hackathon prize. Then you add features or grow the scale, and bugs sneak in at lower abstraction layers you never understood. Same as my iOS experience. Network timeouts, memory leaks, concurrency issues. In production, with users doing things you didn&apos;t expect, problems erupt at layers you never grasped.&lt;/p&gt;
&lt;p&gt;And to debug those problems, you end up needing the principles.&lt;/p&gt;
&lt;p&gt;One of Krouse&apos;s questions stayed in my head. &lt;strong&gt;&quot;Nobody talks about &apos;vibe writing.&apos;&quot;&lt;/strong&gt; Nobody seriously argues for &quot;just write by feel&quot; when it comes to writing. Good writing demands grammar, structure, and reading a lot of other writing. So why does coding get the &quot;just go by vibe&quot; treatment?&lt;/p&gt;
&lt;p&gt;Thinking it through, it comes down to &lt;strong&gt;tool knowledge versus principle knowledge.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Tool knowledge.&lt;/strong&gt; Swift syntax, React patterns, Kubernetes YAML, the API of a specific framework. AI can replace this. It already is. I don&apos;t memorize Swift syntax these days. Claude knows it.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Principle knowledge.&lt;/strong&gt; Networking, computer architecture, OS, distributed systems, data structures, algorithms. This is what shines more in the AI era. To judge &quot;why is this code slow&quot; or &quot;why is this concurrency broken&quot; in AI-generated output, you have to know the principles.&lt;/p&gt;
&lt;p&gt;You can ask AI &quot;why is this code slow?&quot; AI might give you an answer. Whether that answer is right is up to a person who knows the principles. If AI says &quot;add a cache&quot; when the real problem is an N+1 query, only someone who understands how the database is actually being hit can call out the difference.&lt;/p&gt;
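&lt;p&gt;A minimal sketch of that diagnosis (table names and data are made up for illustration): counting the statements a naive loop issues against an in-memory SQLite database, versus a single JOIN, makes the N+1 pattern visible in a way a cache never would.&lt;/p&gt;

```python
import sqlite3

# Hypothetical schema and data, purely for illustration: the same report
# built with an N+1 query pattern versus one batched JOIN. A cache would
# hide the symptom; reading the query log shows the real cause.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'kim'), (2, 'lee'), (3, 'park');
    INSERT INTO posts VALUES (1, 1, 'a'), (2, 2, 'b'), (3, 3, 'c');
""")

queries = []
db.set_trace_callback(queries.append)  # log every SQL statement issued

# N+1 pattern: one query for the list, then one more query per row
authors = db.execute("SELECT id, name FROM authors").fetchall()
for author_id, _name in authors:
    db.execute("SELECT title FROM posts WHERE author_id = ?", (author_id,)).fetchall()
n_plus_one_count = len(queries)

queries.clear()
# Batched: a single JOIN replaces all the per-row lookups
db.execute(
    "SELECT a.name, p.title FROM authors a JOIN posts p ON p.author_id = a.id"
).fetchall()
batched_count = len(queries)

print(f"N+1 pattern: {n_plus_one_count} queries, batched: {batched_count} query")
```

&lt;p&gt;The loop issues one query for the list plus one per row; the JOIN issues one in total. At three rows the difference is invisible, which is exactly why it survives review.&lt;/p&gt;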
&lt;p&gt;&lt;strong&gt;Engineering mindset emerges on top of engineering theory. Mindset alone doesn&apos;t build it.&lt;/strong&gt; (A product engineer is, in the end, still an engineer.)&lt;/p&gt;
&lt;p&gt;Saying &quot;this API feels slow&quot; without understanding networking is guesswork, not taste. When someone who understands TCP handshakes says &quot;this is where the latency is happening,&quot; that&apos;s taste. Saying &quot;the app feels heavy&quot; without knowing the OS is an impression, not a diagnosis. When someone who knows memory management points and says &quot;there&apos;s a leak right here,&quot; that&apos;s a diagnosis.&lt;/p&gt;
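&lt;p&gt;Here&apos;s what that kind of diagnosis looks like in miniature. This is a sketch against a throwaway localhost echo server (with an artificial 50 ms delay standing in for a slow backend), not any real API: timing the connect separately from the request shows whether the latency lives in the handshake or in the server&apos;s processing.&lt;/p&gt;

```python
import socket
import threading
import time

# Hypothetical demo: split one request's latency into TCP connection
# (handshake) time and server processing time.
def run_server(listener):
    conn, _addr = listener.accept()
    data = conn.recv(1024)
    time.sleep(0.05)  # simulate slow request handling on the server
    conn.sendall(data)
    conn.close()

listener = socket.socket()
listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]
threading.Thread(target=run_server, args=(listener,), daemon=True).start()

client = socket.socket()
t0 = time.perf_counter()
client.connect(("127.0.0.1", port))  # the three-way handshake happens here
t1 = time.perf_counter()
client.sendall(b"ping")
reply = client.recv(1024)
t2 = time.perf_counter()
client.close()

connect_ms = (t1 - t0) * 1000
process_ms = (t2 - t1) * 1000
print(f"handshake: {connect_ms:.2f} ms, request/response: {process_ms:.2f} ms")
```

&lt;p&gt;On localhost the handshake is sub-millisecond and the processing dominates; over a long-haul link the split can reverse. Knowing which side dominates is the difference between &quot;feels slow&quot; and a diagnosis.&lt;/p&gt;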
&lt;p&gt;Product taste has to sit on top of CS fundamentals. Order matters. Trying to grow taste without the foundation is like playing technical soccer without basic conditioning.&lt;/p&gt;
&lt;p&gt;The sorcerer&apos;s mistake lives here. AI replaces tool knowledge, so people assume &quot;tech matters less.&quot; In practice, principle knowledge matters more. The space tool knowledge used to fill has to be filled by principle knowledge.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;5. Taste on Principles: Eval&lt;/h2&gt;
&lt;p&gt;So are principles enough?&lt;/p&gt;
&lt;p&gt;No. &lt;strong&gt;Principles without taste might make a scholar, but they don&apos;t make a good engineer.&lt;/strong&gt; A CS PhD doesn&apos;t automatically become a great engineer. Writing strong papers doesn&apos;t mean you ship strong products. Knowing the principles is necessary, not sufficient.&lt;/p&gt;
&lt;p&gt;The &quot;Bob&quot; story from Hoskins&apos;s &lt;em&gt;The Product-Minded Engineer&lt;/em&gt; illustrates the point well. Bob is a capable engineer. He knows the principles, his code is clean, his reviews are thorough.&lt;/p&gt;
&lt;p&gt;But Bob implements without scenarios.&lt;/p&gt;
&lt;p&gt;He builds what&apos;s in the spec, without asking &quot;in what real situation would the user actually use this?&quot; Bob&apos;s features work, the tests pass. And the users don&apos;t use them.&lt;/p&gt;
&lt;p&gt;&quot;Technically perfect feature, nobody uses it.&quot; I&apos;ve lived through that several times (and maybe still am).&lt;/p&gt;
&lt;p&gt;Hoskins compared engineers to &lt;strong&gt;editors&lt;/strong&gt;. Not the person who writes good sentences, but the one who cuts the unnecessary ones. The person who decides &quot;we don&apos;t need this.&quot; That&apos;s the heart of Product Architecture.&lt;/p&gt;
&lt;p&gt;AI can write code well. Whether the feature is what the user actually needs is still on the human.&lt;/p&gt;
&lt;p&gt;That judgment is what &lt;a href=&quot;https://www.anthropic.com/research/how-ai-is-transforming-work-at-anthropic&quot;&gt;Anthropic&lt;/a&gt; called &quot;taste.&quot; It&apos;s the thing the people who build AI best are the slowest to hand off to AI.&lt;/p&gt;
&lt;p&gt;The word &quot;taste&quot; sounds a little mystical, though. &quot;Isn&apos;t taste innate? What if you don&apos;t have it?&quot; I felt that way at first too. Especially watching engineers around me who clearly do.&lt;/p&gt;
&lt;p&gt;Then &lt;a href=&quot;https://newsletter.pragmaticengineer.com/p/product-minded-engineer-panel&quot;&gt;Linear CTO Thomas&lt;/a&gt; answered it cleanly.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Taste is not mystical. It&apos;s a craft.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Taste isn&apos;t mystical. It&apos;s a craft, and a craft can be sharpened.&lt;/p&gt;
&lt;p&gt;The Linear team proved this. Quality Wednesday: every Wednesday the entire team hunts and fixes product defects. Subtle scroll jank, buttons misaligned by 3px, they go after the small stuff with discipline. Over two years they fixed 2,500 defects. Repeat that weekly and an &quot;always-looking-for-the-next-thing-to-fix&quot; mindset forms automatically. That&apos;s how taste builds, like muscle.&lt;/p&gt;
&lt;p&gt;The way reading lots of good writing makes bad writing jump out. The way experiencing lots of good UX makes bad UX irritate you. The intuition that shows on the outside is the result of experience stacked on the inside.&lt;/p&gt;
&lt;p&gt;Taste is accumulated experience, not innate gift.&lt;/p&gt;
&lt;p&gt;I covered the &lt;a href=&quot;https://flowkater.io/posts/2026-03-01-agentic-engineering-9-skills/&quot;&gt;9 skills of agentic engineering&lt;/a&gt; in an earlier post. Writing this one, I&apos;m convinced I need to add one more.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Eval.&lt;/strong&gt; The judgment to evaluate what AI produces.&lt;/p&gt;
&lt;p&gt;If the nine skills are &quot;how to work with AI,&quot; Eval is &quot;how to judge AI&apos;s output.&quot; If taste is the gut feeling that says &quot;this isn&apos;t quite right,&quot; Eval is the ability to point out exactly why and propose a fix.&lt;/p&gt;
&lt;p&gt;You hand AI a UI layout. It generates code, runs the tests. All Pass. CI is green.&lt;/p&gt;
&lt;p&gt;&quot;Oh, looks good.&quot;&lt;/p&gt;
&lt;p&gt;Then you touch it on a real device, and it&apos;s a different story.&lt;/p&gt;
&lt;p&gt;The layout warps on a narrow screen. The scroll feels off. The touch target is too small for a thick finger. The text gets clipped. The animation behaves differently from intent.&lt;/p&gt;
&lt;p&gt;AI optimizes for the test cases, not for the user experience. Whether AI-generated code functions and whether it gives users a good experience are completely different things. AI can do the first. The second is still on humans, and will be for a while.&lt;/p&gt;
&lt;p&gt;The deeper problem hits when AI patches a narrow piece of the layout in a way that breaks consistency, or when development heads in entirely the wrong direction. Tests stay green, the product drifts somewhere strange. &lt;a href=&quot;https://flowkater.io/posts/2026-02-28-ai-as-smart-as-you/&quot;&gt;AI Is Only as Smart as You Are&lt;/a&gt; is the title of an earlier post of mine, and it lands squarely here. AI&apos;s judgment doesn&apos;t exceed mine.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&quot;Is AI&apos;s All Pass also All Pass for me?&quot;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Anyone who can ask that question is an AI Native Engineer. Not the one who relaxes when tests pass, but the one who looks with their own eyes, touches it with their own hands, and judges from the user&apos;s seat. That&apos;s Eval.&lt;/p&gt;
&lt;p&gt;And this Eval is sharper in the hands of someone who knows how networking moves, how memory is managed, how the rendering pipeline flows. Taste exercised on top of principles. That&apos;s real Eval.&lt;/p&gt;
&lt;p&gt;End-to-end ownership (from spec to deployment to user feedback) used to belong to the PO or the CEO. Developers &quot;just built what they were told.&quot; That boundary is collapsing. AI takes over &quot;building it well,&quot; so the human role left is &quot;what should we build&quot; and &quot;is it actually valuable.&quot;&lt;/p&gt;
&lt;p&gt;This can&apos;t run on individual willpower alone. If the role and responsibility aren&apos;t given, the environment isn&apos;t a good one for becoming an AI Native Engineer. &quot;You just implement this feature to spec.&quot; Eval can&apos;t grow in an org like that. No chance to meet users, no permission to look at metrics, no space to ask &quot;why are we building this?&quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;6. So here I am&lt;/h2&gt;
&lt;p&gt;If you&apos;ve read this far, you&apos;re probably wondering what to do.&lt;/p&gt;
&lt;p&gt;I went through all of it. Stepped on every trap.&lt;/p&gt;
&lt;p&gt;&quot;I&apos;ll learn first, then start.&quot; I did that too. Sounds reasonable, but it was actually a declaration of not-starting. AI tools change every month; if you wait for prep to finish, you&apos;ll never start. It&apos;s like swimming: you learn by getting in the water, not by reading theory books. The day I caught myself saying the same thing a year later, I finally got it.&lt;/p&gt;
&lt;p&gt;I&apos;ve also opened Twitter, muttered &quot;I should be doing this too,&quot; and closed it. Anxiety can fuel action, but anxiety itself isn&apos;t a substitute for action.&lt;/p&gt;
&lt;p&gt;&quot;Maybe I&apos;ll take a course first.&quot; I thought that too at first. Then I realized AI answers my actual questions directly. Hitting the wall yourself is ten times more productive than sitting through a course. Isn&apos;t it a little odd for an engineer to learn a tool by watching a lecture? (I know that sounds kkondae-ish (a patronizing senior who can&apos;t read the room). There are good courses out there too.)&lt;/p&gt;
&lt;p&gt;There was a stretch when I only communicated through code. The more AI takes over coding, the more talking with people and understanding users becomes the differentiator, and I&apos;ve been the one who froze when asked &quot;explain why we need this feature.&quot;&lt;/p&gt;
&lt;p&gt;The turn came when I started running into the wall.&lt;/p&gt;
&lt;p&gt;When I got stuck, I asked AI. When I hit an error, I pasted the error message; when I got blocked on design, I asked about architecture; when a test failed, we analyzed the failure together. The process itself was the learning. One round of grunt work beats ten courses, and I learned that firsthand.&lt;/p&gt;
&lt;p&gt;I built what I actually wanted. Not &quot;someday I&apos;ll build it,&quot; but starting now. With AI, I could build it much faster than before. Building fast meant failing fast too. That, more than anything, was the opportunity.&lt;/p&gt;
&lt;p&gt;And critically, I dragged it all the way to a product with real users. A side project you alone use and a product other people actually use are worlds apart. You have to feel the mix of unease and learning that hits the moment a user says &quot;this is uncomfortable&quot;; that&apos;s how product sense develops. AI doesn&apos;t grow that for you.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;I have a soft spot for tech and for the Maker craft. Still. A new framework drops, I want to play with it; designing clean architecture makes me proud; code that runs beautifully makes me happy. (Listening to talks on extreme Golang concurrency at &lt;a href=&quot;https://flowkater.io/posts/2025-12-11-2025-gophercon-korea-review/&quot;&gt;GopherCon&lt;/a&gt; and feeling crushed in the process is a bonus.) When I see a great Maker, I respect them.&lt;/p&gt;
&lt;p&gt;But the real value-creation wasn&apos;t in any of that. However beautiful the code, if it doesn&apos;t deliver value to the user, it&apos;s self-satisfaction. All of us are only meaningful when the business actually generates value. It took me a long time to admit that.&lt;/p&gt;
&lt;p&gt;I&apos;ve watched developers dismiss users. &quot;What do users know?&quot; &quot;The plan is wrong, technically this is the right way.&quot; I&apos;ve done that too (it&apos;s embarrassing in retrospect).&lt;/p&gt;
&lt;p&gt;Looking back, that was a defense mechanism. Taking users seriously requires much higher quality. Smooth UX, server infrastructure that doesn&apos;t go down, AI features that feel natural. Take users seriously and every domain has a real challenge in it. Without confidence, the easier path is to look away. Hiding inside &quot;what&apos;s technically correct&quot; is comfortable.&lt;/p&gt;
&lt;p&gt;I&apos;ve built a lot of products. I&apos;ve had a business fail. &lt;a href=&quot;https://flowkater.io/posts/2026-01-25-no-victory-no-future/&quot;&gt;I&apos;ve experienced firsthand how an org collapses when it can&apos;t be led to a win&lt;/a&gt;. I&apos;ve watched capable people lose direction and scatter. Technically the team was excellent. We just couldn&apos;t answer the question &quot;is what we&apos;re building giving real value to users?&quot; with any clarity.&lt;/p&gt;
&lt;p&gt;That experience, more than my tech romance, is what brought me here.&lt;/p&gt;
&lt;p&gt;Loving tech and using tech to create value are different things. They&apos;re not opposed, though. I love tech, so I dig into the principles, and because I know the principles, I shine more in the AI era. Without love, principles are just textbook content.&lt;/p&gt;
&lt;p&gt;I want both. The priority shifted. Tech used to come first. Value comes first now.&lt;/p&gt;
&lt;p&gt;Am I an AI Native Engineer? I&apos;m not sure. The honest answer is &quot;I&apos;m becoming one.&quot; I do know the direction. Value over tech. Close over Make. User over code. Stack the principles, sharpen the taste.&lt;/p&gt;
&lt;p&gt;Right now, working with Ellie on an app, I use AI, take feedback, watch the metrics, and fix it again. Break yesterday&apos;s work today, fix today&apos;s work tomorrow. I try to judge user experience while understanding the rendering pipeline, and try to say &quot;we don&apos;t need to build this&quot; while knowing the system architecture.&lt;/p&gt;
&lt;p&gt;I&apos;m still scrambling to create value today. That&apos;s all there is.&lt;/p&gt;
&lt;p&gt;It isn&apos;t glamorous. But this is the real thing.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;7. Closing: A Compass on the Accelerant&lt;/h2&gt;
&lt;p&gt;&quot;Is what you&apos;re building actually generating value?&quot;&lt;/p&gt;
&lt;p&gt;Even when I run dozens of agents through n rounds of feedback to produce a draft plan, Ellie still finds gaps. Same for this post. AI gave me a plausible draft, but in the end I rewrote most of it myself, with AI handling only some of the polish. You know the feeling. A piece written entirely by AI lands flat.&lt;/p&gt;
&lt;p&gt;However much you automate testing, you can&apos;t skip the act of using it yourself. And often, using it once and writing it up is faster than running 100 AI tests.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=ntlJBU3LLL8&quot;&gt;Terry Winograd&lt;/a&gt;, the first-generation Stanford AI researcher who has watched the field for over half a century since SHRDLU in 1971, said this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;AI is not the cause of the problem. AI is an accelerant.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Coming from someone who walked through the past AI winters in person, that lands with weight. Problems that already existed are getting accelerated by AI. People running in a good direction arrive at good places faster. People running in the wrong direction hit the wall faster. What changed is the speed, not the direction.&lt;/p&gt;
&lt;p&gt;An accelerant needs a compass.&lt;/p&gt;
&lt;p&gt;That compass is taste built on principles.&lt;/p&gt;
&lt;p&gt;Taste without principles stays guesswork; principles without taste stay academic.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;An AI Native Engineer is someone who exercises taste on top of principles.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Someone who understands networking and can also judge user experience. Someone who knows system architecture and can also say &quot;we don&apos;t need to build this.&quot; Someone who can read code and also ask &quot;is this even the right problem?&quot;&lt;/p&gt;
&lt;p&gt;Even with the How (agentic skills), even working in the Where (an AX org), if the Who (you) isn&apos;t someone who exercises taste on top of principles, none of it means anything.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Taste built on principles. That&apos;s the AI Native Engineer.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;End-to-end ownership has always been what made a good engineer. It was true before AI, and it&apos;ll be true after. The only difference is, now it&apos;s hard to keep pretending otherwise.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;Andrej Karpathy, &lt;a href=&quot;https://x.com/karpathy/status/2026731645169185220&quot;&gt;X post on AI Builders vs Coders&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Mike Mason / ThoughtWorks, &lt;a href=&quot;https://www.thoughtworks.com/perspectives/edition36-ai-first-software-engineering/article&quot;&gt;&quot;AI-First Software Engineering — Context Engineering&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Steve Yegge, &lt;a href=&quot;https://sourcegraph.com/blog/revenge-of-the-junior-developer&quot;&gt;&quot;Revenge of the Junior Developer&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;OpenAI, &lt;a href=&quot;https://developers.openai.com/codex/guides/build-ai-native-engineering-team&quot;&gt;&quot;Building an AI-Native Engineering Team&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Drew Hoskins, &lt;em&gt;The Product-Minded Engineer&lt;/em&gt; (O&apos;Reilly, 2025)&lt;/li&gt;
&lt;li&gt;Pragmatic Engineer, &lt;a href=&quot;https://newsletter.pragmaticengineer.com/p/how-to-be-a-10x-engineer-interview&quot;&gt;&quot;How to Be a 10x Engineer&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;DORA, &lt;a href=&quot;https://dora.dev/research/2025/dora-report/&quot;&gt;&quot;Accelerate State of DevOps Report 2025&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Rem Koning / Harvard Business School, &lt;a href=&quot;https://youtu.be/WvdLd87OO9o&quot;&gt;&quot;AI Native lecture at Harvard&quot;&lt;/a&gt; — EO Korea&lt;/li&gt;
&lt;li&gt;Nicole Forsgren / Faros AI, &lt;a href=&quot;https://www.faros.ai/blog/key-takeaways-from-the-dora-report-2025&quot;&gt;&quot;Key Takeaways from the DORA Report 2025&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Terry Winograd, &lt;a href=&quot;https://www.youtube.com/watch?v=ntlJBU3LLL8&quot;&gt;Stanford Interview on AI as Accelerant&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Carson Gross, &lt;a href=&quot;https://htmx.org/essays/yes-and/&quot;&gt;&quot;Yes, and... The Sorcerer&apos;s Apprentice Trap&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Steve Krouse, &lt;a href=&quot;https://stevekrouse.com/precision&quot;&gt;&quot;The Death of Code is Greatly Exaggerated&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Anthropic, &lt;a href=&quot;https://www.anthropic.com/research/how-ai-is-transforming-work-at-anthropic&quot;&gt;&quot;How AI Is Transforming Work at Anthropic&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Pragmatic Engineer, &lt;a href=&quot;https://newsletter.pragmaticengineer.com/p/product-minded-engineer-panel&quot;&gt;Product-Minded Engineer Panel&lt;/a&gt; — Linear CTO Thomas: &quot;Taste is not mystical. It&apos;s a craft.&quot;&lt;/li&gt;
&lt;li&gt;Linear, &lt;a href=&quot;https://linear.app/blog&quot;&gt;Quality Wednesday&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;flowkater, &lt;a href=&quot;https://flowkater.io/posts/2026-03-01-agentic-engineering-9-skills/&quot;&gt;&quot;9 Skills of Agentic Engineering&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;flowkater, &lt;a href=&quot;https://flowkater.io/posts/2026-03-15-ax-organization-transformation/&quot;&gt;&quot;AX Organization Transformation&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;flowkater, &lt;a href=&quot;https://flowkater.io/posts/2026-02-28-ai-as-smart-as-you/&quot;&gt;&quot;AI Is Only as Smart as You Are&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;flowkater, &lt;a href=&quot;https://flowkater.io/posts/2026-01-25-no-victory-no-future/&quot;&gt;&quot;No Victory, No Future&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;flowkater, &lt;a href=&quot;https://flowkater.io/posts/2025-12-11-2025-gophercon-korea-review/&quot;&gt;&quot;2025 GopherCon Korea Review&quot;&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
</content:encoded><category>AI</category><category>essay</category><category>engineering</category><category>career</category><category>AI-Native</category><category>product-engineer</category><author>Tony Cho (https://flowkater.io)</author></item><item><title>Installing Claude Code Across Your Org Doesn&apos;t Make It AX</title><link>https://flowkater.io/en/posts/2026-03-15-ax-organization-transformation/</link><guid isPermaLink="true">https://flowkater.io/en/posts/2026-03-15-ax-organization-transformation/</guid><description>Rolling out an AI tool company-wide and going through an AX (AI Transformation) are two completely different things. Real AX starts when you redesign roles, KPIs, pipelines, and governance, not when you bolt AI onto a functional org.</description><pubDate>Sun, 15 Mar 2026 09:30:00 GMT</pubDate><content:encoded>&lt;h1&gt;Installing Claude Code Across Your Org Doesn&apos;t Make It AX&lt;/h1&gt;
&lt;h2&gt;1. To Start: Between Admiration and Discomfort&lt;/h2&gt;
&lt;p&gt;A LinkedIn post showed up the other day about a Korean startup rolling out Claude Code company-wide. I also saw a piece claiming OpenClaw was deployed across the whole org and productivity exploded. Given how conservative the corporate culture is here (even at startups), these moves deserve credit.&lt;/p&gt;
&lt;p&gt;Honestly, my first reaction was, &quot;Wow, that&apos;s pretty serious.&quot; I&apos;ve also bolted AI tools onto my own teams, hooked up MCP, felt the productivity bump in my own hands. But something kept nagging at me. The tools clearly got better. The way the org actually worked together didn&apos;t feel any different.&lt;/p&gt;
&lt;p&gt;Is that really AX at the org level (AI Transformation, redesigning the entire organization with AI as the baseline assumption)? Can we actually say the org &quot;adopted&quot; AI in any meaningful sense?&lt;/p&gt;
&lt;p&gt;Before Claude Code there was Notion. Before that, Google Drive. Before that, Slack. So let&apos;s get to the bottom of it. Over the past decade, did organizations adopting these tools see truly explosive productivity gains compared to before? Did those gains actually convert into business outcomes? Are there real cases where simply adopting a tool produced that kind of result?&lt;/p&gt;
&lt;p&gt;I&apos;ve written plenty about AI-native engineers. This time I want to talk about how an AI-native organization should actually be built. The gap between what we feel personally as LLM performance leaps forward, and what shows up at the org-level outcome layer.&lt;/p&gt;
&lt;p&gt;To untangle that gap, the vocabulary has to come first. The reason this distinction matters is that too many people use the same phrase, &quot;AI adoption,&quot; to mean wildly different levels of change. I think of the relationship between AI and organizations in three stages.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Stage 1. AI usage.&lt;/strong&gt; Individuals using ChatGPT, Claude, or Copilot for work. Most office workers are here. They&apos;re picking tools, tuning prompts, going &quot;huh, this is actually pretty handy.&quot;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Stage 2. AI adoption (enablement).&lt;/strong&gt; The company writes install guides, runs training sessions, hands out access. Most of the recent buzzy cases sit here. They deploy MCP, teach non-engineers how to use it, demo workflows by job function. It&apos;s genuinely valuable work.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Stage 3. AX transformation.&lt;/strong&gt; Roles, approval flows, KPIs, pipelines, governance, and accountability all redesigned around AI as the baseline. Almost no organization has gotten this far. &lt;a href=&quot;https://www.bain.com/insights/unsticking-your-ai-transformation/&quot;&gt;Bain says &quot;treating GenAI like a tool doesn&apos;t work,&quot;&lt;/a&gt; and &lt;a href=&quot;https://www.mckinsey.com/capabilities/tech-and-ai/our-insights/superagency-in-the-workplace-empowering-people-to-unlock-ais-full-potential-at-work&quot;&gt;McKinsey notes that only 1% of organizations have reached AI maturity, with leadership being the biggest blocker.&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This post is about mistaking stage 2 for stage 3. About the difference between adoption and transformation, and how (inside that gap) the roles of organizations and individuals, especially Makers (who produce) and Closers (who deliver outcomes), need to shift.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;2. A Good Adoption and a Good Transformation Are Not the Same&lt;/h2&gt;
&lt;p&gt;Let&apos;s read this fairly first. The recent Korean cases that drew attention genuinely did certain things well.&lt;/p&gt;
&lt;p&gt;That company removed the install barrier. They wrote OS-specific guides, deployed MCP, ran company-wide sessions. They had non-engineer colleagues do the demos themselves, building the perception that &quot;I can do this too.&quot; BX, HR, PM, finance, CX, business development: every job function got an actual workflow, and an internal survey showed every respondent said they used it almost daily.&lt;/p&gt;
&lt;p&gt;Most domestic companies still can&apos;t pull this off. That&apos;s worth real credit.&lt;/p&gt;
&lt;p&gt;There are great adoption stories abroad too. &lt;a href=&quot;https://www.morganstanley.com/press-releases/ai-at-morgan-stanley-debrief-launch&quot;&gt;Morgan Stanley embedded AI directly into its financial advisor workflow,&lt;/a&gt; connecting meeting notes to summary to email draft to Salesforce save, and the majority of advisor teams adopted it. Impressive. But as Morgan Stanley itself says, &quot;human relationships remain the core.&quot; The financial advisor&apos;s role itself didn&apos;t change. The tool got better.&lt;/p&gt;
&lt;p&gt;So we have to push the question one step further. What&apos;s the unit of change here?&lt;/p&gt;
&lt;p&gt;The PM became a faster PM. Finance became faster finance. HR became faster HR. AI accelerated the work inside each role. But did the relationship between PM and finance and HR shift? The approval flow? The KPIs? The decision-making path?&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://www.reddit.com/r/cursor/comments/1qx0uxo/&quot;&gt;A Reddit comment&lt;/a&gt; nails this exactly: &quot;AI made the &apos;typing&apos; part instant, but it didn&apos;t solve the organizational friction. So the business sees the same velocity, even if the devs feel like wizards.&quot;&lt;/p&gt;
&lt;p&gt;Funny thing. There was a similar scene seven or eight years ago. People expected adopting Notion to make organizational knowledge flow. Adopting Slack would supposedly speed up communication. The result? Notion became per-team wikis. Slack became per-team channels. The tools changed; the structure of how information flowed didn&apos;t. The same pattern is repeating with AI tooling now.&lt;/p&gt;
&lt;p&gt;Laying down a tool and rebuilding the organization around that tool are different categories of work. &quot;But the start matters, doesn&apos;t it?&quot; Sure. I&apos;m not denying that. The point is, you can&apos;t mistake the starting line for the destination.&lt;/p&gt;
&lt;p&gt;After your org adopted AI, did the number of handoffs between PM and engineer shrink? Did approval time shrink? Did the speed of value reaching the customer go up? If the answer to any of these is &quot;no,&quot; it&apos;s still adoption, not transformation.&lt;/p&gt;
&lt;p&gt;Of course there are teams doing this well. &lt;a href=&quot;https://www.reddit.com/r/ExperiencedDevs/comments/1qq8y8u/&quot;&gt;A 10-year engineering manager&lt;/a&gt; summarized his team&apos;s full Claude Code rollout this way. They consolidated on a single agent environment. They converted existing Confluence runbooks into AI skills. They structured the codebase and architecture docs as AI context, automated ticket creation, and started catching missing requirements upfront. They had AI verify meeting notes so any tangent could be corrected immediately, and they added a new weekly &quot;AI workflow share&quot; meeting.&lt;/p&gt;
&lt;p&gt;What stands out here is that this team &lt;strong&gt;didn&apos;t just install tools.&lt;/strong&gt; Meeting structure, ticket process, doc system, weekly sync. They changed the way the work itself flowed. A small European operator said something similar: &quot;The biggest shift wasn&apos;t the tools. It was redesigning the daily workflow. Bolt AI onto the existing process and you get marginal gains at best.&quot;&lt;/p&gt;
&lt;p&gt;When you look at the places that pulled it off, a pattern shows up. They didn&apos;t succeed by laying down better tools. They succeeded by &lt;strong&gt;changing how they actually worked.&lt;/strong&gt; But that&apos;s a story about one team. Team-level transformation is doable. Org-level transformation only happens when leadership rewires the structure.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;3. Why a Functional Org Absorbs AI and Stays the Same&lt;/h2&gt;
&lt;p&gt;The problem with a functional org isn&apos;t that it can&apos;t use AI. It&apos;s that it absorbs AI too well. Marketing inside marketing, finance inside finance, engineering inside engineering. Each function bolts on AI and gets faster. But the place that needs AX isn&apos;t inside a department. It&apos;s between departments. It&apos;s the entire value flow, not the inside of a job function.&lt;/p&gt;
&lt;h3&gt;AI Doesn&apos;t Break Silos. It Reinforces Them.&lt;/h3&gt;
&lt;p&gt;People expect AI to tear down the walls between departments. The reality can go the other way. When each department optimizes AI only for its own work, marketing builds a marketing-grade AI, customer support builds a customer-support-grade AI, and you end up with per-department AI islands while company-wide outcomes stay flat. &lt;a href=&quot;https://hbr.org/2025/09/dont-let-ai-reinforce-organizational-silos&quot;&gt;HBR points out that AI often reinforces functional silos rather than dissolving them.&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bolt AI onto a functional org and the functional org doesn&apos;t disappear. It becomes a &lt;strong&gt;faster functional org.&lt;/strong&gt; That&apos;s high-speed silo-ification, accelerating territorial behavior between departments. It&apos;s not AX.&lt;/p&gt;
&lt;h3&gt;The Bottleneck Lives Between Departments, Not Inside Them&lt;/h3&gt;
&lt;p&gt;Here&apos;s something I lived through as CTO.&lt;/p&gt;
&lt;p&gt;Another department sent over a project plan that would actually move a real KPI. Once it landed in R&amp;amp;D, it slipped behind our own priorities and weeks went by. Eventually those departments routed cooperation through the CEO not as a &quot;request&quot; but as a &quot;directive,&quot; and we stopped what we were doing to start theirs. By then the launch timing had drifted way off. The market wasn&apos;t going to wait.&lt;/p&gt;
&lt;p&gt;It happened inside the same division too. The information pipeline from lead to PM to design to frontend to backend to QA was rarely smooth. Most teams stayed inside their own KPIs (engineers ship their assigned features, for example), so things like product polish or post-launch analysis (arguably more important than the build itself) didn&apos;t get done even between people building the same product.&lt;/p&gt;
&lt;p&gt;In the end, &quot;Why are we doing another department&apos;s work?&quot; started showing up as a complaint. I couldn&apos;t blame the team member who said it. People drift in that direction because of structural limits, and that&apos;s not on the individual.&lt;/p&gt;
&lt;p&gt;After that, the org shuffled along. Everyone got busy passing accountability around. The org itself lost initiative. Capable individuals dimmed too. Leadership grew frustrated with increasingly passive members, and the increasingly passive members grew distrustful and disappointed with the org. Maybe it was a case of the functional org&apos;s downsides taken to the extreme. But watching capable, self-driven teammates turn passive isn&apos;t a pleasant thing to sit with as a leader.&lt;/p&gt;
&lt;p&gt;(I covered more of this in &lt;a href=&quot;https://flowkater.io/posts/2026-01-25-no-victory-no-future/&quot;&gt;How Organizations That Don&apos;t Win Fall Apart&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;The problem isn&apos;t &quot;who works faster.&quot; It&apos;s &lt;strong&gt;&quot;how does the work flow.&quot;&lt;/strong&gt; No matter how much AI accelerates the work inside a department, the approval queues, handoffs, and decision delays between departments stay put. Local productivity goes up while the org&apos;s end-to-end cycle time stays the same.&lt;/p&gt;
&lt;h3&gt;Each Department Gets Better. The Company Doesn&apos;t Win More.&lt;/h3&gt;
&lt;p&gt;In a functional org, each department has its own KPI. Marketing has MQL, sales has revenue, engineering has ship velocity, support has response time. When AI shows up, each department uses it to hit its own KPI better. But company-wide outcome isn&apos;t the sum of departmental efficiencies. &lt;a href=&quot;https://www.bcg.com/publications/2026/five-things-boards-need-to-get-right-with-ai&quot;&gt;BCG warns that if compensation, evaluation, and governance keep rewarding the legacy model, transformation stalls.&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The bottleneck isn&apos;t the prompt. It&apos;s the approval structure. You can install tools. You can&apos;t install accountability.&lt;/p&gt;
&lt;h3&gt;Bottom-Up Diffusion Alone Doesn&apos;t Change the Structure&lt;/h3&gt;
&lt;p&gt;A developer can write the install guide, share skills, hook up MCP, become an internal evangelist. What a developer can&apos;t do: change the headcount policy, change the eval system, redraw the boundary between departments.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://newsletter.pragmaticengineer.com/p/how-uber-uses-ai-for-development&quot;&gt;Uber is a good case in point.&lt;/a&gt; 84% of developers use agents, and 11% of PRs are opened directly by agents. By the numbers, this looks close to AX. But at the same time, AI-related costs grew 6x year over year, and at the CFO level the question of whether this connects to business impact is still open. Even cases that look like wins aren&apos;t airtight. Adoption was also slower than expected.&lt;/p&gt;
&lt;p&gt;There&apos;s data on this gap between what people feel and what&apos;s measurable. &lt;a href=&quot;https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/&quot;&gt;In a randomized controlled trial by METR, experienced open-source developers were given AI tools and timed on tasks. They were actually 19% slower. After the task, they reported being 20% faster.&lt;/a&gt; When this happens at the org scale, the conviction &quot;we innovated with AI&quot; gets divorced from the measurement.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://www.reddit.com/r/ExperiencedDevs/comments/1lml3ti/&quot;&gt;A startup founder told a story&lt;/a&gt; along the same lines. &quot;I gave my CTO Cursor and a week-long task got done in a few hours. So I gave it to the whole team. Same result didn&apos;t happen.&quot; The productivity of one person who holds the codebase context in their head, and the productivity of the whole team, are different problems.&lt;/p&gt;
&lt;p&gt;A developer can spread AI across the company. A developer can&apos;t redraw the company around AI.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;4. What Has to Change for It to Be Transformation: Five Axes from the Cases&lt;/h2&gt;
&lt;p&gt;In section 3 we covered the structural limit of the functional org. So what did the companies that actually crossed that limit change? Not just &quot;they use AI well.&quot; What axis of the org actually shifted? Let&apos;s read it from the cases.&lt;/p&gt;
&lt;h3&gt;A Company That Changed the Role: Shopify&lt;/h3&gt;
&lt;p&gt;The first axis of AX is &lt;strong&gt;role.&lt;/strong&gt; If AI shows up and everyone is still doing the same thing, that&apos;s adoption, not transformation.&lt;/p&gt;
&lt;p&gt;Shopify CEO Tobi Lütke set AI usage as a baseline expectation. To request a new headcount or a new resource, you first have to prove &quot;why AI can&apos;t do it.&quot; AI usage shows up in performance reviews and peer reviews. &lt;a href=&quot;https://www.theverge.com/news/644943/shopify-ceo-memo-ai-hires-job&quot;&gt;It&apos;s not encouragement. It&apos;s policy.&lt;/a&gt; Roles that used to be production-by-job-function are being redefined as supervisor, reviewer, and judge, all assuming AI does the producing. A developer can evangelize AI all day and still can&apos;t change hiring criteria, evaluation criteria, or resource allocation. A CEO can.&lt;/p&gt;
&lt;p&gt;In your org, after the AI rollout, did anyone&apos;s job description change? If not, the role axis hasn&apos;t moved.&lt;/p&gt;
&lt;h3&gt;Companies That Changed the Pipeline: Uber, DBS&lt;/h3&gt;
&lt;p&gt;The second axis is &lt;strong&gt;pipeline&lt;/strong&gt;, the path the work travels.&lt;/p&gt;
&lt;p&gt;Section 3 mentioned the limits of Uber&apos;s bottom-up adoption. The interesting part is that Uber recognized that limit and moved up to the platform level. They didn&apos;t just install Claude Code. They built a layered agentic system: an MCP gateway, a background agent platform called Minion, smart PR routing called Code Inbox, AI code review called uReview, automated test generation called Autocover, and large-scale migration management called Shepherd. &lt;a href=&quot;https://newsletter.pragmaticengineer.com/p/how-uber-uses-ai-for-development&quot;&gt;The developer workflow itself shifted from &quot;code in a single IDE&quot; to &quot;orchestrate multiple agents in parallel.&quot;&lt;/a&gt; The serial handoff of a functional org converted into a hybrid human-agent pipeline. That said, costs grew 6x, and the link to business impact is still an open problem. Changing the pipeline doesn&apos;t guarantee success.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://www.computerweekly.com/news/366639844/DBS-rewires-operating-models-for-AI-reasoning-era&quot;&gt;Singapore&apos;s largest bank, DBS, went one step further.&lt;/a&gt; They completed nine operating-model transitions and rebuilt their human-AI collaboration workflows. DBS calls this &quot;operating model transformation.&quot; Not just installing tools. Redesigning the path the work takes.&lt;/p&gt;
&lt;h3&gt;Companies That Changed KPIs and Governance, and Companies That Couldn&apos;t&lt;/h3&gt;
&lt;p&gt;The third and fourth axes are &lt;strong&gt;KPI&lt;/strong&gt; and &lt;strong&gt;governance.&lt;/strong&gt; They move together. If what you measure (KPI) and who has what authority (governance) don&apos;t change, no matter how much you change roles and pipelines, the org snaps back into shape.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://openai.com/index/bbva-2025/&quot;&gt;BBVA scaled AI from 3,000 people to the full 120,000-person org,&lt;/a&gt; personally training 250 executives including the CEO and bringing security, legal, and compliance in from the start. Governance wasn&apos;t bolted on later. It was baked into the rollout design. &lt;a href=&quot;https://blog.box.com/ai-first-part-1&quot;&gt;Box also explicitly designed an executive sponsor + functional ownership + central build team + AI manager structure,&lt;/a&gt; embedding AI governance into the org structure from the beginning. &lt;a href=&quot;https://www.wsj.com/articles/johnson-johnson-pivots-its-ai-strategy-a9d0631f&quot;&gt;J&amp;amp;J ran 900 AI use cases and confirmed that 10-15% of them generated 80% of the value,&lt;/a&gt; shifting weight from central governance to domain ownership. They moved from &quot;let&apos;s try a lot of things&quot; to &quot;we have to choose what to use.&quot; The strategic focus moved from usage to impact.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://www.modernatx.com/media-center/all-media/blogs/moderna-powered-AI&quot;&gt;Moderna defined itself as a &quot;real-time AI organization&quot;&lt;/a&gt; and hit 65% real-usage rates. The notable part is the public refusal of a model where business growth requires more headcount. CEO Stéphane Bancel said the company has to be able to run billions in revenue with thousands of people. That&apos;s a deliberate move from &quot;headcount expansion&quot; to &quot;per-person efficiency&quot; as the standard for growth.&lt;/p&gt;
&lt;p&gt;On the other side, there are clear cases of companies that couldn&apos;t touch this axis and fell apart.&lt;/p&gt;
&lt;p&gt;In 2024, Klarna trumpeted that an AI chatbot was doing the work of 700 people and emphasized headcount cuts. &lt;a href=&quot;https://www.reuters.com/business/swedens-klarna-shifts-ai-focus-cost-cuts-growth-2025-09-10/&quot;&gt;Then in 2025, the CEO admitted they &quot;leaned too hard on cost cutting&quot; and went back to hiring.&lt;/a&gt; When the KPI is bent purely toward &quot;cost cutting,&quot; automation gets mistaken for the operating model. Costs went down, service quality went down with them, and they had to hire back.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://apnews.com/article/bebc898363f2d550e1a0cd3c682fa234&quot;&gt;McDonald&apos;s × IBM tried AI voice ordering and shut it down in 2024.&lt;/a&gt; Confusion, order errors, accent recognition problems. There were technical limits, but the deeper issue was that the operation on the ground wasn&apos;t ready to run that technology. When tech and operational readiness don&apos;t move together, pipeline transformation ends as an experiment.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://www.reuters.com/business/duolingo-raises-2025-forecast-ai-powered-subscription-garners-wider-appeal-2025-05-01/&quot;&gt;Duolingo expanded to 148 courses fast with AI and posted real business results,&lt;/a&gt; but the &quot;AI-first&quot; declaration paired with messaging about replacing contractors triggered serious backlash. Even with results, emphasizing only headcount cuts without the role-shift context loses legitimacy inside and out. Transformation without change management is half-built.&lt;/p&gt;
&lt;h3&gt;The Fifth Axis: Resource Reallocation&lt;/h3&gt;
&lt;p&gt;The last axis is &lt;strong&gt;resources.&lt;/strong&gt; The budget AX needs isn&apos;t the tool license fee. It&apos;s the cost of work redesign, change management, training, and operating-principle work. &lt;a href=&quot;https://www.bcg.com/press/26june2025-beyond-ai-adoption-full-potential&quot;&gt;BCG sums it up: &quot;real value goes to the small set of organizations that go past tool deployment and redesign how work flows.&quot;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://www.atlassian.com/blog/announcements/atlassian-team-update-march-2026&quot;&gt;Atlassian announced about a 10% workforce reduction (around 1,600 people) in March 2026,&lt;/a&gt; reinvesting in AI and enterprise sales while reorganizing around its System of Work. A clear case of &quot;we&apos;re rewiring the org itself because of AI.&quot;&lt;/p&gt;
&lt;h3&gt;Putting It Together&lt;/h3&gt;
&lt;p&gt;The common changes across these cases line up into five axes.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Axis&lt;/th&gt;
&lt;th&gt;Functional org&lt;/th&gt;
&lt;th&gt;AX org&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Role&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Producer by job function&lt;/td&gt;
&lt;td&gt;Supervisor, reviewer, judge, exception-handler&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pipeline&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Serial handoff between functions&lt;/td&gt;
&lt;td&gt;Hybrid human-agent pipeline, end-to-end single team&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;KPI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Output volume, utilization&lt;/td&gt;
&lt;td&gt;End-to-end cycle time, decision latency, exception rate, customer outcome&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Governance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Per-department approval and access&lt;/td&gt;
&lt;td&gt;Central data access, model permissions, risk ownership, audit logs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Resources&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Tool license budget&lt;/td&gt;
&lt;td&gt;Work redesign, change management, training, operating principles&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;If even one of these five hasn&apos;t moved, you&apos;re still in the adoption phase. You can call it transformation only when all five move together.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;5. What an AX Organization Actually Looks Like&lt;/h2&gt;
&lt;p&gt;We laid out five axes. But axes alone don&apos;t paint the picture. Let&apos;s sketch how a day looks different inside an org that actually went through AX.&lt;/p&gt;
&lt;h3&gt;A Project in a Functional Org vs. a Project in an AX Org&lt;/h3&gt;
&lt;p&gt;Picture a single new feature shipping in a functional org. The PM writes the spec. Hands it to design. The designer makes mockups, gets PM sign-off again. Hands it to engineering. Frontend first, backend next. QA tests, finds bugs, hands them back to engineering. After release, the data team runs the analysis. By the time results come back to the PM, a month has passed.&lt;/p&gt;
&lt;p&gt;Across this whole process, the actual work time inside each team adds up to two days. The rest is waiting, handoffs, context switching, approval queues.&lt;/p&gt;
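&lt;p&gt;A bit of arithmetic shows why tool speedups alone don&apos;t shorten the calendar. The stage names and hours below are illustrative assumptions, not measurements; the only structural claim is the one above: hands-on work is a small slice, and waiting is the rest.&lt;/p&gt;

```python
# Back-of-envelope model of the handoff pipeline described above.
# All numbers are hypothetical: ~16 hours of hands-on work (two days)
# inside roughly a month (~160 working hours) of elapsed time.
STAGES = {
    "spec":   {"work_hours": 4, "wait_hours": 36},
    "design": {"work_hours": 4, "wait_hours": 36},
    "build":  {"work_hours": 6, "wait_hours": 36},
    "qa":     {"work_hours": 2, "wait_hours": 36},
}

def cycle_time(stages, work_speedup=1.0):
    """End-to-end elapsed hours: AI speeds up the work, not the waiting."""
    return sum(s["work_hours"] / work_speedup + s["wait_hours"]
               for s in stages.values())

before = cycle_time(STAGES)                  # 160.0 hours
after = cycle_time(STAGES, work_speedup=4)   # 148.0 hours
print(f"4x faster work -> {100 * (1 - after / before):.1f}% shorter cycle")
# prints: 4x faster work -> 7.5% shorter cycle
```

&lt;p&gt;Push the work speedup as high as you like; until the wait column shrinks (that is, until handoffs disappear), the month stays a month.&lt;/p&gt;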
&lt;p&gt;In an AX org, this flow is fundamentally different. A single mission-aligned team has product engineers and product designers in it together, with AI agents working as part of the pipeline. There&apos;s no separate PM writing a spec to hand off. The product engineer makes direction calls on top of AI&apos;s data analysis. The product designer makes fit calls on top of AI&apos;s drafts. AI writes the code, the engineer reviews. Tests are automated, and post-deploy analysis comes back inside the same team in real time.&lt;/p&gt;
&lt;p&gt;Break down the existing PM role and it looks like this. Spec writing? AI does it. Inter-department coordination? There are no handoffs in a mission-aligned team, so this isn&apos;t needed. Data analysis and prioritization? AI proposes, humans decide. Decision-making? That stays. But the product engineer and product designer can do that themselves. One of the biggest reasons the PM existed was to act as the bridge between departments. When the boundaries between departments melt, that part of the role shrinks.&lt;/p&gt;
&lt;p&gt;One thing not to misread: this isn&apos;t saying PMs disappear. PMs exist at companies like Shopify or Stripe too. The thing is, those PMs aren&apos;t inter-department coordinators. They&apos;re people who define customer problems and make calls on product direction. The &quot;PM who writes specs and manages handoffs&quot; in a functional org and the &quot;PM who decides what to build and owns the result&quot; in a mission-aligned team have the same title and completely different jobs. The former PM has a shrinking reason to exist in an AX org. The latter PM becomes more important.&lt;/p&gt;
&lt;p&gt;The core difference is two things. First, &lt;strong&gt;handoffs disappear.&lt;/strong&gt; The work flows inside a single team. Second, &lt;strong&gt;the human role shifts from production to judgment.&lt;/strong&gt; Every team member operates as a Closer instead of a Maker. They don&apos;t write code; they judge code quality. They don&apos;t make designs; they judge design fit. They don&apos;t write specs; they decide product direction.&lt;/p&gt;
&lt;h3&gt;Past the Product Team: Marketing, Finance, and Support All Folded In&lt;/h3&gt;
&lt;p&gt;Let&apos;s push this one step further. There&apos;s no reason to confine this logic to the product team.&lt;/p&gt;
&lt;p&gt;In a functional org, when a product launches, the marketing team plans the campaign, sales sells it, support handles inbound, and finance reconciles the revenue. Each has its own KPI and its own reporting line. To know how a feature played in the market, you have to gather data from all those teams, and that gathering itself creates more handoffs and more waiting.&lt;/p&gt;
&lt;p&gt;In an AX org, those boundaries melt. Teams form around a single customer journey or a business mission. Inside that team you have product engineers and product designers, plus growth, customer experience, and revenue analysis. Different titles, same KPI.&lt;/p&gt;
&lt;p&gt;Take a mission like &quot;80% completion rate on new-user onboarding.&quot; Inside that team, AI analyzes user behavior data, growth designs experiments, the product engineer improves the onboarding flow, and customer experience compiles feedback from churned users. All of this happens inside one team, looking at the same dashboard, discussed in the same weekly meeting.&lt;/p&gt;
&lt;p&gt;There&apos;s a reason AI makes this possible. It used to be that you had to hire a marketing specialist, a finance specialist, and a data analyst separately. Each specialist had to produce the deliverable in their own area. But when AI does the producing, one person can use AI to analyze marketing data, summarize customer feedback, and model revenue at the same time. The depth of expertise stays; the coverage widens.&lt;/p&gt;
&lt;p&gt;DBS redefining ownership of the customer journey, mentioned earlier, is exactly this model. Each customer journey belongs to a single mission-aligned team, and that team owns it end-to-end, including product, marketing, customer experience, and revenue. Uber building a layered agentic platform is the same idea. An individual using Claude Code and an org embedding agents into the pipeline are completely different categories of thing.&lt;/p&gt;
&lt;p&gt;A functional org grouped people &lt;strong&gt;by expertise.&lt;/strong&gt; An AX org groups people and agents &lt;strong&gt;by mission.&lt;/strong&gt; That&apos;s the most fundamental difference.&lt;/p&gt;
&lt;h3&gt;The Subject of AX Is the Executive&lt;/h3&gt;
&lt;p&gt;Who can actually execute this transition? Not a developer.&lt;/p&gt;
&lt;p&gt;What a developer can do: install, share skills, hook up MCP, evangelize internally.
What a developer can&apos;t change: headcount policy, the eval system, ownership between departments, risk policy, budget allocation, KPIs.&lt;/p&gt;
&lt;p&gt;Shopify had a CEO cross that line. Uber had a platform team design governance and the cost system centrally. BBVA trained 250 executives including the CEO before going company-wide. The common thread is clear. &lt;strong&gt;AX isn&apos;t a rollout campaign. It&apos;s a redesign of management.&lt;/strong&gt; Unless an executive decides to redraw the org chart, no matter how good the tool, the functional org just stays a faster functional org.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://www.businessinsider.com/andrew-ng-product-management-bottleneck-coding-ai-startups-2025-8&quot;&gt;Andrew Ng says the bottleneck of the AI era isn&apos;t coding. It&apos;s product management.&lt;/a&gt; The ability to decide what to build becomes scarcer than the ability to build. The people who can make that decision sit at the top of the org.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;6. Maker and Closer: Individual Careers Have to Shift&lt;/h2&gt;
&lt;p&gt;We sketched the picture of an AX org. Mission-aligned teams, hybrid human-agent pipelines, end-to-end with no handoffs. So who survives in that org?&lt;/p&gt;
&lt;h3&gt;When the Org&apos;s Basic Unit Changes, So Does the Talent It Needs&lt;/h3&gt;
&lt;p&gt;The problem is that marketing, finance, and engineering are separated into different teams. They have to be reorganized into mission-aligned units. Every member of that team has to focus on hitting the same KPI.&lt;/p&gt;
&lt;p&gt;Andrew Ng says the biggest bottleneck ends up being humans, specifically the people deciding the product. But in a traditional functional org, the bottleneck between departments creates far bigger delays and losses than any bottleneck between individuals. Group people by mission, and put one leader on the line for the entire end-to-end value flow. That&apos;s the shape closest to AX.&lt;/p&gt;
&lt;p&gt;Reporting lines can be plural. Outcome ownership has to be singular.&lt;/p&gt;
&lt;h3&gt;The Uncomfortable Truth: Are Current People Right for the Future Org?&lt;/h3&gt;
&lt;p&gt;Rebuilding the org costs money. A lot of money. And here&apos;s a more uncomfortable truth: organizations are starting to ask whether the current people are the right fit for the new structure, and whether the current headcount is even necessary. Those questions are the real reason new hiring slowed down.&lt;/p&gt;
&lt;p&gt;The reason hiring is cautious in the AI era isn&apos;t recession. It&apos;s that companies aren&apos;t sure of the future role structure.&lt;/p&gt;
&lt;h3&gt;Maker and Closer&lt;/h3&gt;
&lt;p&gt;Here&apos;s the distinction I want to propose.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Maker.&lt;/strong&gt; Whether marketing, engineering, or finance, this is the person focused on producing output in their current work. They write specs, write code, create designs, draft reports.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Closer.&lt;/strong&gt; This is the person who uses that output to actually hit a business KPI. They own the last mile from output to outcome.&lt;/p&gt;
&lt;p&gt;Closer here isn&apos;t the sales sense of closing. The point is &lt;strong&gt;how far accountability extends.&lt;/strong&gt; A Maker thinks &quot;my piece is done&quot; once the deliverable is in. A Closer goes further, evaluates whether the deliverable actually became an outcome, and owns the result. Output vs. Outcome. That&apos;s the difference.&lt;/p&gt;
&lt;p&gt;Put it in behavior. A Closer is someone who can stop the work by saying &quot;this direction is wrong.&quot; No matter how good the output, if they judge it isn&apos;t going to convert into customer outcome, they bend the direction. Calling a stop on production isn&apos;t something a Maker can do.&lt;/p&gt;
&lt;p&gt;In professional careers, Makers tend to get more recognition. But the thing keeping the business alive is the Closer.&lt;/p&gt;
&lt;p&gt;If output translated into outcome cleanly, great. But reality often doesn&apos;t work that way. You can stack up output and have all of it become useless if the goal is heading the wrong way. On the other side, there are Closers who hit the goal even with thinner output, by bending direction. A weak product that produces business results, a great product that lets the business die. These are common stories.&lt;/p&gt;
&lt;p&gt;Five or six years ago I did tech consulting for early-stage startups and met a lot of organizations. There were companies where there wasn&apos;t a single proper engineer (forget CS majors, not even a bootcamp grad), where the founder taught themselves, wrote code in shapes that shouldn&apos;t have worked, and pulled in major follow-on investment and customer growth on top of that. That founder was a terrible Maker and a top-tier Closer. On the other side, I saw plenty of engineers with truly genius-level backgrounds who got absorbed in the craft and ignored what some of them call &quot;the adult world.&quot; Perfect Makers. Not Closers.&lt;/p&gt;
&lt;p&gt;Engineering organizations mostly leaned Maker. Designers and PMs (people who should have been Closers by nature) ended up doing Maker work too, inside shrunken authority and rigid org structures. The structure that wouldn&apos;t allow Closers couldn&apos;t produce Closers.&lt;/p&gt;
&lt;h3&gt;Are You a Maker or a Closer?&lt;/h3&gt;
&lt;p&gt;Look back at your last week. What did you produce directly? What did you make the final call on? If most of your time went into writing specs, writing code, and producing reports, you&apos;re a Maker. If you decided direction, judged priority, and owned outcomes, you&apos;re closer to a Closer.&lt;/p&gt;
&lt;p&gt;This isn&apos;t saying Maker is bad. Until now, Makers were the engine of the org. But once AI starts doing the producing, the value of being a Maker shrinks. AI writes the code. AI makes the docs. AI runs the analysis. What&apos;s left is the ability to judge &quot;what should we build&quot; and &quot;is this actually valuable to the customer.&quot;&lt;/p&gt;
&lt;h3&gt;Career Direction for the AX Era&lt;/h3&gt;
&lt;p&gt;As AX progresses, more of the Maker role gets delegated to AI. Writing code, drafting documents, making designs, analyzing data: the act of producing moves into AI&apos;s lane. In many cases the Closer role gets amplified, and inside a fully AI-native organization their decision-making becomes the only real bottleneck.&lt;/p&gt;
&lt;p&gt;For most working people, the personal career direction in an AI-native org has to shift away from Maker and toward Closer. The people who pivot their careers that way are the ones who&apos;ll still be needed in the AI era. Deep expertise in two areas plus the ability to run an end-to-end cycle: what people call π-shaped talent. Because AI handles the producing, one person&apos;s coverage can widen, which is exactly what makes π-shaped talent realistically possible for the first time.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;7. Korean Organizations and AX: Why It Hasn&apos;t Moved Yet&lt;/h2&gt;
&lt;p&gt;Reading this far, you might be thinking &quot;okay, the overseas cases are nice, but Korea is different.&quot; That&apos;s true. Korea&apos;s situation is different. Different doesn&apos;t mean safe.&lt;/p&gt;
&lt;p&gt;In Korea, even in the AI era, people don&apos;t lose jobs easily. Breaking convention and building a new org structure takes a long time.&lt;/p&gt;
&lt;p&gt;But even when OpenClaw or Claude Code gets rolled out company-wide in a Korean firm, marketing is still marketing and engineering is still engineering. People say productivity exploded, but that&apos;s Maker productivity. Whether it shows up in a Closer&apos;s final outcome (revenue or business KPI) is a separate question.&lt;/p&gt;
&lt;p&gt;Take the Korean SaaS startup case from section 2 again. The rollout was a success. Six months out, what does that org look like?&lt;/p&gt;
&lt;p&gt;Engineering ships code faster with Claude Code. Marketing ships content faster with ChatGPT. Support categorizes inbound faster with AI. Everyone is faster. But the campaign-planning process between engineering and marketing is still a serial handoff: spec → approval → handoff → build. For customer feedback from support to make it into the product roadmap, it still has to move through PM → lead → sprint planning. AI came in. The way the work flows is exactly the same as before.&lt;/p&gt;
&lt;p&gt;It&apos;s true each department got more efficient inside its own work. The walls between departments stayed put. Worse, with each department optimizing its own AI tools, you ended up with per-department AI islands. The high-speed silo-ification from section 3 is exactly this picture.&lt;/p&gt;
&lt;p&gt;Why haven&apos;t Korean orgs started asking these questions yet? A few structural reasons.&lt;/p&gt;
&lt;p&gt;First, &lt;strong&gt;the functional org has roots that go too deep.&lt;/strong&gt; Most Korean tech companies have a marketing department, engineering department, and planning department as the spine of the org chart. Changing that spine requires the whole executive team to agree, and it&apos;s hard for executives who became executives inside a functional org to argue for dismantling the functional org.&lt;/p&gt;
&lt;p&gt;Second, &lt;strong&gt;the performance measurement system is output-centric.&lt;/strong&gt; Engineers are evaluated on number of releases, marketers on number of campaigns, PMs on number of specs. To shift this to &quot;you&apos;ll now be evaluated on customer outcome,&quot; you have to rebuild the eval system itself.&lt;/p&gt;
&lt;p&gt;Third, &lt;strong&gt;a culture that doesn&apos;t encourage learning and growth blocks change.&lt;/strong&gt; Pivoting into a new role requires learning, but how many companies actively encourage and support that learning? Most orgs are already maxed out just keeping up with current work.&lt;/p&gt;
&lt;p&gt;In an environment like that, most orgs slip into the easy path. If I were the person in charge, frankly, it&apos;d be more comfortable to roll out OpenClaw and announce &quot;we did AX&quot; than to commit to rebuilding people and structure. Reality grinds you down.&lt;/p&gt;
&lt;p&gt;But when newer companies grow with so-called AI-native structures from day one, can the existing companies match that speed? The fact that Korea hasn&apos;t yet felt large-scale AI-driven layoffs doesn&apos;t mean it&apos;s safe. The shock arrives later. That&apos;s all.&lt;/p&gt;
&lt;p&gt;Not every org needs to change all five axes right now. A 10-person startup is already close to a mission-aligned team and has few handoffs. AX transformation is urgent for mid-sized and larger companies where the functional org has hardened. But even small orgs need to check one thing: is the AI tool they&apos;re laying down reinforcing the existing structure, or making a new structure possible?&lt;/p&gt;
&lt;p&gt;In the end, AI will rewrite the way we work. The people who don&apos;t change inside an org that doesn&apos;t change just get overtaken by the people who do change inside an org that does.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;8. To Close: Tools Make the Start. Structure Makes the Transformation.&lt;/h2&gt;
&lt;p&gt;Planting AI across a company is a good start. Real applause to the people who created that start.&lt;/p&gt;
&lt;p&gt;But calling the start the finish makes the next step disappear.&lt;/p&gt;
&lt;p&gt;Real AX isn&apos;t bolting AI onto an existing functional org. It&apos;s redefining roles, redesigning pipelines, changing KPIs, building governance, reallocating resources. Rewiring the organization into a hybrid human-agent operating system. That&apos;s why the subject of AX isn&apos;t a developer. It&apos;s an executive.&lt;/p&gt;
&lt;p&gt;And individual careers have to shift from Maker to Closer. What AI eats is production itself. What stays with humans longer is outcome ownership and direction.&lt;/p&gt;
&lt;p&gt;I was a Maker for a long time too. There was a stretch where I believed if you wrote good code, the business would follow. Now I know better. Good output is a necessary condition, not a sufficient one. Direction has to be right for output to mean anything. Setting the direction is the Closer&apos;s job.&lt;/p&gt;
&lt;p&gt;Installing Claude Code across your org doesn&apos;t make it AX. AX starts the moment you redraw the org chart.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;Bain &amp;amp; Company, &lt;a href=&quot;https://www.bain.com/insights/unsticking-your-ai-transformation/&quot;&gt;&quot;Unsticking Your AI Transformation&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;McKinsey &amp;amp; Company, &lt;a href=&quot;https://www.mckinsey.com/capabilities/tech-and-ai/our-insights/superagency-in-the-workplace-empowering-people-to-unlock-ais-full-potential-at-work&quot;&gt;&quot;AI in the Workplace: Empowering People to Unlock AI&apos;s Full Potential&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;BCG, &lt;a href=&quot;https://www.bcg.com/press/26june2025-beyond-ai-adoption-full-potential&quot;&gt;&quot;Companies Must Go Beyond AI Adoption to Realize Its Full Potential&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;BCG, &lt;a href=&quot;https://www.bcg.com/publications/2026/five-things-boards-need-to-get-right-with-ai&quot;&gt;&quot;Five Things Boards Need to Get Right with AI&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Harvard Business Review, &lt;a href=&quot;https://hbr.org/2025/09/dont-let-ai-reinforce-organizational-silos&quot;&gt;&quot;Don&apos;t Let AI Reinforce Organizational Silos&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;The Verge, &lt;a href=&quot;https://www.theverge.com/news/644943/shopify-ceo-memo-ai-hires-job&quot;&gt;&quot;Shopify CEO says no new hires without proof AI can&apos;t do the job&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;The Pragmatic Engineer, &lt;a href=&quot;https://newsletter.pragmaticengineer.com/p/how-uber-uses-ai-for-development&quot;&gt;&quot;How Uber Uses AI for Development&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Business Insider, &lt;a href=&quot;https://www.businessinsider.com/andrew-ng-product-management-bottleneck-coding-ai-startups-2025-8&quot;&gt;&quot;Andrew Ng: Product Management Is the New Bottleneck&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Computer Weekly, &lt;a href=&quot;https://www.computerweekly.com/news/366639844/DBS-rewires-operating-models-for-AI-reasoning-era&quot;&gt;&quot;DBS Rewires Operating Models for AI Reasoning Era&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Morgan Stanley, &lt;a href=&quot;https://www.morganstanley.com/press-releases/ai-at-morgan-stanley-debrief-launch&quot;&gt;&quot;Launch of AI @ Morgan Stanley Debrief&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Moderna, &lt;a href=&quot;https://www.modernatx.com/media-center/all-media/blogs/moderna-powered-AI&quot;&gt;&quot;Our Journey to Becoming a Real-Time AI Organization&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Reuters, &lt;a href=&quot;https://www.reuters.com/business/swedens-klarna-shifts-ai-focus-cost-cuts-growth-2025-09-10/&quot;&gt;&quot;Sweden&apos;s Klarna shifts AI focus from cost cuts to growth&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;AP News, &lt;a href=&quot;https://apnews.com/article/bebc898363f2d550e1a0cd3c682fa234&quot;&gt;&quot;McDonald&apos;s ends test run of AI-powered drive-thrus with IBM&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Reuters, &lt;a href=&quot;https://www.reuters.com/business/duolingo-raises-2025-forecast-ai-powered-subscription-garners-wider-appeal-2025-05-01/&quot;&gt;&quot;Duolingo raises 2025 forecast&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Wall Street Journal, &lt;a href=&quot;https://www.wsj.com/articles/johnson-johnson-pivots-its-ai-strategy-a9d0631f&quot;&gt;&quot;Johnson &amp;amp; Johnson Pivots Its AI Strategy&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Atlassian, &lt;a href=&quot;https://www.atlassian.com/blog/announcements/atlassian-team-update-march-2026&quot;&gt;&quot;Team Update March 2026&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;OpenAI, &lt;a href=&quot;https://openai.com/index/bbva-2025/&quot;&gt;&quot;BBVA: Scaling AI Across 120,000 Employees&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Box, &lt;a href=&quot;https://blog.box.com/ai-first-part-1&quot;&gt;&quot;AI-First: Building the Future of Intelligent Content Management&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;METR, &lt;a href=&quot;https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/&quot;&gt;&quot;Measuring the Impact of AI on Experienced Open-Source Developer Productivity&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;r/cursor, &lt;a href=&quot;https://www.reddit.com/r/cursor/comments/1qx0uxo/&quot;&gt;&quot;Do you believe the claims that AI isn&apos;t improving programmer productivity?&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;r/ExperiencedDevs, &lt;a href=&quot;https://www.reddit.com/r/ExperiencedDevs/comments/1lml3ti/&quot;&gt;&quot;Did AI increase productivity in your company?&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;r/ExperiencedDevs, &lt;a href=&quot;https://www.reddit.com/r/ExperiencedDevs/comments/1qq8y8u/&quot;&gt;&quot;AI is working great for my team, and y&apos;all are making me feel crazy&quot;&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
</content:encoded><category>AI</category><category>AX</category><category>org-transformation</category><category>functional-org</category><category>essay</category><category>leadership</category><author>Tony Cho (https://flowkater.io)</author></item><item><title>Between a Working Feature and a Trustworthy Product: Building ToC Recognition</title><link>https://flowkater.io/en/posts/2026-03-13-toc-recognition-product-engineering/</link><guid isPermaLink="true">https://flowkater.io/en/posts/2026-03-13-toc-recognition-product-engineering/</guid><description>A Do Work → Good → Great engineering journey building a book table-of-contents (ToC) recognition feature with Qwen 3.5 Flash. From a $2,000 pre-collection failure, through a sync API → async job system → SSE real-time streaming evolution. When the model hands you 80, the remaining 20 is engineering.</description><pubDate>Fri, 13 Mar 2026 07:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;Between a Working Feature and a Trustworthy Product: Building ToC Recognition&lt;/h1&gt;
&lt;h2&gt;Opening&lt;/h2&gt;
&lt;p&gt;The biggest source of resistance in any mobile app eventually comes down to input. For utility apps especially (anything that isn&apos;t pure content consumption), input friction is the number-one cause of churn, regardless of category. A budgeting app needs spending entries. A productivity app needs schedules and to-dos. It&apos;s all input. Models have gotten dramatically smarter in the AI era, but the user-specific data still has to come from the user.&lt;/p&gt;
&lt;p&gt;The thing is, a B2C mobile app has no way to know that user data on its own, and we&apos;re not Google or Meta sitting on a personal data trove. So even in the AI era, mobile apps still have to win on UX, and the problem of collecting user-specific data with the lowest possible friction and the highest possible accuracy is still wide open.&lt;/p&gt;
&lt;h3&gt;The moment you make users type in a table of contents, this feature is dead&lt;/h3&gt;
&lt;p&gt;I first tried to solve this problem five years ago. It was the second version of the service I&apos;m currently rebuilding. (We&apos;re on version three now.) To give users a personalized study plan, we needed the table-of-contents data (ToC) of the book or workbook they were trying to read. Page count alone gave us a similar feature, but to take it further, the ToC mattered. Common sense tells you that asking someone to type in a book&apos;s ToC by hand, on mobile no less, is an instant exit ramp.&lt;/p&gt;
&lt;h2&gt;Pre-collecting ToC data (feat. QWEN)&lt;/h2&gt;
&lt;p&gt;The first thing I tried was building a pre-collected ToC database. Long before the work I&apos;m describing in this post, I&apos;d already poured hours into it. Unlike Korean book sites, most overseas book sites (especially English-language ones, which are our current primary market) just don&apos;t surface ToC data at all. (And Korean book APIs don&apos;t expose ToC either.) On top of that, since we aren&apos;t tied to any single publisher, there was no easy way to build a generic API-based scraper.&lt;/p&gt;
&lt;p&gt;The second card I played was AI. That AI ended up being the protagonist of this post: Qwen 3.5 Flash. I built a broad ToC collection pipeline on top of it. (More on the model itself later.)&lt;/p&gt;
&lt;p&gt;I poked around qwen.ai with a handful of reference books and saw that collection actually worked surprisingly well, so I went straight into building against the LLM API directly. I started with OpenRouter. The pricing was similar and it gave me model portability, but the Qwen models on OpenRouter were just vanilla weights with none of the toolchain options. (Vanilla model results were brutal.) I migrated to Alibaba Cloud and re-wired the API through DashScope. DashScope had the full Qwen toolchain (web_search, web_extract), and on top of those tools I could build a pipeline that pulled ToC data with reasonable accuracy.&lt;/p&gt;
&lt;p&gt;I built a pipeline where you&apos;d input an ISBN13 and it would automatically collect both the metadata and the ToC. For something like a 7-volume bundle in the test set, pulling the entire ToC at once would blow past the token limit, so I split it: collect chapters at a coarse grain first, then run a second pass for sub-chapters. The test set wasn&apos;t huge, but a pipeline that accurately collected large amounts of ToC data across many genres was finally working.&lt;/p&gt;
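&lt;p&gt;The two-pass split is easy to sketch. The names below (&lt;code&gt;call_qwen&lt;/code&gt; especially) are stand-ins, not the real pipeline, which also carried the guardrails described next; this only shows the coarse-then-fine shape that kept each request under the token limit.&lt;/p&gt;

```python
# Illustrative two-pass ToC collection. call_qwen is a stand-in for the
# real DashScope-backed call; prompts and field names are made up here.

def collect_toc(isbn13, call_qwen):
    """Pass 1 collects coarse chapters; pass 2 expands each chapter
    into sub-chapters, keeping every request under the token limit."""
    chapters = call_qwen(
        "List only the top-level chapters (title, startPage) for ISBN "
        + isbn13 + " as JSON."
    )
    for chapter in chapters:
        # second, finer-grained pass per chapter
        chapter["sections"] = call_qwen(
            "List the sub-chapters of '" + chapter["title"]
            + "' in ISBN " + isbn13 + " as JSON."
        )
    return chapters
```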
&lt;p&gt;I&apos;d ended up with a collector that took an ISBN13 and spat out a depth-aware ToC as JSON. I&apos;m describing it briefly here, but the LLM&apos;s inherent nondeterminism meant I needed multiple guardrails, and I think I pulled two or three all-nighters straight to get the pipeline standing.&lt;/p&gt;
&lt;p&gt;But I never got to actually run real data collection through it. The biggest problem was cost. The testing alone burned more than $2,000, and that was despite Qwen 3.5 Flash&apos;s cheap token rate. The killer was the web_search tool-calling cost. Qwen&apos;s web_search has no built-in compaction step, so every byte that flows in through the tool counts straight against your token bill. I&apos;d picked the model based on token pricing alone and never thought about toolchain or side-effect costs, so the bill caught me off guard.&lt;/p&gt;
&lt;p&gt;You can&apos;t predict what books a user will request, so you have to collect as widely as possible, and to run this at production quality you also need a periodic verification pipeline. The cost could multiply many times over. The data set you need to cover is effectively infinite (yes, there&apos;s a long tail, but if the niche data isn&apos;t there, that user bounces immediately), and the collection cost is enormous. I kicked off a full verification test through Codex and went to bed. I woke up to a bill for the equivalent of three million won. I was wrecked.&lt;/p&gt;
&lt;p&gt;I didn&apos;t quit there. I tried building custom skills inside subscription-based Codex and Claude Code, but maybe because they aren&apos;t API-mode models, the results were poor despite the much stronger underlying model performance. The client-side skills and plugins (Playwright and friends) couldn&apos;t keep up with Qwen&apos;s native toolchain on web_search. When I bolted a Playwright skill onto the GPT-5.3-Codex-Spark model, Chrome devoured all the memory and my M4 Max maxed-out MacBook locked up for the first time ever.&lt;/p&gt;
&lt;p&gt;This wasn&apos;t a pure technical failure. It was the first lesson that no technology becomes a product if you don&apos;t think about operating cost and data coverage at the same time. Three days of flailing, $2,000 in cost (billed during a weak-won stretch, which made it 3 million won), plus the fact that even with a pre-built database, niche user-specific ToCs would still be a blind spot. All of that stacked up, and I shut the project down.&lt;/p&gt;
&lt;h2&gt;Letting users do the recognition themselves&lt;/h2&gt;
&lt;h3&gt;In the end, the user inputs it&lt;/h3&gt;
&lt;p&gt;After paying a fairly steep tuition, I landed back at square one. &quot;The user inputs it.&quot; Second-best, but not a bad call. Obviously asking users to type in every line of the ToC is absurd. OCR has gotten a lot better, so the user can just take a photo. As a bonus, that data becomes their own.&lt;/p&gt;
&lt;p&gt;The problem wasn&apos;t text input. It was structure input. A ToC isn&apos;t a flat text list, it&apos;s a tree of nodes with depth levels. Snapping a photo doesn&apos;t get iOS VisionKit to recognize that structure. Its raw text recognition was strong compared to older OCR models, and it could even handle moderately structured documents. But that &quot;moderately&quot; produced the worst possible experience for the user.&lt;/p&gt;
&lt;h3&gt;Why this didn&apos;t work before LLMs&lt;/h3&gt;
&lt;p&gt;Like I said, this wasn&apos;t my first attempt. Five years ago I tried OCR + normalization in a handful of ways. The OCR libraries and services back then could already pull a flat list. But what I actually needed was:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Part / Chapter / Section (and possibly more depth)&lt;/li&gt;
&lt;li&gt;depth&lt;/li&gt;
&lt;li&gt;page&lt;/li&gt;
&lt;li&gt;hierarchy&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;What I needed wasn&apos;t a slab of text but a hierarchy.&lt;/p&gt;
&lt;p&gt;Back then I also tried using language models to build a classification system. Compared to today, we were in the very early days of language modeling. Transformers and attention had just started showing up in real products, so I tried building a language model on GCP&apos;s ML platform by collecting as much ToC data as I could. The idea was to feed a flat-list ToC into the language model so it could learn each item&apos;s distinctive pattern, then return a structure given a flat text list. But once the text was already converted to a flat list, the model had to infer the hierarchy from scratch, and between case diversity and lack of training data, it couldn&apos;t solve the problem at all.&lt;/p&gt;
&lt;p&gt;Line breaks, indentation, numbering schemes, mixed Roman/Arabic numerals (every structural signal you&apos;d want) all got flattened the moment OCR touched them. The extra requirement of matching page numbers, I never even attempted.&lt;/p&gt;
&lt;p&gt;The hard part of ToC recognition wasn&apos;t OCR text accuracy. It was structuring. Not reading characters but reading the hierarchy between characters. And solving it solo back then was an extremely inefficient use of time. The ROI just wasn&apos;t there.&lt;/p&gt;
&lt;h3&gt;Why it works now&lt;/h3&gt;
&lt;p&gt;Major models like GPT, Claude, and Gemini have gotten enormously better. The reason it&apos;s still hard to ship a real AI-powered service is API cost. I subscribe to the $200 GPT Pro plan, but if I&apos;d been paying for the same Codex usage at API metered rates, I&apos;d be staring at a bill in the thousands of dollars.&lt;/p&gt;
&lt;p&gt;Most people overlook this. Because we&apos;re always running on SOTA models, people say things like &quot;&lt;a href=&quot;https://flowkater.io/posts/2026-03-01-agentic-engineering-9-skills/&quot;&gt;agentic engineering&lt;/a&gt; matters, prompt engineering doesn&apos;t.&quot; But if you&apos;re wiring an LLM API into a real product and need the unit economics to work, you cannot use a SOTA model. You&apos;re stuck with API models from one or two generations back. All of this comes down to cost.&lt;/p&gt;
&lt;p&gt;On February 23, 2026, Alibaba Cloud announced Qwen 3.5. As part of that release they shipped Qwen 3.5 Flash. The Pro model is usually compared to Claude Sonnet, and Flash sits well below Pro. Multi-turn performance falls off a cliff, and like older-generation models, it answers single requests well. But on top of the web_search and web_extract tools I mentioned earlier, Vision was also baked in. For simple tasks, it&apos;s blazingly fast and accurate, with API cost that&apos;s overwhelmingly cheaper than the competition. (Chinese-origin models raise privacy concerns, but Qwen also ships local open-source weights separately.)&lt;/p&gt;
&lt;h4&gt;Why Qwen 3.5 Flash&lt;/h4&gt;
&lt;p&gt;Below is a comparison of each vendor&apos;s lightweight (Flash/mini/nano) line. Same-tier comparison feels fair. (Official pricing as of February 2026.)&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Claude Haiku 4.5&lt;/th&gt;
&lt;th&gt;GPT-5-nano&lt;/th&gt;
&lt;th&gt;Gemini 3 Flash&lt;/th&gt;
&lt;th&gt;Qwen 3.5 Flash&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Tier&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Lightweight (Haiku)&lt;/td&gt;
&lt;td&gt;Lightweight (nano)&lt;/td&gt;
&lt;td&gt;Lightweight (Flash)&lt;/td&gt;
&lt;td&gt;Lightweight (Flash)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Input (per 1M tokens)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$1.00&lt;/td&gt;
&lt;td&gt;$0.05&lt;/td&gt;
&lt;td&gt;$0.50&lt;/td&gt;
&lt;td&gt;$0.10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Output (per 1M tokens)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$5.00&lt;/td&gt;
&lt;td&gt;$0.40&lt;/td&gt;
&lt;td&gt;$3.00&lt;/td&gt;
&lt;td&gt;$0.40&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Vision (image recognition)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;td&gt;Sufficient&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Structured JSON output&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;td&gt;Sufficient (single-request)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Speed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Fast&lt;/td&gt;
&lt;td&gt;Very fast&lt;/td&gt;
&lt;td&gt;Fast&lt;/td&gt;
&lt;td&gt;Very fast&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Multi-turn performance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;Average&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;Drops off sharply&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Looking at the table alone, GPT-5-nano has a unit-price edge over Qwen on Input.
But real operating cost doesn&apos;t come down to Input pricing alone. If structured JSON output quality is poor, you accumulate retries, post-processing, and fallback calls to other models, and that piles up as hidden cost. For this task, GPT-5-nano&apos;s structured output was only &quot;Good,&quot; while Qwen 3.5 Flash delivered structured output stable enough to pass actual production tests on a single-request basis. Since the core pattern wasn&apos;t a complex multi-turn dialogue but rather &quot;send one image, receive one structured JSON,&quot; that gap was decisive.&lt;/p&gt;
&lt;p&gt;The ToC recognition workflow doesn&apos;t demand multi-turn either.
The user takes a photo of a book or workbook page, and the system needs to receive that image once and produce a JSON ToC structure. In this scenario, the reliability of a single &quot;vision + structured output&quot; shot matters far more than multi-turn reasoning. Qwen 3.5 Flash gave us results that were more than satisfactory on this single-request + structured-output combo against same-tier models, and that became the core argument for choosing it. (This doesn&apos;t mean Qwen Flash is the best at every task, just that it was a strong fit for this specific one.)&lt;/p&gt;
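&lt;p&gt;In sketch form, the whole call is one payload. The model id, the &lt;code&gt;response_format&lt;/code&gt; flag, and the field names below are my assumptions for illustration, not the production request, but they show how small a single-shot &quot;vision + structured output&quot; request really is, and where the bookTitle and totalPages hints slot in.&lt;/p&gt;

```python
# Sketch of the single-shot "image in, structured JSON out" request.
# Only the payload is assembled here (no network call), so the shape
# is easy to inspect; model id and schema wording are illustrative.

def build_toc_request(image_url, book_title, total_pages):
    system = (
        "You extract a book table of contents from a photo. "
        "Return JSON: a list of {title, depth, startPage}. "
        "The book is titled '" + book_title + "' and has "
        + str(total_pages) + " pages."
    )
    return {
        "model": "qwen3.5-flash",  # hypothetical id, for illustration
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
            ]},
        ],
        "response_format": {"type": "json_object"},
    }
```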
&lt;p&gt;Speed matters too.
When a user takes a camera shot of a page and waits for the result, response time &lt;em&gt;is&lt;/em&gt; UX. With the same image, Haiku 4.5 took roughly 10 to 15 seconds, while Qwen 3.5 Flash returned in 5 to 8 seconds. Nearly twice as fast in feel. On cost, Qwen Flash also sits comfortably on the cheap end of the lightweight tier. Measured against this task&apos;s requirements (lightweight, cheap, fast, single-request structured output), it was harder to find a reason &lt;em&gt;not&lt;/em&gt; to use Flash.&lt;/p&gt;
&lt;p&gt;The remaining worry was OCR/vision quality.
Honestly, I was skeptical at first that a Flash-tier vision model could handle real book photos with uneven lighting, page curvature, and tiny print. The actual tests showed that text recognition itself was practical. The harder question wasn&apos;t recognition rate but &quot;how do you structure and emit it,&quot; and that part was prompt design and post-processing territory. When the model gives you 80, you fill in the remaining 20 with engineering. (That sentence is the thesis of this whole post.)&lt;/p&gt;
&lt;h4&gt;Why DashScope&lt;/h4&gt;
&lt;p&gt;I started by wiring it through OpenRouter. Same model, brutal results. Turns out it was a completely different beast from running DashScope native. With the vanilla model, the same prompt produced unusable output, but on DashScope, web_search, web_extract, and Vision were all attached as native toolchain. The fact that the same model could feel that different across platforms was a shock. It had been the deciding factor for the collection pipeline, and it was the same story for the recognition pipeline.&lt;/p&gt;
&lt;p&gt;It ran reliably, and the cost was predictable. DashScope has clear region-by-region pricing and a free quota. The scariest thing about wiring an LLM API into a commercial service is &quot;I don&apos;t know what this month&apos;s bill will be.&quot; DashScope had less of that uncertainty. The Singapore region gets the latest models immediately, and the per-call pricing is published, so I always knew what each call would cost.&lt;/p&gt;
&lt;p&gt;Stability was the other reason. Alibaba Cloud is an infrastructure company and DashScope is a service layer on top of it, so at minimum I could worry less about the API just disappearing. Add an extra proxy layer and you add an extra failure point. I&apos;d already lived through one OpenRouter to DashScope migration during the collection phase, so this time I went straight to DashScope.&lt;/p&gt;
&lt;p&gt;If you&apos;re considering an LLM API integration, it&apos;s worth testing this. Beyond the US-origin SOTA models, China-origin models like Kimi, Qwen, and GLM are worth a look. The fact that the same model can produce completely different results depending on the platform you run it on is something you only learn by experiencing it firsthand.&lt;/p&gt;
&lt;h2&gt;From a working feature to a trustworthy product (build log)&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;The rest is the actual development flow. You don&apos;t need to follow every technical detail. But I hope you walk away with a feel for &lt;em&gt;why this had to be this complicated&lt;/em&gt;. What I really want to convey isn&apos;t the technical detail itself, it&apos;s how far the distance is from &quot;it works&quot; to &quot;you can trust it.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;Do Work: wired up the sync API first&lt;/h3&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;At least the parsing works.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I didn&apos;t build the right structure from day one. I built a working version first.&lt;/p&gt;
&lt;p&gt;I started with the simplest possible flow. iOS takes a photo and uploads the image to the server. The server sends the image to the DashScope API, gets the JSON back synchronously, and pipes it down to iOS. The prompt included bookTitle and totalPages as hints. Telling the model the book title gives it more context, and telling it the total page count makes page-number inference more accurate. A small hint like that turned out to make a surprisingly large difference in result quality.&lt;/p&gt;
&lt;p&gt;I still remember how the first test felt. I sent in a single photo and looked at the JSON that came back, and the Chapter, Section, depth, and startPage were all picked up pretty cleanly. The &quot;wait, this actually works?&quot; moment. After burning $2,000 on the pre-collection pipeline, that moment was honestly emotional.&lt;/p&gt;
&lt;h4&gt;Multi-image mattered more than I expected&lt;/h4&gt;
&lt;p&gt;At first I assumed &quot;one photo and we&apos;re done.&quot; But when you actually open a real book&apos;s ToC, more often than not it doesn&apos;t fit on one page. Technical books and workbooks especially can spread their ToC across 5 to 6 pages (one book in the test set was 10 pages). Multi-image parsing wasn&apos;t optional. Without it, the feature was unusable.&lt;/p&gt;
&lt;p&gt;The catch is that merging the parse results from multiple images isn&apos;t a simple concat.&lt;/p&gt;
&lt;p&gt;First, you have to &lt;strong&gt;preserve image order&lt;/strong&gt;. There&apos;s no guarantee the user shot page 1 first.&lt;/p&gt;
&lt;p&gt;Second, you need &lt;strong&gt;chapter merging&lt;/strong&gt;. The last chapter of image A and the first chapter of image B might be the same chapter. Chapter 3 might start at the end of the first photo and continue with sub-sections in the second. Treat that as a duplicate and you get two chapters. Ignore it and you lose the sections.&lt;/p&gt;
&lt;p&gt;Third, &lt;strong&gt;dedupe and startPage-based sorting&lt;/strong&gt;. The same chapter can appear in multiple images, and page numbers can overlap.&lt;/p&gt;
&lt;p&gt;Fourth, &lt;strong&gt;warning handling&lt;/strong&gt;. If the model judges that an uploaded image isn&apos;t actually a ToC, it has to return &lt;code&gt;not_toc&lt;/code&gt;. If the chapter count is abnormally low, it has to surface &lt;code&gt;too_few_chapters&lt;/code&gt;. If it had to force-adjust page order, it should send &lt;code&gt;page_order_adjusted&lt;/code&gt;. Without those warnings, a quietly returned result means the user ends up using bad data.&lt;/p&gt;
&lt;p&gt;The moment multi-image entered the picture, prompt design, merge logic, dedupe rules, sorting algorithms, and the warning system all jumped a level in complexity. It didn&apos;t take long to realize how naive &quot;one photo and we&apos;re done&quot; had been.&lt;/p&gt;
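&lt;p&gt;A minimal sketch of that merge, with illustrative field names rather than the production schema: keep the images in capture order, join a chapter that continues across two photos, drop duplicates, then sort by &lt;code&gt;startPage&lt;/code&gt;.&lt;/p&gt;

```python
# Illustrative merge of per-image parse results. Field names are mine,
# not the production schema.

def merge_results(per_image_chapters):
    merged = []
    for chapters in per_image_chapters:   # images in user capture order
        for ch in chapters:
            if merged and merged[-1]["title"] == ch["title"]:
                # same chapter continued on the next photo: keep one
                # node and append the newly recognized sections
                merged[-1]["sections"] += ch.get("sections", [])
            else:
                node = dict(ch)
                node["sections"] = list(ch.get("sections", []))
                merged.append(node)
    # dedupe: the same chapter can appear in multiple images
    seen = set()
    deduped = []
    for ch in merged:
        key = (ch["title"], ch["startPage"])
        if key not in seen:
            seen.add(key)
            deduped.append(ch)
    # final order comes from page numbers, not shot order
    return sorted(deduped, key=lambda c: c["startPage"])
```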
&lt;h4&gt;It worked, but it wasn&apos;t a product&lt;/h4&gt;
&lt;p&gt;The feature ran. Send three images, get a merged ToC JSON back. Accuracy was decent. But there was a fatal problem. Three images meant tens of seconds before the model responded. During that time, the HTTP connection stayed open, a server worker stayed pinned, and the user stared at a blank screen.&lt;/p&gt;
&lt;p&gt;Worse was the timeout. On mobile, holding an HTTP connection for over 30 seconds got you dropped depending on network conditions. Dropped meant starting from scratch. The model invocation cost was already spent and the result was gone.&lt;/p&gt;
&lt;p&gt;This is a feature, not a product. At a demo people might say &quot;oh wow,&quot; but ship it to real users and they&apos;ll use it once and never again.&lt;/p&gt;
&lt;h3&gt;Good: splitting it into an async parse-job, finally felt product-shaped&lt;/h3&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;It becomes a feature you can wait on.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Sync requests were stressful for both server and user. I couldn&apos;t shrink model latency. So I had to change &lt;em&gt;how&lt;/em&gt; you wait.&lt;/p&gt;
&lt;h4&gt;Evolving into a job system&lt;/h4&gt;
&lt;p&gt;I switched to a parse-jobs async model. The flow:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Client uploads images and requests parsing&lt;/li&gt;
&lt;li&gt;Server immediately creates a job and returns a jobId (under a second up to here)&lt;/li&gt;
&lt;li&gt;Actual parsing runs on a background worker&lt;/li&gt;
&lt;li&gt;Client polls status periodically using the jobId&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;That switch alone changed the user experience drastically. Sending the request now returns &quot;accepted&quot; instantly, and the app can render a &quot;processing&quot; UI. The feature evolved from &quot;model invocation&quot; into a &quot;job system.&quot;&lt;/p&gt;
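&lt;p&gt;Stripped of the API layer and the real queue, the job lifecycle looks something like this (an in-memory toy, not the production code):&lt;/p&gt;

```python
# Toy parse-jobs lifecycle: creating a job returns an id immediately,
# the slow model call runs later in a worker, and the client polls.
import uuid

JOBS = {}

def create_job(image_ids):
    job_id = str(uuid.uuid4())
    JOBS[job_id] = {"status": "queued", "images": image_ids, "result": None}
    return job_id                      # returned to the client right away

def run_worker(job_id, parse_fn):
    job = JOBS[job_id]
    job["status"] = "processing"
    job["result"] = parse_fn(job["images"])   # the slow model call
    job["status"] = "completed"

def poll(job_id):
    job = JOBS[job_id]
    return {"status": job["status"], "result": job["result"]}
```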
&lt;h4&gt;Deduplication, not just async-ification&lt;/h4&gt;
&lt;p&gt;Going async wasn&apos;t the only change. If a job was already in flight or completed for the same image set, I reused the existing job instead of creating a new one.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The user accidentally sends the same photo twice → 2x model cost&lt;/li&gt;
&lt;li&gt;Network issues cause the app to retry → two duplicate jobs&lt;/li&gt;
&lt;li&gt;The user wants to see &quot;that result from earlier&quot; → just look up the existing one and return it&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Whether the result was new or reused, the client got 200 either way, with the &lt;code&gt;reused&lt;/code&gt; flag spelled out. The client only needs to know &quot;I got 200, so I can take the jobId and start querying status.&quot; (Debugging needs the flag, so we keep it visible there.)&lt;/p&gt;
&lt;p&gt;This is a decision tied directly to cost. LLM API calls aren&apos;t free. Run this without dedup and your costs become unpredictable.&lt;/p&gt;
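&lt;p&gt;The dedup itself can be as simple as fingerprinting the image set. This is a sketch with made-up names, but the idea is exactly this: same images, same job, and the response says whether it was reused.&lt;/p&gt;

```python
# Sketch of job dedup by image-set fingerprint. The key is
# order-insensitive, so retries and re-sends map to the same job.
import hashlib
import uuid

JOBS_BY_KEY = {}

def image_set_key(image_ids):
    # sort so the same set of images always yields the same key
    joined = "|".join(sorted(image_ids))
    return hashlib.sha256(joined.encode()).hexdigest()

def create_or_reuse(image_ids):
    key = image_set_key(image_ids)
    if key in JOBS_BY_KEY:
        return {"jobId": JOBS_BY_KEY[key], "reused": True}
    job_id = str(uuid.uuid4())
    JOBS_BY_KEY[key] = job_id
    return {"jobId": job_id, "reused": False}
```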
&lt;h4&gt;Failure contracts: being good only on the happy path isn&apos;t enough&lt;/h4&gt;
&lt;p&gt;On the iOS side, I didn&apos;t stop at &quot;show it on success.&quot; The failure contract had to be explicit.&lt;/p&gt;
&lt;p&gt;If we don&apos;t get HTTP 200, fall back to the local path. The app has to keep working even when the server is down. Showing the user a &quot;server error&quot; message is the worst option. Better to offer &quot;automatic recognition failed, please input it manually&quot; as an alternative. (Sure, manual input is the worst UX, but it beats showing a raw error message.)&lt;/p&gt;
&lt;p&gt;We were lucky to have Apple VisionKit sitting on-device as a local OCR fallback, so even that path was a notch better than pure manual input. (It can&apos;t structure things, of course.)&lt;/p&gt;
&lt;p&gt;A backend isn&apos;t a service that&apos;s &quot;fine when things are fine.&quot; &lt;strong&gt;You have to agree on how the client reacts when things fail too.&lt;/strong&gt; That isn&apos;t an API spec. It&apos;s a product contract.&lt;/p&gt;
&lt;h4&gt;Good, but still not enough&lt;/h4&gt;
&lt;p&gt;If Do Work was &quot;a feature that runs,&quot; Good was the stage where it became &quot;a feature you can wait on.&quot; But the user still saw nothing during that wait, with no idea when it would end. Polling that only said &quot;still processing&quot; wasn&apos;t an experience worth shipping in 2026.&lt;/p&gt;
&lt;p&gt;I could have stopped at &quot;this is good enough.&quot; Plenty of services do stop here. But what do you do with the time the user spends waiting? That, to me, is what separates Good from Great.&lt;/p&gt;
&lt;h3&gt;Great: real-time experience, load distribution, operational risk all baked in, and finally production&lt;/h3&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;It becomes a product experience you can trust.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h4&gt;Adding SSE turned the feature into an experience&lt;/h4&gt;
&lt;p&gt;Polling had clear limits. The user has no idea &quot;where we are right now.&quot; Tight polling intervals raise server load. Loose intervals raise perceived delay.&lt;/p&gt;
&lt;p&gt;So I introduced SSE (Server-Sent Events). The client opens a connection and the server pushes events in real time.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;snapshot&lt;/strong&gt;: full current state at connect time&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;status&lt;/strong&gt;: job state changes (queued → processing → completed/failed)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;preview&lt;/strong&gt;: progressively recognized ToC structure as parsing runs&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;heartbeat&lt;/strong&gt;: signal that the connection is alive&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;usage&lt;/strong&gt;: token-usage and other meta info&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;completed&lt;/strong&gt;: final result confirmed&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;failed&lt;/strong&gt;: failure and error info&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The biggest UX win was preview. It shows the model recognizing the ToC in real time. Chapter 1 appears first, Section 1.1 nests underneath, the next Chapter is added. Like ChatGPT streaming an answer character by character, the ToC progressively assembles itself in front of you. The moment that landed, the &quot;feature&quot; became an &quot;experience.&quot; The wait shifted from boring dead time into something closer to anticipation.&lt;/p&gt;
&lt;p&gt;At first this looked simple enough that &quot;just show it as it comes out&quot; felt sufficient. But once I actually built it, the hard part started right after.&lt;/p&gt;
&lt;h4&gt;preview and truth had to be separated&lt;/h4&gt;
&lt;p&gt;Showing things in real time and storing things you can trust are two different problems.&lt;/p&gt;
&lt;p&gt;I learned this when I tried using preview directly as the final result and things blew up. preview has no nodeId and no order. Enough to render in the UI, sure, but downstream (plan generation, checkItems conversion, user customization) needs metadata that preview doesn&apos;t carry.&lt;/p&gt;
&lt;p&gt;So I set the rules:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;preview is for UI.&lt;/strong&gt; Non-authoritative.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Only the final result is truth.&lt;/strong&gt; Stored in the DB, accessed only via the &lt;code&gt;GET parse-jobs/{jobId}&lt;/code&gt; read path.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;preview and the worker queue stream are separated.&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Treat preview as truth and you can generate plans from incomplete, mid-parse data. Don&apos;t separate them, and what happens? You showed the user &quot;Chapter 3 recognized,&quot; but actually there were 5 chapters, and now their plan is incomplete. Display and storage cannot share the same channel. Obvious in hindsight, easy to miss in practice.&lt;/p&gt;
&lt;h4&gt;The hard part wasn&apos;t SSE, it was the preview emit policy&lt;/h4&gt;
&lt;p&gt;Wiring up SSE itself isn&apos;t hard. The hard part comes after. &lt;strong&gt;How often, and on what kind of change, do you emit a preview?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Every preview emission costs you twice: storing it in the latest preview cache, and appending it to the event stream. Emit a lot and the UX is flashy but server writes explode. Emit too little and there&apos;s no point in having SSE at all. If the screen is empty for a while and then the whole result suddenly appears, how is that any different from polling?&lt;/p&gt;
&lt;p&gt;Mobile environments make it worse. The connection drops on the subway, drops again switching from Wi-Fi to LTE. Every reconnect makes the server query the latest preview and replay the prior events. The more reconnects pile up, the heavier the server-read load gets. At that point the bigger problem may be reconnect cost, not real-time delay.&lt;/p&gt;
&lt;p&gt;So at the PreviewAssembler level, I built in an adaptive emit policy:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;throttle&lt;/strong&gt;: minimum emit interval. Too often and the server suffers.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;chapterThrottle&lt;/strong&gt;: emit only on chapter-level changes. A single new section gets held.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;maxSilence&lt;/strong&gt;: if nothing is sent for too long, send a preview as a heartbeat substitute. Keeps the user from worrying that things have stalled.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;fingerprint comparison&lt;/strong&gt;: hash against the previous preview and emit only when there&apos;s an actual change. The model sometimes repeats the same content.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;pending preview hold&lt;/strong&gt;: if emit conditions aren&apos;t met, hold and send only the latest at the next emit point.&lt;/li&gt;
&lt;/ul&gt;
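&lt;p&gt;A few of these rules can be sketched in isolation. The following Python is a simplified illustration, not the actual PreviewAssembler; the class name, parameters, and defaults are assumptions, and chapterThrottle is omitted for brevity.&lt;/p&gt;

```python
import hashlib
import json

class PreviewEmitPolicy:
    """Simplified illustration of the adaptive emit policy (not the real
    PreviewAssembler; names and defaults are assumptions)."""

    def __init__(self, throttle_s=2.0, max_silence_s=15.0):
        self.throttle_s = throttle_s        # minimum interval between emits
        self.max_silence_s = max_silence_s  # send something before this much silence
        self.last_emit_at = None
        self.last_fingerprint = None
        self.pending = None                 # latest held preview, if any

    def _fingerprint(self, preview):
        # hash the preview so identical repeats from the model can be dropped
        blob = json.dumps(preview, sort_keys=True).encode()
        return hashlib.sha256(blob).hexdigest()

    def offer(self, preview, now):
        """Return the preview to emit now, or None to hold it."""
        fp = self._fingerprint(preview)
        if fp == self.last_fingerprint:     # no actual change: drop it
            return None
        self.pending = preview              # always remember the newest state
        if self.last_emit_at is None or now - self.last_emit_at >= self.throttle_s:
            self.last_emit_at = now
            self.last_fingerprint = fp
            self.pending = None
            return preview
        return None                         # throttled: held as pending

    def tick(self, now):
        """Periodic check: flush the held preview if silence lasted too long."""
        if self.pending is not None and self.last_emit_at is not None \
                and now - self.last_emit_at >= self.max_silence_s:
            out = self.pending
            self.last_fingerprint = self._fingerprint(out)
            self.last_emit_at = now
            self.pending = None
            return out
        return None

policy = PreviewEmitPolicy(throttle_s=2.0, max_silence_s=5.0)
first = policy.offer({"chapter": 1}, now=0.0)   # first preview goes out immediately
```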
&lt;p&gt;The strategy is two-tier. &lt;strong&gt;Incremental strategy&lt;/strong&gt; keeps things cheap by default, and when it breaks, &lt;strong&gt;whole replay fallback&lt;/strong&gt; recovers safely. As long as incremental processing holds, we save cost; once state gets tangled, we resend the whole thing to guarantee consistency.&lt;/p&gt;
&lt;p&gt;What you see in real time looked flashy, but the actual hard problem was making it look better while sending less. SSE quality wasn&apos;t about whether the connection was up. It was about how carefully you let preview flow through.&lt;/p&gt;
&lt;h4&gt;Real-time UX forced server design to start over&lt;/h4&gt;
&lt;p&gt;SSE isn&apos;t just a flashy display technology. It&apos;s a different way of managing wait and load. Adding SSE meant redesigning a substantial chunk of the server architecture.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Separating worker queue from SSE event stream.&lt;/strong&gt; At first I tried to reuse the existing worker queue&apos;s Redis Stream as the SSE source. But that couples the worker&apos;s processing unit to SSE&apos;s emit unit. If the worker speeds up, SSE over-emits. If the worker slows down, SSE feels sluggish. These two need to be independent. I split the Redis event stream onto a separate key.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Replay, reconnect, Last-Event-ID.&lt;/strong&gt; On mobile, reconnects aren&apos;t an exception, they&apos;re normal. When reconnecting, sending Last-Event-ID lets the server replay only events after that point. Without it, every drop sends the user back to the start. Same problem we hit in Do Work, repeating itself at the SSE layer.&lt;/p&gt;
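&lt;p&gt;The replay idea reduces to filtering the stream by the client&apos;s Last-Event-ID. A toy sketch, with an in-memory list standing in for the Redis event stream (function and variable names are illustrative):&lt;/p&gt;

```python
# Toy sketch of reconnect replay keyed by Last-Event-ID. An in-memory list
# stands in for the Redis event stream; names are illustrative.
def replay_after(events, last_event_id):
    """Return only the events the client has not seen yet.
    events: list of (event_id, payload) pairs with increasing ids."""
    if last_event_id is None:
        return list(events)                 # fresh connection: replay everything
    return [(eid, p) for (eid, p) in events if eid > last_event_id]

stream = [(1, "snapshot"), (2, "preview"), (3, "preview"), (4, "completed")]
```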
&lt;p&gt;&lt;strong&gt;Nginx config.&lt;/strong&gt; &lt;code&gt;proxy_buffering off&lt;/code&gt;, &lt;code&gt;X-Accel-Buffering: no&lt;/code&gt;. Skip these and Nginx quietly buffers SSE events and ships them in one batch. You built a &quot;real-time&quot; thing and got a delayed batch. Miss this and SSE stops meaning anything.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Graceful shutdown and deploy.&lt;/strong&gt; I set shutdown timeout to 30 seconds and used blue/green for cutovers. The deploy goal wasn&apos;t &quot;zero downtime,&quot; it was a &lt;strong&gt;system that can reconnect&lt;/strong&gt;. Perfect zero-downtime costs too much. A system that recovers when it drops is enough. Even on disconnect, the client auto-reconnects, and the snapshot-first + replay + live tail structure restores prior state.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Full SSE session flow.&lt;/strong&gt; When a session opens: query the job → query the latest preview → assemble the snapshot event → replay → live tail. Those five steps repeat on every connection. UX got better, but as reconnects pile up, that initial handshake cost accumulates. Recent optimization focus has shifted from &quot;add more SSE&quot; to &lt;strong&gt;controlling preview emit frequency and reconnect cost&lt;/strong&gt;. Real-time UX isn&apos;t free. Behind the flashy display is a cost the server has to absorb.&lt;/p&gt;
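&lt;p&gt;The five-step session flow can be sketched as a generator. This is an assumed shape, not the real handler; the point is the ordering: snapshot first, then replay, then live tail.&lt;/p&gt;

```python
# Assumed shape of the five-step session flow: query job, query latest preview,
# snapshot event, replay, live tail. Not the real handler; the ordering is the point.
def open_session(job, latest_preview, events, last_event_id, live_events):
    # steps 1-3: the queried job and latest preview are assembled into a snapshot
    yield ("snapshot", {"job": job, "preview": latest_preview})
    # step 4: replay events the client missed while disconnected
    for (eid, payload) in events:
        if last_event_id is None or eid > last_event_id:
            yield ("replay", payload)
    # step 5: tail live events until the connection closes
    for payload in live_events:
        yield ("live", payload)

session = list(open_session("job-1", {"chapters": 1}, [(1, "a"), (2, "b")], 1, ["c"]))
```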
&lt;h4&gt;Real quality gains came from the 30-book test set, not from swapping models&lt;/h4&gt;
&lt;p&gt;The system was complete. But to push it to production grade, &quot;it works&quot; wasn&apos;t enough.&lt;/p&gt;
&lt;p&gt;I ran 30 real books through it. Different fields (CS, math, literature, business), different publishers (O&apos;Reilly, Pearson, Korean publishers), different layouts. Failure patterns started to surface.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Missing chapters&lt;/strong&gt;: model skips entire chapters → prompt reinforcement + missing detection in merge-normalize&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Roman numeral misreads&lt;/strong&gt;: confuses Part Ⅳ with Part IV → numbering normalization rules patched&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Depth misjudgment&lt;/strong&gt;: promotes a Section to a Chapter or demotes one → depth correction logic added in parser/post-process&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Page order drift&lt;/strong&gt;: image order doesn&apos;t match actual page order → startPage-based sorting reinforced&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Line break / indentation confusion&lt;/strong&gt;: OCR reads line breaks as structural separators → spelled out in prompt&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Debugging difficulty&lt;/strong&gt;: same image, nondeterministic results → live integration tests + detailed logging&lt;/li&gt;
&lt;/ul&gt;
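&lt;p&gt;One of the smaller fixes above is concrete enough to show: Unicode compatibility normalization folds the single roman-numeral codepoint into plain ASCII, so Part Ⅳ and Part IV compare equal. A sketch (the function name is mine):&lt;/p&gt;

```python
import unicodedata

# One numbering-normalization rule made concrete: NFKC folds compatibility
# characters such as the single roman-numeral codepoint U+2163 into ASCII,
# so the two spellings of Part IV compare equal. The function name is mine.
def normalize_numbering(label):
    return unicodedata.normalize("NFKC", label).upper()
```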
&lt;p&gt;Accuracy didn&apos;t go up just because I swapped models. Quality went up while building the test set and blocking regressions. Build a canonical fixture, run the batch integration tests, reproduce failed cases as live integration tests, edit the prompt builder, run the whole thing again. I repeated that loop dozens of times.&lt;/p&gt;
&lt;p&gt;There&apos;s a commit message that just says &quot;dev environment debugging support,&quot; and that one was central. You can&apos;t raise quality unless you can reproduce real-world failures with real data. A bug you can&apos;t reproduce is a bug you can&apos;t fix. That part isn&apos;t the model&apos;s job. It&apos;s the engineer&apos;s.&lt;/p&gt;
&lt;h4&gt;Tests were green, but production handed me a release blocker&lt;/h4&gt;
&lt;p&gt;We passed the 30-book test set and the SSE flow was stable. Tests were all green. I was ready to ship.&lt;/p&gt;
&lt;p&gt;Then the final review surfaced release blockers.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Permanent wait on terminal event loss (critical)&lt;/strong&gt;: if completed or failed gets dropped, the client stays &quot;processing&quot; forever. The user has to force-quit the app.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Missing event timestamp/message&lt;/strong&gt;: replay order can scramble.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Redis MAXLEN unset&lt;/strong&gt;: events accumulate without bound. Memory creeps up until one day Redis dies.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No retry on worker failure&lt;/strong&gt;: if the DashScope API fails, the job stays &quot;processing&quot; forever.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;preview partial JSON repair vulnerability&lt;/strong&gt;: incomplete JSON could be sent down as is.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Any one of these going off in production seriously breaks the user experience. Tests verify &quot;does the happy path work.&quot; Production demands &quot;are the unhappy paths also safe.&quot; After fixing every release blocker, writing the manual runbook, building the deploy checklist, and validating the actual SSE flow with a live integration transcript, only then did &quot;okay, ready to ship&quot; feel earned.&lt;/p&gt;
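&lt;p&gt;Two of those blockers have mechanical fixes that are easy to sketch in isolation: capping the event stream the way Redis XADD with MAXLEN does, and a client-side timeout so a lost terminal event degrades into a re-fetch instead of an infinite spinner. The names below are illustrative, with in-memory stand-ins for Redis:&lt;/p&gt;

```python
from collections import deque

# In-memory stand-ins for two of the fixes above (names are illustrative):
# a bounded stream, like Redis XADD with MAXLEN, and a client-side timeout so a
# dropped terminal event degrades into a re-fetch instead of an infinite spinner.
class CappedEventStream:
    def __init__(self, maxlen=1000):
        # deque evicts the oldest entries once full, like approximate MAXLEN trimming
        self.events = deque(maxlen=maxlen)

    def append(self, event):
        self.events.append(event)

def resolve_state(last_event, seconds_since_event, timeout_s=60.0):
    """Decide what the client should show when events stop arriving."""
    if last_event in ("completed", "failed"):
        return last_event                   # terminal event arrived: trust it
    if seconds_since_event > timeout_s:
        return "refetch"                    # fall back to the read path instead of waiting
    return "processing"
```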
&lt;h3&gt;The last piece of the pipeline: hierarchy selection UX&lt;/h3&gt;
&lt;p&gt;You&apos;d think we were done at this point. There was one more piece.&lt;/p&gt;
&lt;p&gt;No matter how accurately the model extracts the ToC structure, you can&apos;t just hand the raw result to the user. Different users need different depths. Some only want the Chapter level, others want it down to Section.&lt;/p&gt;
&lt;p&gt;So we needed hierarchy selection. Default behavior:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Show leaf nodes (the deepest items) selected by default&lt;/li&gt;
&lt;li&gt;Toggling a parent node selects/deselects everything underneath&lt;/li&gt;
&lt;li&gt;The default state should already match &quot;what most cases want&quot;&lt;/li&gt;
&lt;/ul&gt;
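&lt;p&gt;The default behavior above boils down to two small tree operations. A sketch, assuming a nested dict shape for ToC nodes (the shape and function names are my assumptions, not the actual data model):&lt;/p&gt;

```python
# Sketch of the selection defaults, assuming a nested dict for ToC nodes
# (the node shape and function names are assumptions, not the actual model).
def leaf_ids(node):
    """Collect the deepest items, which start out selected."""
    kids = node.get("children", [])
    if not kids:
        return [node["id"]]
    out = []
    for child in kids:
        out.extend(leaf_ids(child))
    return out

def toggle_subtree(node, selected, on):
    """Toggling a parent selects or deselects everything underneath it."""
    if on:
        selected.add(node["id"])
    else:
        selected.discard(node["id"])
    for child in node.get("children", []):
        toggle_subtree(child, selected, on)

toc = {"id": "ch1", "children": [{"id": "s1.1"}, {"id": "s1.2"}]}
selected = set(leaf_ids(toc))               # default state: leaves selected
```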
&lt;blockquote&gt;
&lt;p&gt;Photo to structured JSON to hierarchy selection to Items generation&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The whole thing is a single pipeline. The model is good at photo to JSON. Everything else is engineering.&lt;/p&gt;
&lt;h3&gt;What is product engineering in the AI era?&lt;/h3&gt;
&lt;p&gt;As you&apos;ve probably gathered, this isn&apos;t simply an OCR feature. It&apos;s an engineered flow: photo to structured JSON to hierarchy selection to Items generation.&lt;/p&gt;
&lt;p&gt;On the surface, sure, this work could look like nothing more than wiring up an LLM API and slapping SSE on top for streaming UX. You might wonder, &quot;isn&apos;t everyone doing this these days?&quot; What I want to show through this process is the work of taking a slow, nondeterministic, expensive LLM call and turning it into a product experience the user can wait on, can understand, and can trust.&lt;/p&gt;
&lt;p&gt;What got solved here:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The UX problem of reducing mobile input friction&lt;/li&gt;
&lt;li&gt;The recognition problem (it&apos;s structuring accuracy, not OCR)&lt;/li&gt;
&lt;li&gt;The job-system problem of fitting slow LLM responses into a product flow&lt;/li&gt;
&lt;li&gt;The real-time UX problem covering polling, SSE, and reconnect&lt;/li&gt;
&lt;li&gt;The data trust problem of separating preview from truth&lt;/li&gt;
&lt;li&gt;Operational problems like emit frequency, Redis writes, and reconnect cost&lt;/li&gt;
&lt;li&gt;The product problem of fallback on failure and the user contract&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A product engineer&apos;s persistence isn&apos;t proven by reaching for big-name technologies. It shows up in chasing one small-looking feature all the way down until the user feels no friction.&lt;/p&gt;
&lt;p&gt;ToC recognition is a tiny part of the product I&apos;m currently building. It&apos;s not even a required feature. In an earlier era I might have cut it from the early roadmap. (And it would have lived in the backlog forever.) If someone had told me they&apos;d build it to this depth back then, I&apos;d have argued them out of it. The cost wouldn&apos;t pencil out. But in today&apos;s AI coding agent era, this is something you can build in a day, or even a few hours. At that price, it&apos;s worth paying.&lt;/p&gt;
&lt;p&gt;So what &lt;em&gt;is&lt;/em&gt; product engineering in the AI era? The core is going beyond Do Work, through Good, all the way down into Great. Not stopping at &quot;this is good enough&quot; (Good), but pushing into the maximum of detail (Great). That, I think, is what AI-era product engineering is.&lt;/p&gt;
&lt;p&gt;No matter how well you build the harness, no matter how strong the agent gets, the one thinking and deciding is still the engineer. I built this mostly with Codex, leaning hard on the &lt;a href=&quot;https://flowkater.io/posts/2026-02-08-superpowers-introduction/&quot;&gt;Superpowers skill&lt;/a&gt; (Codex&apos;s autonomous-execution mode). But the implementation flow itself, requirements analysis, implementation, optimization, was something I kept digging into and deciding on personally.&lt;/p&gt;
&lt;p&gt;Just because I&apos;m not writing the code by hand doesn&apos;t mean Craftsmanship disappears. The willingness to step back when &quot;I think we&apos;re done&quot; comes up, re-check everything from the start, re-test, and push the last 2%. That&apos;s what&apos;s needed.&lt;/p&gt;
&lt;h2&gt;Closing&lt;/h2&gt;
&lt;p&gt;When you first learn to code, it&apos;s genuinely thrilling. The people getting in through &lt;a href=&quot;https://flowkater.io/posts/2026-01-09-15-year-cto-vibe-coding/&quot;&gt;vibe coding&lt;/a&gt; right now probably feel the same way. I&apos;ll never forget the feeling of building the first service that did exactly what I&apos;d intended. (Back in 2011 I built my own web Dropbox clone in Ruby on Rails. The pride.)&lt;/p&gt;
&lt;p&gt;But there&apos;s an enormous gap between a working feature and a product you can trust. AI model integration, the topic of this post, is especially seductive at the start. You test the model in the vendor&apos;s playground, see real performance, and feel &quot;I could build anything with this.&quot; You&apos;re ecstatic. Then you crash. Between LLM API costs and the realities of running things in production, the work I assumed would be &quot;just plug in the API&quot; turned out to need 80% more thought and trial and error.&lt;/p&gt;
&lt;p&gt;I don&apos;t think the product I built is Great. But when the user tells you &quot;this is awkward&quot; or &quot;this isn&apos;t intuitive,&quot; whether you ignore that point or hold onto it until the end, that, I think, is the fork in the road to Great.&lt;/p&gt;
&lt;p&gt;In the era when development cost was high, &quot;over-engineering&quot; became an excuse to dismiss obsession with detail. But if you can build that detail in a day instead of three weeks, why give it up? If you have ten possible features and only ship three, but those three give the user an outstanding product experience, I&apos;d argue that&apos;s the better direction.&lt;/p&gt;
&lt;p&gt;Even in &lt;a href=&quot;https://flowkater.io/posts/2026-02-19-code-reading-era/&quot;&gt;the era where AI writes the code&lt;/a&gt;, when you look at the engineers building good products, they&apos;re still pulling all-nighters. iOS development is hard for me. After years of Flutter, I picked iOS, and no matter how hard I beat on Codex, UI transition code can spiral once it goes off the rails. Infinite loop time. After spending a full day failing to fix a trivial issue, that&apos;s when I finally regret skipping fundamentals, start adding logs, and search for similar demos. Like the old days. When AI can&apos;t solve it, I have to find it. Like staying up all night thinking through preview emit policy.&lt;/p&gt;
&lt;p&gt;I&apos;m not the only one. The &lt;a href=&quot;https://github.com/nicepkg/openclaw&quot;&gt;OpenClaw&lt;/a&gt; developer built and threw away &lt;a href=&quot;https://www.roborhythms.com/openclaw-changed-the-ai-industry/&quot;&gt;43 projects in 60 days before one hit&lt;/a&gt;. Look at the commit history from those 60 days and you&apos;ll see 3 a.m. and 4 a.m. commits everywhere. Even in the era when AI writes the code, the engineers building real products are the ones writing commit messages like &quot;broken at least once, and eventually fixed at 1 AM while questioning my life choices.&quot; The Zed editor team wrote a piece last year called &lt;a href=&quot;https://zed.dev/blog/software-craftsmanship-in-the-era-of-vibes&quot;&gt;&quot;The Case for Software Craftsmanship in the Era of Vibes&quot;&lt;/a&gt;, declaring that craftsmanship still matters in the vibe coding era. There&apos;s even an &lt;a href=&quot;https://www.ivanturkovic.com/2025/10/30/artesanal-coding-%E8%81%B7%E4%BA%BA%E3%82%B3%E3%83%BC%E3%83%87%E3%82%A3%E3%83%B3%E3%82%B0-a-manifesto-for-the-next-era-of-software-craftsmanship/&quot;&gt;Artisanal Coding(職人コーディング)&lt;/a&gt; manifesto out now. Summoning Japanese-style craftsmanship in this era. A bit much, maybe, but I feel it all the same.&lt;/p&gt;
&lt;p&gt;AI coding agents reportedly raise productivity by 30 to 60 percent. True. I feel it too. But where you spend the time saved is the fork in the road. Stamp out more features, or push one feature deeper. I picked the latter, and I still think that&apos;s right. (Of course, revenue has to prove that out, and that part I don&apos;t know yet.)&lt;/p&gt;
&lt;p&gt;Other people ship 10 apps a month, and I&apos;m three months into building one product, having cut a substantial portion of the originally planned features. Even with a coding agent sitting next to you, engineering is craftsmanship. &lt;a href=&quot;https://flowkater.io/posts/2026-02-28-ai-as-smart-as-you/&quot;&gt;When AI returns a passable 80-point average&lt;/a&gt;, the remaining 20 points are on you.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://flowkater.io/posts/2026-03-01-agentic-engineering-9-skills/&quot;&gt;9 Survival Skills for the Agentic Engineering Era&lt;/a&gt;: the 9 abilities Karpathy says engineers need in the agentic engineering era&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://flowkater.io/posts/2026-02-08-superpowers-introduction/&quot;&gt;Give Claude Code Wings: Introducing Superpowers&lt;/a&gt;: how to install Codex&apos;s autonomous-execution mode Superpowers and the 7-step workflow&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://flowkater.io/posts/2026-01-09-15-year-cto-vibe-coding/&quot;&gt;How a 15-Year CTO Vibe-Codes&lt;/a&gt;: pair-programming with AI based on Kent Beck&apos;s augmented coding philosophy&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://flowkater.io/posts/2026-02-28-ai-as-smart-as-you/&quot;&gt;AI Is Only as Smart as You Are&lt;/a&gt;: why two people using the same AI get different gaps. AI&apos;s output is decided by your input&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://flowkater.io/posts/2026-02-15-ai-jarvis-openclaw/&quot;&gt;AI Agent Jarvis: Becoming My Second Brain&lt;/a&gt;: real-world notes on building a 24/7 AI agent with the OpenClaw framework&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://zed.dev/blog/software-craftsmanship-in-the-era-of-vibes&quot;&gt;The Case for Software Craftsmanship in the Era of Vibes&lt;/a&gt;: software craftsmanship declaration from the Zed editor team (2025.06)&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.ivanturkovic.com/2025/10/30/artesanal-coding-%E8%81%B7%E4%BA%BA%E3%82%B3%E3%83%BC%E3%83%87%E3%82%A3%E3%83%B3%E3%82%B0-a-manifesto-for-the-next-era-of-software-craftsmanship/&quot;&gt;Artisanal Coding(職人コーディング): A Manifesto&lt;/a&gt;: a manifesto for software craftsmanship in the AI era (2025.10)&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.roborhythms.com/openclaw-changed-the-ai-industry/&quot;&gt;OpenClaw: One Developer, 43 Failed Projects&lt;/a&gt;: Peter Steinberger&apos;s story of building and throwing away 43 projects before OpenClaw&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://qwenlm.github.io/blog/qwen3.5/&quot;&gt;Introducing Qwen 3.5&lt;/a&gt;: Alibaba Cloud&apos;s official Qwen 3.5 Flash model announcement (2026.02.23)&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.alibabacloud.com/help/en/model-studio/&quot;&gt;DashScope Model Studio&lt;/a&gt;: AlibabaCloud DashScope API documentation and pricing&lt;/li&gt;
&lt;/ul&gt;
</content:encoded><category>AI</category><category>product-engineering</category><category>LLM</category><category>Qwen</category><category>DashScope</category><category>SSE</category><category>iOS</category><category>build-log</category><category>craftsmanship</category><author>Tony Cho (https://flowkater.io)</author></item><item><title>Code Review in the AI Era: How Should We Do It?</title><link>https://flowkater.io/en/posts/2026-03-08-ai-code-review/</link><guid isPermaLink="true">https://flowkater.io/en/posts/2026-03-08-ai-code-review/</guid><description>From the reality of code review during my CTO years to the current AI-era debate. Simon Willison, Kent Beck, Bryan Finster, StrongDM, Salesforce, latent.space — I lay out the spectrum around code review and ask what it is we should actually be reviewing.</description><pubDate>Sun, 08 Mar 2026 03:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;Intro&lt;/h2&gt;
&lt;p&gt;When I worked as CTO, one of the first &quot;official duties&quot; to disappear from my plate was code review. Before the team got big enough, I ran code review for the backend team. But once I was CTO of a 20-plus-person org, things shifted. The positional title of R&amp;amp;D Department Head started to outweigh the functional title of CTO, and my management moved from code to people.&lt;/p&gt;
&lt;p&gt;That doesn&apos;t mean I stopped doing code review entirely after that. The data engineering team was small and had no lead, so for over a year I personally ran their code reviews and study sessions. With newly hired juniors, regardless of their position, I sat down for study and review with them. It was the kind of thing senior engineers, buried in their own workload, couldn&apos;t easily pick up. Sometimes I&apos;d spend several weeks doing 1-on-1 reviews and feedback with a junior who was struggling to write code. Since none of this counted as &quot;official work,&quot; it was hard to fit any of it into my actual working hours.&lt;/p&gt;
&lt;p&gt;Honestly, even calling code review an &quot;official duty&quot; is a stretch. We never really nailed down the process for how to do it, and that part is still hard today. Back then I tried to build a culture where developers rotated through reviewing each other&apos;s code. Some people couldn&apos;t see why they should review code that wasn&apos;t their own. Others were enthusiastic to a fault, but reviewed so aggressively that, despite all their energy, they made the room flinch.&lt;/p&gt;
&lt;p&gt;Review styles were all over the place. For backend (and the same went for data and frontend), I focused on the application layer: code architecture, whether the code was sufficiently human-readable, whether it broke our internal conventions. I gave feedback on better ways to write the code and offered guidelines around unnecessary duplication or design issues with real tradeoffs. But other team leads went further: they pulled the branch locally, ran it, and reviewed the final quality end to end. Most of that was on the client side. When you have time, the latter is usually the better review. But it costs so much time that as workload grows, doing it properly becomes impossible.&lt;/p&gt;
&lt;p&gt;The problem, in the end, was time. I&apos;ve seen plenty of companies with great review processes, but in a tech startup where survival means a deadline every single day, we&apos;d often code in the morning and ship in the afternoon. Skilled developers who were both willing and capable of reviewing code (and kind enough to do it) were a small group. Eventually, PRs with skipped reviews got merged for the unavoidable reason of &quot;schedule,&quot; and between juniors who hadn&apos;t built up a strong mental model of code architecture and seniors who&apos;d handed their code quality over to the monster called scheduling, the codebase itself slowly became a monster too.&lt;/p&gt;
&lt;p&gt;The problem, in the end, was people.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;We&apos;ve now entered the era of AI agentic engineering, or as I called it earlier, &lt;a href=&quot;/posts/2026-02-19-code-reading-era&quot;&gt;the era of not reading code&lt;/a&gt;, and a wave of tools has shown up to solve the problems of the previous era. It started with CodeRabbit, then Codex and Claude Code began posting PR messages and line-by-line comments. In other words, code review without human dependency became possible.&lt;/p&gt;
&lt;p&gt;The catch is that the PRs going up are also written without human dependency.&lt;/p&gt;
&lt;p&gt;There&apos;s a ghost story you hear sometimes: a junior who doesn&apos;t really know what they&apos;re doing pushes AI-generated code, and a senior pulls an all-nighter touching every line to fix it. I can&apos;t verify the story, but it shows up often enough. It sounds like distrust of AI, but really it&apos;s distrust of juniors who don&apos;t know what they&apos;re using. A sorcerer&apos;s apprentice who can normally cast a fireball suddenly gets an infinite mana source (AI), starts throwing hellfire around with no experience or magical knowledge, and falls into a kind of magical possession.&lt;/p&gt;
&lt;p&gt;Either way, this conversation keeps coming up because in any organization where individuals are personally responsible for their code, even AI-produced code carries that same personal responsibility. So plenty of teams are losing sleep over this. Individual productivity has been amplified massively by AI, but team-level or company-level output hasn&apos;t visibly jumped in many cases. Part of the reason might be that team collaboration is, at the bottom, a question of accountability.&lt;/p&gt;
&lt;p&gt;There&apos;s an interesting data point. In &lt;a href=&quot;https://www.latent.space/p/reviews-dead&quot;&gt;a study of more than 10,000 developers&lt;/a&gt;, teams with high AI adoption merged 98% more PRs but spent 91% more time on review. The individual got faster; the team got slower. The code review bottleneck didn&apos;t disappear. It just got bigger.&lt;/p&gt;
&lt;p&gt;So one question remains. In the AI era, how should we do code review?&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;We Still Don&apos;t Trust Code That No Human Has Reviewed&lt;/h2&gt;
&lt;p&gt;Let&apos;s start with the most intuitive position. &quot;No matter how good AI gets, we can&apos;t trust code that a human hasn&apos;t read.&quot;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://simonwillison.net/guides/agentic-engineering-patterns/anti-patterns/&quot;&gt;Simon Willison&lt;/a&gt; recently put together a list of agentic engineering anti-patterns, and the very first one was exactly this. &quot;Don&apos;t dump unreviewed code on your collaborators.&quot; Just because an agent generated hundreds or thousands of lines of code for you doesn&apos;t mean you should send it up as a PR. That&apos;s offloading the actual work onto your teammates. In his words: &quot;They could just prompt an agent themselves. So what value are you actually adding?&quot;&lt;/p&gt;
&lt;p&gt;That&apos;s a sharp question. And it lands for me. The most frustrating moment back in my CTO days was when a PR came up and the person who pushed it couldn&apos;t explain their own code. The same is true in the AI era. Actually, it&apos;s worse in the AI era, because the agent will write a plausible-looking PR description for you. If someone pushes a PR with an agent-written description they haven&apos;t even read themselves (and I sometimes feel that temptation too), that&apos;s just rude to the reviewer.&lt;/p&gt;
&lt;p&gt;Kent Beck, one of the engineers I respect most, takes a similar stance. I introduced his Augmented Coding philosophy earlier in &lt;a href=&quot;/posts/2026-01-09-15-year-cto-vibe-coding&quot;&gt;How a 15-year CTO does vibe coding&lt;/a&gt;, and the core idea is the same. The faster AI generates code, the more important testing and review become — not less. As the cost of generation approaches zero, the source of value shifts from generation to verification.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://x.com/addyosmani/status/2027662887897149801&quot;&gt;Addy Osmani&lt;/a&gt; put his finger on the same point. &quot;The unsolved problem isn&apos;t generation, it&apos;s verification. That&apos;s where engineering judgment becomes your highest-leverage skill.&quot; AI is good at making code. Whether that code is correct, whether it fits your system, whether it&apos;s still maintainable in six months: that judgment is still on people. At least for now.&lt;/p&gt;
&lt;p&gt;The core of this position is clear. No matter how well AI generates code, responsibility for that code lands on a human in the end. If you&apos;re responsible, you have to verify. If you verify, you have to review. Logically airtight.&lt;/p&gt;
&lt;p&gt;That said, there&apos;s an uncomfortable truth here. Are there enough people with the time and skill to do that &quot;review&quot;? The way I lived it as CTO, code review getting pushed out of &quot;official work&quot; wasn&apos;t a question of willpower. It was a question of reality. If AI tripled code output and review capacity stayed flat, this position is correct, but whether it&apos;s actually sustainable is another question.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;The Era of Humans Reviewing Code Is Over&lt;/h2&gt;
&lt;p&gt;On the other side, there&apos;s a much more aggressive claim. &quot;The era of humans reviewing code is over. Or rather, it has to be over.&quot;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://bryanfinster.substack.com/p/ai-broke-your-code-review-heres-how&quot;&gt;Bryan Finster&lt;/a&gt; recently applied the Nyquist-Shannon Sampling Theorem to this problem, and the analogy is more persuasive than it sounds. The original is a communications theorem: to accurately reconstruct a signal, you have to sample at more than twice the highest frequency present. Apply that to software and you get this. If your defect-detection rate can&apos;t keep up with your code-production rate, you don&apos;t miss problems occasionally. You miss them systematically.&lt;/p&gt;
&lt;p&gt;AI produces code at a high frequency. Manual code review is a low-frequency sampling mechanism. We&apos;ve raised the production frequency without raising the feedback frequency. That&apos;s the definition of undersampling, and undersampling means you miss things. Not occasionally. Reliably.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://smartbear.com/learn/code-review/best-practices-for-peer-code-review/&quot;&gt;The data from SmartBear&apos;s analysis of Cisco Systems teams&lt;/a&gt; backs this up. Human reviewer defect-detection rates fall off a cliff once you cross 400 lines. Yet a single AI prompt can spit out 600 lines. PRs over 400 lines aren&apos;t reviews. They&apos;re rubber stamps. This matches my CTO experience exactly. Under deadline pressure, PR reviews became a formality, and even strong developers fell into &quot;skim it, LGTM&quot; mode. The AI era only made it worse.&lt;/p&gt;
&lt;p&gt;A company called &lt;a href=&quot;https://www.strongdm.com/blog/the-strongdm-software-factory-building-software-with-ai&quot;&gt;StrongDM&lt;/a&gt; pushed this logic to the extreme. In their &quot;Software Factory,&quot; humans don&apos;t write code, and humans don&apos;t review code. What humans do is define intent, curate scenarios, and set constraints. After that, agents do everything: they generate code, validate scenarios in a behavior-clone environment from a third-party service called Digital Twin Universe, and iterate until the scenarios pass. Validation has replaced code review.&lt;/p&gt;
&lt;p&gt;When I first saw this, I&apos;ll admit my reaction was &quot;does this actually work?&quot; But &lt;a href=&quot;https://simonwillison.net/2026/Feb/7/software-factory/&quot;&gt;Simon Willison watched this team&apos;s demo and wrote it up on his blog&lt;/a&gt;, and Wharton&apos;s Ethan Mollick and Y Combinator&apos;s Garry Tan both took notice. &lt;a href=&quot;https://law.stanford.edu/2026/02/08/built-by-agents-tested-by-agents-trusted-by-whom/&quot;&gt;Stanford Law School&apos;s CodeX&lt;/a&gt; even published an analysis titled &quot;Built by Agents, Tested by Agents, Trusted by Whom?&quot; The title is direct. If agents build it and agents test it, who can trust it? When the same kind of AI writes the code and the same kind of AI tests it, both can miss the same thing. And when this software blows up in production, with no human author, who carries the responsibility? Nobody has answered that question yet. StrongDM is using this approach in production anyway — and they&apos;re a security infrastructure company. That&apos;s why this experiment is hard to dismiss.&lt;/p&gt;
&lt;p&gt;If StrongDM is the extreme, &lt;a href=&quot;https://engineering.salesforce.com/scaling-code-reviews-adapting-to-a-surge-in-ai-generated-code/&quot;&gt;Salesforce went for a realistic middle ground&lt;/a&gt;. After adopting AI-assisted coding, code volume rose roughly 30%, and PRs touching 20 files and 1,000 changed lines became routine. More worryingly, review time on the largest PRs actually started dropping. That was a signal that reviewers had stopped meaningfully wrestling with the changes. Salesforce built a system called Prizm and rearchitected the review process itself. Not &quot;let&apos;s add an AI reviewer,&quot; but an admission that the diff-centric review model itself doesn&apos;t work in the AI era. They introduced a new approach called Intent Reconstruction.&lt;/p&gt;
&lt;p&gt;People in this camp share a common line. &quot;AI didn&apos;t remove the safety net. AI just exposed that the safety net was always relying on individual heroics.&quot; That&apos;s Bryan Finster&apos;s framing, and it stings. Letting code review fall off the official duties list as CTO, depending on one enthusiastic reviewer, merging PRs because the schedule said so — all of it was evidence that the safety net was, in fact, riding on heroes.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;So What Should We Be Reviewing?&lt;/h2&gt;
&lt;p&gt;&quot;We have to keep code review&quot; is true. &quot;Humans can&apos;t review everything&quot; is also true. So what exactly are we supposed to do?&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://www.latent.space/p/reviews-dead&quot;&gt;latent.space&apos;s Ankit Jain&lt;/a&gt; gave the cleanest frame on this question. Shift from code review to intent review. Instead of reading a 500-line diff line by line, review the spec, the acceptance criteria, and the constraints.&lt;/p&gt;
&lt;p&gt;In this frame, the spec becomes the source of truth, and the code becomes the artifact of the spec. The human role moves from &quot;did we write this correctly?&quot; to &quot;are we solving the right problem under the right constraints?&quot; The most valuable human judgment gets applied before the code is generated, not after.&lt;/p&gt;
&lt;p&gt;This isn&apos;t a new concept. It&apos;s the same thing Behavior-Driven Development has been arguing for years. Before coding, the team gets together and defines &quot;how this feature should behave&quot; in executable scenarios, and those scenarios become the acceptance tests. The reason it never went fully mainstream is that writing the spec felt like extra work. With agents, the equation flips. The spec stops being extra work and becomes the default artifact.&lt;/p&gt;
&lt;p&gt;Ankit Jain says trust has to be stacked in layers. Like a Swiss cheese model, where no single gate catches everything, so you stack imperfect filters until the holes don&apos;t line up. Letting multiple agents try different approaches and picking the best is layer one. Deterministic guardrails like tests and type checks are layer two. Humans defining acceptance criteria up front is layer three. Stack on top of that fine-grained agent permissions and adversarial validation (one agent makes it, another tries to break it), and you get five layers of filters.&lt;/p&gt;
&lt;p&gt;On the practical side, &lt;a href=&quot;https://www.qodo.ai/blog/5-ai-code-review-pattern-predictions-in-2026/&quot;&gt;Qodo&apos;s predictions for AI code review patterns in 2026&lt;/a&gt; are worth a closer look.&lt;/p&gt;
&lt;p&gt;First, context-first review. Before opening the diff, pull cross-repo usage, prior PRs, and architecture docs automatically and treat context as required input. Context was the hardest part of review for me as CTO. Sometimes I&apos;d spend half the review time just figuring out &quot;why is this code shaped this way?&quot;&lt;/p&gt;
&lt;p&gt;Second, severity-driven review. Findings get classified as must-fix, recommended, or minor suggestion. If you&apos;ve ever had a bot drop 37 comments about whitespace while missing a null check that would take down production, you understand instantly why this matters.&lt;/p&gt;
&lt;p&gt;Third, specialist-agent review. Asking one generalist model to play security expert, performance engineer, and staff SWE simultaneously is too much. You separate out a security agent, a performance agent, and a correctness agent, each analyzing in its own domain, and a coordinator builds the unified report. This connects directly to &quot;decomposition&quot; from &lt;a href=&quot;/posts/2026-03-01-agentic-engineering-9-skills&quot;&gt;the 9 skills of agentic engineering&lt;/a&gt;. Breaking one giant task into specialist domains.&lt;/p&gt;
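&lt;p&gt;Of the three, severity-driven review is the easiest to sketch. The findings below are invented examples; the point is that only the must-fix bucket blocks the merge.&lt;/p&gt;

```python
# Sketch of severity-driven triage: must-fix findings block the merge,
# everything else becomes non-blocking comments, highest severity first.
# The severity buckets follow the three named above; findings are invented.
from enum import IntEnum

class Severity(IntEnum):
    MINOR = 0        # style nits, naming
    RECOMMENDED = 1  # worth doing, not blocking
    MUST_FIX = 2     # correctness/security, blocks the merge

findings = [
    ("trailing whitespace on line 12", Severity.MINOR),
    ("extract duplicated query builder", Severity.RECOMMENDED),
    ("missing null check on user.session", Severity.MUST_FIX),
]

blocking = [msg for msg, sev in findings if sev is Severity.MUST_FIX]
comments = sorted((f for f in findings if f[1] is not Severity.MUST_FIX),
                  key=lambda f: -f[1])  # surface higher severity first

print("merge blocked:", bool(blocking))  # merge blocked: True
```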
&lt;p&gt;&lt;a href=&quot;https://bryanfinster.substack.com/p/ai-broke-your-code-review-heres-how&quot;&gt;Bryan Finster&lt;/a&gt; reached a similar conclusion. After automation handles everything else, the list of things humans should genuinely block a merge on comes down to two.&lt;/p&gt;
&lt;p&gt;One is tribal knowledge. Integration quirks, historical decisions, the &quot;we tried that and it broke billing&quot; context. The kind of thing that lives only in people&apos;s heads and isn&apos;t documented anywhere. Long term, this should also move into docs and Architecture Decision Records (ADRs) and be enforced by tooling. Short term, you need &quot;the person who knows where the bodies are buried,&quot; and their job is context review, not syntax review.&lt;/p&gt;
&lt;p&gt;The other is regulated paths. In environments where separation of duties is a compliance requirement, changes to sensitive areas need a second human approval. That&apos;s not negotiable. But it&apos;s no reason to apply the same bar to every PR.&lt;/p&gt;
&lt;p&gt;The tooling is shifting too. &lt;a href=&quot;https://coderabbit.ai/&quot;&gt;CodeRabbit&lt;/a&gt; supports GitHub, GitLab, Bitbucket, and Azure DevOps, giving it the broadest platform reach. &lt;a href=&quot;https://greptile.com/&quot;&gt;Greptile&lt;/a&gt; indexes the entire codebase to attempt the deepest level of bug detection. &lt;a href=&quot;https://github.blog/changelog/2024-10-29-github-copilot-code-review-in-github-com-public-preview/&quot;&gt;GitHub Copilot Code Review&lt;/a&gt; hit one million users within a month of launch. If you&apos;re already on Copilot, friction is near zero, but because it&apos;s diff-based, it tends to miss architecture-level issues. Whatever tool you pick, the principle is the same. Hand off what AI can catch (syntax, style, simple logic bugs, security patterns) to AI, and keep humans on what AI absolutely can&apos;t catch (intent, context, business judgment, tribal knowledge).&lt;/p&gt;
&lt;p&gt;So the answer to &quot;what should we be reviewing?&quot; lands here. Review intent, not code. Review the spec, not the diff. Review the context, not the syntax.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Closing&lt;/h2&gt;
&lt;p&gt;If I&apos;m being honest, I don&apos;t personally review the code I write with AI right now.&lt;/p&gt;
&lt;p&gt;I play the QA role instead. I test mostly for whether the thing actually behaves as intended, and I only look at the code when something goes wrong. I run manual QA tests myself, and when a problem is reproducible in code, I capture the case as an integration test so it doesn&apos;t recur. API scenario tests, external integrations, and UX testing still have real limits, and the truth is that quality varies with how diligently I get hands-on.&lt;/p&gt;
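&lt;p&gt;Here is a toy sketch of that loop: a bug reproduced in manual QA gets pinned down as a regression test. The coupon domain and every name in it are hypothetical stand-ins for your own app.&lt;/p&gt;

```python
# Sketch of "pin the reproduced bug down as a test": once manual QA
# reproduces a failure, encode it so it can't silently return. The
# order/coupon domain here is a hypothetical stand-in, not a real API.
import datetime

COUPONS = {"WELCOME-2019": {"pct": 100, "expires": datetime.date(2019, 12, 31)}}

def apply_coupon(total_cents: int, code: str, today: datetime.date) -> int:
    coupon = COUPONS[code]
    if today > coupon["expires"]:
        raise ValueError(f"coupon {code} expired")  # the fix under test
    return total_cents * (100 - coupon["pct"]) // 100

def test_expired_coupon_does_not_discount():
    # Regression test: an expired coupon once silently applied 100% off.
    try:
        apply_coupon(40_00, "WELCOME-2019", today=datetime.date(2026, 1, 1))
    except ValueError:
        return  # expected: the expired coupon is rejected
    raise AssertionError("expired coupon was applied")

test_expired_coupon_does_not_discount()
```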
&lt;p&gt;As I laid out above, the era of reading code line by line is already gone. But responsibility for the code hasn&apos;t gone away. The shape of the responsibility has changed. From the person who writes the code to the person who confirms the code does what it&apos;s supposed to. From reviewer to verifier.&lt;/p&gt;
&lt;p&gt;So what does that &quot;verifier&quot; role concretely look like?&lt;/p&gt;
&lt;p&gt;I think today&apos;s AI-native engineer or full-stack builder might need to play the role the previous era&apos;s PM played, by themselves. Especially on product quality. Define the requirements, set the acceptance criteria, hand the implementation to the agent, verify the result, and monitor in production. That&apos;s clearly different from the traditional developer role. The reason I emphasized &quot;definition of done&quot; and &quot;observability&quot; in &lt;a href=&quot;/posts/2026-03-01-agentic-engineering-9-skills&quot;&gt;the 9 skills of agentic engineering&lt;/a&gt; was the same context.&lt;/p&gt;
&lt;p&gt;This isn&apos;t entirely new either. Even in the previous era, there were two kinds of developers. Developers who treated their work as done once the code merged, and developers who deployed afterward, monitored, and tested with their own hands in production. The latter were much better engineers. More importantly, they were responsible engineers.&lt;/p&gt;
&lt;p&gt;As the era shifts, the cost of producing code is approaching zero. In a world where an agent can build three versions of a feature in an hour, &quot;the ability to write code well&quot; no longer differentiates anyone. What differentiates is the ability to judge whether the code actually solves the real problem, the ability to respond when production breaks, and most of all, the willingness to stand by code that goes out under your name, all the way through.&lt;/p&gt;
&lt;p&gt;Can you take responsibility for your own code?&lt;/p&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://simonwillison.net/guides/agentic-engineering-patterns/anti-patterns/&quot;&gt;Simon Willison — Agentic Engineering Anti-patterns&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://bryanfinster.substack.com/p/ai-broke-your-code-review-heres-how&quot;&gt;Bryan Finster — AI Broke Your Code Review&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.latent.space/p/reviews-dead&quot;&gt;latent.space — How to Kill the Code Review&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.strongdm.com/blog/the-strongdm-software-factory-building-software-with-ai&quot;&gt;StrongDM — The Software Factory&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://law.stanford.edu/2026/02/08/built-by-agents-tested-by-agents-trusted-by-whom/&quot;&gt;Stanford CodeX — Built by Agents, Tested by Agents, Trusted by Whom?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://engineering.salesforce.com/scaling-code-reviews-adapting-to-a-surge-in-ai-generated-code/&quot;&gt;Salesforce — Scaling Code Reviews&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.qodo.ai/blog/5-ai-code-review-pattern-predictions-in-2026/&quot;&gt;Qodo — 5 AI Code Review Pattern Predictions in 2026&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/addyosmani/status/2027662887897149801&quot;&gt;Addy Osmani — Verification as the New Frontier&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://smartbear.com/learn/code-review/best-practices-for-peer-code-review/&quot;&gt;SmartBear — Best Practices for Peer Code Review&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=XavrebMKH2A&quot;&gt;Dave Farley — AI Coding and the Nyquist Theorem&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</content:encoded><category>ai-coding</category><category>essay</category><category>code-review</category><author>Tony Cho (https://flowkater.io)</author></item><item><title>How the World&apos;s Largest AI Company Trains the Next Generation of Engineers: A Review of the Anthropic x CodePath Curriculum</title><link>https://flowkater.io/en/posts/2026-03-06-anthropic-codepath-curriculum/</link><guid isPermaLink="true">https://flowkater.io/en/posts/2026-03-06-anthropic-codepath-curriculum/</guid><description>I broke down the AI Engineering curriculum Anthropic and CodePath built together, week by week. This piece walks through why fundamentals and critical code reading still sit at the center even when AI writes the code, the AI Native Engineer profile that falls out of the curriculum, and what I see in my own mentoring.</description><pubDate>Fri, 06 Mar 2026 11:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;Opening&lt;/h2&gt;
&lt;p&gt;In an era where we barely type code anymore, the very first assignment in a curriculum built by the world&apos;s largest AI company is &quot;find the bug in the AI-generated code.&quot;&lt;/p&gt;
&lt;p&gt;In February 2026, &lt;a href=&quot;https://www.anthropic.com/news/anthropic-codepath-partnership&quot;&gt;Anthropic announced a partnership with CodePath&lt;/a&gt;. CodePath is the largest university CS education program in the US, with over 20,000 students, 40% of them from low-income households. CodePath CEO Michael Ellison put it this way:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;We now have the technology to teach in two years what used to take four.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The point isn&apos;t to crank through lectures faster with AI. The point is to use AI as a tool while also training students to doubt it. The company that builds Claude Code is teaching people how to question Claude Code? That paradox is what pulled me in, and I went through the curriculum line by line.&lt;/p&gt;
&lt;p&gt;So what should you actually learn? Where should an engineer in the AI era spend their time? Anthropic&apos;s answer is sitting inside this curriculum.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Going Through the CodePath Curriculum&lt;/h2&gt;
&lt;p&gt;The program has three stages, 30+ weeks total. I&apos;ll walk through each stage week by week in as much detail as I can.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;A note: this curriculum is scheduled to launch officially in summer 2026, and Howard University is already running it as a credit course in spring 2026. The week-by-week breakdown and assignment specifics below are partly inferred from the public topic list and the pilot, so the real curriculum may diverge in places.&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;Stage 1: Foundations of AI Engineering (10 weeks)&lt;/h3&gt;
&lt;p&gt;It&apos;s labeled &quot;foundations,&quot; but the definition of foundations is completely different from what you&apos;d see in a traditional bootcamp.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Weeks 1-2&lt;/strong&gt; cover Python-based data structures, algorithms, and OOP. So far, this is what you&apos;d find anywhere. The assignments, though, are different. Students implement linked lists and trees with Claude Code, and then, instead of stopping there, they review the AI-generated code themselves and look for bugs.&lt;/p&gt;
&lt;p&gt;From week one, the message is &quot;don&apos;t just use the code AI hands you.&quot; (For context, the curriculum centers Claude Code but also pulls in GitHub Copilot and AI-based IDEs. It&apos;s deliberately designed not to lock students into one tool.)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Weeks 3-4&lt;/strong&gt; are where the real spine of stage 1 lives: &lt;strong&gt;critical evaluation&lt;/strong&gt; of AI-generated code. Students get AI code that has subtle bugs planted in it on purpose, and they have to debug it and propose improvements. They also generate solutions to the same problem three times with different prompts, then write a comparison report.&lt;/p&gt;
&lt;p&gt;If you sit with that for a moment, it&apos;s actually pretty smart. When you generate three solutions to the same problem with three different prompts, you feel it: &quot;oh, AI gives wildly different code depending on how you prompt it.&quot; The same model, but the quality swings hard with how you ask.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Week 5&lt;/strong&gt; is algorithmic thinking plus prompt-chain design. Students decompose complex problems step by step and design a prompt chain for each step. There&apos;s an experiment comparing &quot;solve in one prompt&quot; vs &quot;split into a 3-step chain.&quot; Prompt engineering is being taught not as a standalone skill but as an extension of algorithmic thinking.&lt;/p&gt;
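&lt;p&gt;The shape of that experiment is easy to sketch. The call_llm function below is a hypothetical stub standing in for any model API, so the example runs without network access; only the chaining structure is the point.&lt;/p&gt;

```python
# Sketch of the week-5 idea: one-shot vs. a decomposed prompt chain.
# call_llm is a hypothetical stub (it just echoes a prefix of the prompt)
# so the structure is runnable offline; swap in a real model API to compare.
def call_llm(prompt: str) -> str:
    return f"[answer to: {prompt[:40]}]"

def one_shot(task: str) -> str:
    # Baseline: ask for the whole solution in a single prompt.
    return call_llm(f"Solve completely: {task}")

def three_step_chain(task: str) -> str:
    # Decompose: plan, then draft, then self-review, each its own prompt.
    plan = call_llm(f"List the steps to solve: {task}")
    draft = call_llm(f"Following this plan, draft a solution:\n{plan}")
    return call_llm(f"Review and fix this draft:\n{draft}")

print(one_shot("sort a list"))  # [answer to: Solve completely: sort a list]
print(three_step_chain("sort a list"))
```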
&lt;p&gt;&lt;strong&gt;Week 6&lt;/strong&gt; is ML literacy. No math, just concepts. Students classify use cases for supervised, unsupervised, and generative models, and they call a pretrained sentiment-analysis API and write up how to read the result. It focuses on &quot;understanding what models do,&quot; not &quot;how to build models.&quot;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Weeks 7-8&lt;/strong&gt; is where it gets real: RAG, agentic workflows, fine-tuning, and guardrails. Build a Q&amp;amp;A chatbot off three PDFs and a vector DB. Build an agent connected to two or three tools. Add a &quot;no PII leakage&quot; filter to a chatbot, then run a red-team test against it. Stage 1, week 7, and they&apos;re already teaching guardrails. Production-level concerns are getting planted from the start.&lt;/p&gt;
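&lt;p&gt;For a taste of what the week-7 guardrail piece might involve, here is a toy &quot;no PII leakage&quot; output filter with two red-team probes. Real guardrails need far more than regexes; these patterns only catch the most obvious shapes, and none of this is from the actual curriculum.&lt;/p&gt;

```python
# Toy "no PII leakage" guardrail: a regex filter over model output, plus
# two red-team probes against it. Illustrative only; real guardrails need
# far more than regexes (names, addresses, paraphrased leaks, etc.).
import re

PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # US SSN shape
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # email address
]

def redact(text: str) -> str:
    for pattern in PII_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

# Red-team probes: try to coax PII through the filter.
probes = [
    "My SSN is 123-45-6789, please remember it.",
    "Contact me at jane.doe@example.com for the files.",
]
for probe in probes:
    assert "[REDACTED]" in redact(probe)
print(redact(probes[0]))  # My SSN is [REDACTED], please remember it.
```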
&lt;p&gt;&lt;strong&gt;Week 9&lt;/strong&gt; is Git and GitHub collaboration. PR creation, code review, merge-conflict resolution. Putting this in week 9 feels intentional to me. It&apos;s the pivot from &quot;building things alone with AI&quot; to &quot;building things on a team with AI.&quot;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Week 10&lt;/strong&gt; is the capstone for stage 1. Pick from a chatbot, a summarization tool, or a documentation assistant, and ship something that actually works with RAG and guardrails wired in.&lt;/p&gt;
&lt;h4&gt;Featured Assignment: &quot;Spot the Broken One in 5 AI-Generated Sort Algorithms&quot;&lt;/h4&gt;
&lt;p&gt;This one assignment compresses the philosophy of stage 1. AI hands you five sort-algorithm implementations, and you have to find the one that&apos;s broken. &quot;Don&apos;t assume AI is right&quot; is the first principle of the entire curriculum.&lt;/p&gt;
&lt;p&gt;If traditional CS education was &quot;implement the sort algorithm yourself,&quot; this curriculum is &quot;&lt;strong&gt;read and evaluate&lt;/strong&gt; the sort algorithm AI implemented.&quot; From writing to reading. From producing to verifying. That shift is the point.&lt;/p&gt;
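&lt;p&gt;One way to practice that evaluating stance on this exact assignment, without reading all five implementations line by line, is property testing against a trusted oracle. The buggy candidate below is my own invented example of the kind of subtle bug that gets planted: it silently drops duplicates.&lt;/p&gt;

```python
# Property-test each candidate against random inputs and a trusted oracle
# (Python's built-in sorted). candidate_b is an invented stand-in for the
# planted bug: it looks plausible but silently drops duplicates.
import random

random.seed(7)  # deterministic runs

def candidate_a(xs): return sorted(xs)        # stands in for a correct submission
def candidate_b(xs): return sorted(set(xs))   # subtle bug: drops duplicates

def violates_sort_contract(sort_fn, trials=200):
    """Return a counterexample input, or None if all trials pass."""
    for _ in range(trials):
        xs = [random.randint(0, 9) for _ in range(random.randint(0, 12))]
        if sort_fn(xs) != sorted(xs):  # oracle comparison
            return xs
    return None

print(violates_sort_contract(candidate_a))  # None
print(violates_sort_contract(candidate_b))  # a short list containing a duplicate
```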
&lt;h3&gt;Stage 2: Applications of AI Engineering (10 weeks)&lt;/h3&gt;
&lt;p&gt;If stage 1 was foundational training, stage 2 is the real thing. And the difficulty jumps.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Weeks 1-2&lt;/strong&gt; start strong. Students get a real open-source project, a codebase of more than 10,000 lines. They use Claude Code to map the structure and produce an architecture diagram.&lt;/p&gt;
&lt;p&gt;Think 10,000 lines is a lot? In the field, you get handed legacy systems with hundreds of thousands of lines and the line &quot;figure this out by next week.&quot; So 10,000 lines is the warm-up.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Weeks 3-4&lt;/strong&gt; are spec-based implementation plus debugging. CodePath officially calls this &quot;Spec-driven vibe coding.&quot; Students get a feature spec, build it with AI, and then make it pass tests. The name keeps the &quot;vibe coding&quot; framing, but the spec is the leash. There&apos;s one more decisive twist: students have to &lt;strong&gt;log &quot;the parts AI couldn&apos;t solve, where I had to step in myself.&quot;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Weeks 5-7&lt;/strong&gt; are about integrating advanced techniques into production. Add RAG search to an existing web app. Wire error handling, logging, and monitoring into an agent pipeline. Write a guardrail policy doc, implement it, and evaluate it. The thing worth noticing is that &lt;strong&gt;this is not greenfield work; it&apos;s integration into an existing codebase&lt;/strong&gt;. AI is great at &quot;build something new.&quot; &quot;Wedge it into existing code&quot; is still on the human.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Weeks 8-9&lt;/strong&gt; are PR review training. Students review another team&apos;s AI-generated PR, judge whether the code was written by AI or by a human, and leave improvement comments. They also build their own review-criteria checklist covering security, performance, readability, and test coverage.&lt;/p&gt;
&lt;p&gt;In an era when AI writes the code, the program is making students design code-review checklists. That is the final boss of reading skill.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Week 10&lt;/strong&gt; is submitting a real open-source PR. In the 2025 pilot, students sent PRs to projects like GitLab, Puter, and Dokploy, and got reviewed by actual maintainers.&lt;/p&gt;
&lt;h4&gt;Featured Assignment: &quot;Map a 10,000+ LOC Open-Source Project&quot;&lt;/h4&gt;
&lt;p&gt;Don&apos;t read line by line. Ask AI the right questions. That is the spine of this assignment. &quot;What&apos;s the entry point of this project?&quot; &quot;How does data flow through it?&quot; &quot;What are this module&apos;s core dependencies?&quot; Students structure huge codebases by interrogating the AI.&lt;/p&gt;
&lt;p&gt;This is exactly the skill I wrote about in my earlier piece, &quot;&lt;a href=&quot;https://flowkater.io/posts/2026-02-19-code-reading-era&quot;&gt;The Era of Not Reading Code&lt;/a&gt;&quot;. Code reading in the AI era isn&apos;t about following each line. It&apos;s about grasping the structure and zooming in on the core logic.&lt;/p&gt;
&lt;h4&gt;Featured Assignment: &quot;Log the Parts AI Couldn&apos;t Solve, Where You Had to Step In&quot;&lt;/h4&gt;
&lt;p&gt;This is how you prove learning happened in the AI era. Code AI wrote for you isn&apos;t your skill. The bugs you debugged because AI couldn&apos;t, the design you changed yourself, the tests you added: that&apos;s the actual learning.&lt;/p&gt;
&lt;p&gt;&quot;Build fast with AI, but keep a log of where AI got it wrong.&quot; That one line summarizes the entire assignment design philosophy.&lt;/p&gt;
&lt;h3&gt;Stage 3: AI Open-Source Capstone&lt;/h3&gt;
&lt;p&gt;The final stage is closer to an internship than a class. Students get assigned to a real open-source project, pick an issue, build the fix with Claude Code, file the PR, and get it merged. They write weekly sprint reports and communicate with maintainers.&lt;/p&gt;
&lt;p&gt;Final deliverable? A real open-source contribution history you can put in a portfolio.&lt;/p&gt;
&lt;h4&gt;Featured Assignment: Real Open-Source Contributions&lt;/h4&gt;
&lt;p&gt;Not a toy project. Not a Todo app. A PR with real users, reviewed by real maintainers, that actually gets merged. That is the graduation requirement of this bootcamp.&lt;/p&gt;
&lt;p&gt;A student who participated in the 2025 fall pilot said something I keep coming back to.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;Claude Code was instrumental in my learning process, especially since I came into the project with very little experience in the programming languages used in the repository [including TypeScript and Node.js].&quot;&lt;/p&gt;
&lt;p&gt;by Laney Hood, CodePath student and computer science major at Texas Tech University&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;She had almost no experience with TypeScript or Node.js, and Claude Code is what let her contribute to a real open-source project. AI lowered the entry barrier, and on top of that the actual learning happened.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;What an AI Native Engineer Looks Like to Anthropic&lt;/h2&gt;
&lt;p&gt;Once you&apos;ve gone through the whole curriculum, the pattern shows up. The kind of engineer Anthropic wants converges on four keywords.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Critical code evaluation.&lt;/strong&gt; &quot;Don&apos;t assume AI is right&quot; is the first principle of the entire program. GPT-6 will ship. Claude Opus 5 will ship. It won&apos;t change the answer. AI is right 99% of the time. The remaining 1% is what causes incidents in production. Catching that 1% is on the human.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Large-codebase comprehension.&lt;/strong&gt; In an era when AI writes code fast, the value of reading goes up, not down. It sounds backwards but it&apos;s true. The faster you produce, the more code there is to review. &lt;a href=&quot;/posts/2026-02-19-code-reading-era&quot;&gt;Choosing not to read code is different from being unable to read it.&lt;/a&gt; The first is a choice. The second is a ceiling.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Production-level concerns.&lt;/strong&gt; &quot;Code that runs&quot; and &quot;code you can ship to production&quot; are not the same thing. Most AI-generated code lands at &quot;it runs.&quot; Error handling, logging, security, performance: wiring those in is a human judgment call. That&apos;s why guardrails show up in week 7 of stage 1.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Real-world contribution.&lt;/strong&gt; Not toy projects, real open-source. Reviewed by maintainers, actually merged. Putting this in as a graduation requirement isn&apos;t just clever assignment design. It&apos;s saying: come out of this with experience that has been validated in the wild.&lt;/p&gt;
&lt;p&gt;And all four sit on top of the same premise.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Without input, critical reading is impossible.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;If you don&apos;t know data structures, you can&apos;t find the bug in an AI-generated sort algorithm. If you don&apos;t know HTTP, you can&apos;t judge whether the error handling on an AI-built API is right. Fundamentals are the raw material for critical thinking.&lt;/p&gt;
&lt;p&gt;This is already showing up in interviews. According to CodePath, employers are increasingly asking candidates to &quot;interpret, review, and explain AI-generated code&quot; in interviews. Reviewing AI code is becoming an interview question. The curriculum is preparing students for that.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;How the World Is Teaching This Right Now&lt;/h2&gt;
&lt;p&gt;CodePath isn&apos;t the only one moving. US universities are rebuilding their curricula for the AI era. The directions vary wildly, and that variance is the interesting part. It means nobody knows the answer yet.&lt;/p&gt;
&lt;h3&gt;Stanford: Same School, Two Opposite Experiments&lt;/h3&gt;
&lt;p&gt;Stanford is the most dramatic case. Two opposite approaches are running at the same university, at the same time.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://web.stanford.edu/class/cs106a/&quot;&gt;CS106A&lt;/a&gt; (Programming Methodology, the intro course)&lt;/strong&gt;: AI is banned. The syllabus literally says, &quot;we do not want you using AI to do your assignments.&quot; The position is that the foundational thinking skills of programming have to be built without AI. If you let beginners use AI, the thought process itself never forms.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://themodernsoftware.dev/&quot;&gt;CS146S&lt;/a&gt; (The Modern Software Developer, software development in the AI era)&lt;/strong&gt;: the opposite. A new course built around full AI adoption, taught by &lt;a href=&quot;https://mihaileric.com/&quot;&gt;Mihail Eric&lt;/a&gt;. I quoted him in &lt;a href=&quot;/posts/2026-03-01-agentic-engineering-9-skills&quot;&gt;my earlier post on agentic engineering&lt;/a&gt;. Now he&apos;s teaching this directly at Stanford.&lt;/p&gt;
&lt;p&gt;When you look at the CS146S 10-week curriculum, you see a lot of overlap with CodePath, but the texture is different.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Weeks 1-2&lt;/strong&gt;: LLM mechanics, prompt engineering, agent architecture (MCP)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Weeks 3-5&lt;/strong&gt;: Working with AI IDEs, terminal automation, context management (the craft of handling tools)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Weeks 6-7&lt;/strong&gt;: AI-driven testing, vulnerability detection, debugging, code review (the craft of verification)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Weeks 8-9&lt;/strong&gt;: Automated UI building, monitoring, incident response (production-level concerns)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Week 10&lt;/strong&gt;: The evolving role of the software engineer&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The guest-lecturer lineup is also worth noticing: Russell Kaplan from Cognition, Zach Lloyd from Warp, Martin Casado from a16z. Practitioners from the Valley come in and tell students how the ground is shifting.&lt;/p&gt;
&lt;p&gt;Mihail Eric&apos;s core message lands in two lines.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;Human-agent engineering, not vibe coding.&quot;&lt;/p&gt;
&lt;p&gt;&quot;LLMs are only as good as you are.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;He also uses the metaphor that &quot;the developer is the manager of the AI agent intern.&quot; AI does the work, but the human sets direction and signs off. &lt;a href=&quot;https://github.com/mihail911/modern-software-dev-assignments&quot;&gt;The assignments are public on GitHub.&lt;/a&gt; When you actually open them up, they&apos;re worth reading.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Week 1&lt;/strong&gt;: Use a local LLM (Ollama) to hands-on practice six prompting techniques: K-shot, Chain-of-Thought, Tool calling, RAG, Reflexion, and more. Not API calls, locally hosted.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Week 2&lt;/strong&gt;: Extend a FastAPI+SQLite app inside the Cursor AI IDE. Real experience growing an app inside an AI IDE.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Week 3&lt;/strong&gt;: Build an MCP server that wraps an external API, then connect it to Claude Desktop or an AI IDE.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Week 4&lt;/strong&gt;: Build at least two automations with &lt;strong&gt;Claude Code&lt;/strong&gt;. Combine Slash commands, CLAUDE.md, SubAgents, and MCP servers to automate a development workflow. The required reading is Anthropic&apos;s &lt;a href=&quot;https://www.anthropic.com/engineering/claude-code-best-practices&quot;&gt;Claude Code best practices&lt;/a&gt; doc.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Week 5&lt;/strong&gt;: Multi-agent workflow inside Warp. Same app as week 4, different toolchain.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Week 6&lt;/strong&gt;: Run Semgrep to scan for security vulnerabilities, then fix at least three by hand. Training the human to catch security issues in AI-generated code.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Week 7&lt;/strong&gt;: The most striking assignment. Implement a feature with a one-shot prompt to an AI coding tool, then do a manual line-by-line review yourself, then run Graphite Diamond&apos;s AI code review on it, then write a comparison write-up of your review vs the AI&apos;s review.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Week 8&lt;/strong&gt;: Build the same app in three different tech stacks. One of them has to use bolt.new (an AI app generation platform).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Compared to CodePath, the texture is different. CS146S is &lt;strong&gt;tool-centered&lt;/strong&gt;. Cursor, Claude Code, Warp, Graphite, Semgrep, bolt.new: students rotate through the AI tools the field actually uses, week to week. CodePath is &lt;strong&gt;thinking- and judgment-centered&lt;/strong&gt;. &quot;Spot the broken one in 5 AI-generated sort algorithms,&quot; &quot;log the parts AI couldn&apos;t solve where you had to step in&quot;; the focus is on reasoning and evaluation, not the tools.&lt;/p&gt;
&lt;p&gt;Both share the same premise: humans verify AI code. CS146S Week 7&apos;s &quot;manual review vs AI review comparison&quot; and CodePath&apos;s &quot;critical evaluation of AI code&quot; are the same destination via different routes.&lt;/p&gt;
&lt;p&gt;Two branches at the same Stanford. Neither is &quot;the right one.&quot; Stanford is showing through experiment that the right approach depends on the student&apos;s level and the goal.&lt;/p&gt;
&lt;h3&gt;UW Allen School: The &quot;Coding Is Dead&quot; Declaration&lt;/h3&gt;
&lt;p&gt;In July 2025, Magdalena Balazinska, director of the UW Allen School of Computer Science and Engineering, said this on the record:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;Coding, that is, translating a precise design to software instructions, is dead. AI now does that.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Provocative, but there&apos;s context. UW allows GPT tools on assignments, but requires students to cite the AI as a collaborator the same way they&apos;d cite another student. &quot;If you used AI, disclose how.&quot; Not banned, transparent.&lt;/p&gt;
&lt;p&gt;The philosophy is close to CodePath&apos;s &quot;log the parts AI couldn&apos;t solve, where you had to step in.&quot; Assume AI use, then make students record and reflect on the process.&lt;/p&gt;
&lt;h3&gt;UMD: Claude Code in the Classroom&lt;/h3&gt;
&lt;p&gt;The University of Maryland is even more direct. Professor William Pugh launched &lt;a href=&quot;https://www.coursicle.com/umd/courses/CMSC/398Z/&quot;&gt;CMSC 398Z, &quot;Effective use of AI Coding Assistants and Agents,&quot;&lt;/a&gt; in fall 2025. Students get hands-on with Copilot, VSCode, and CLI tools like Claude Code in class. They use agents for build-system invocation, test runs, and error fixes.&lt;/p&gt;
&lt;p&gt;A line from Pugh&apos;s commentary stayed with me. &quot;Long term, we plan to update the entire undergraduate CS curriculum to account for the existence of AI coding tools.&quot; Not one course, the whole curriculum.&lt;/p&gt;
&lt;h3&gt;Harvard CS50: The Most Conservative Approach&lt;/h3&gt;
&lt;p&gt;At the other end is Harvard. CS50 built its own AI rubber duck (&lt;a href=&quot;https://cs50.harvard.edu/x/notes/ai/&quot;&gt;CS50.ai&lt;/a&gt;) and integrated it into the class. AI as a teaching tool. But on regular assignments, external AI (ChatGPT, Copilot, etc.) is not allowed. The final project allows it, but with the condition that &quot;the substance has to be your own.&quot;&lt;/p&gt;
&lt;p&gt;The course doesn&apos;t directly teach &quot;how to use AI coding tools.&quot; The position is that AI helps learning, but the student is the subject of the learning.&lt;/p&gt;
&lt;h3&gt;UC San Diego + Google: A Global Consortium&lt;/h3&gt;
&lt;p&gt;UC San Diego received $1.8 million from Google.org and launched the &lt;a href=&quot;https://today.ucsd.edu/story/transforming-computer-science-education-in-the-age-of-ai&quot;&gt;GenAI in CS Education Consortium&lt;/a&gt;. It&apos;s co-run with the University of Toronto and is developing six turnkey courses. The starting premise is, &quot;industry now expects AI tool fluency from new engineers.&quot;&lt;/p&gt;
&lt;h3&gt;Andrew Ng: &quot;The Golden Age of the Product Engineer&quot;&lt;/h3&gt;
&lt;p&gt;The person who has framed all of this most clearly is Andrew Ng. In Stanford &lt;a href=&quot;https://www.youtube.com/watch?v=AuZoDsNmG_s&quot;&gt;CS230 Autumn 2025 Lecture 9: Career Advice in AI&lt;/a&gt;, he said several things that landed.&lt;/p&gt;
&lt;p&gt;Ng called this &lt;strong&gt;&quot;the best time in history&quot;&lt;/strong&gt; to be someone building things with AI, citing research that the complexity of tasks AI can handle doubles every seven months.&lt;/p&gt;
&lt;p&gt;But what he emphasized wasn&apos;t speed. It was that &lt;strong&gt;the bottleneck has moved&lt;/strong&gt;. As code production explodes thanks to AI, the real bottleneck has shifted to &quot;deciding what to build.&quot; The traditional ratio of PMs to engineers in the Valley used to be 1:4 or 1:8, and now it&apos;s collapsing toward 1:1, or the roles are merging entirely.&lt;/p&gt;
&lt;p&gt;Being able to write code is no longer a differentiator. The most valuable engineers are the ones who can talk to users, empathize with them, and decide what to build.&lt;/p&gt;
&lt;p&gt;Guest lecturer Laurence Moroney (Arm AI Director) was even more direct. He proposed three survival conditions.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Understanding in Depth.&lt;/strong&gt; It&apos;s not enough to use high-level APIs. You have to understand what&apos;s running underneath them.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Business focus.&lt;/strong&gt; The era of &quot;build something cool&quot; is over. Build something tied to business value.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Obsession with delivery.&lt;/strong&gt; Ideas are cheap. The differentiator is being able to actually ship to production.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Moroney also warned about the &quot;tech debt&quot; vibe coding generates. You can generate an entire app with an LLM, but the code that comes out carries massive debt.&lt;/p&gt;
&lt;p&gt;Ng&apos;s last piece of advice for students stuck with me. &lt;strong&gt;&quot;Pick the team, not the brand.&quot;&lt;/strong&gt; He told a story of a student who got into a famous AI company and ended up on a backend Java payments team, and said learning on a small but good team beats a flashy logo.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Redefining the Fundamentals: AI Doesn&apos;t Build Your Thinking for You&lt;/h2&gt;
&lt;p&gt;Po-Shen Loh, a math professor at Carnegie Mellon, has a line:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;Using AI to do your writing homework is the same as driving a car instead of running a mile for exercise.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Your body reaches the destination. Your fitness doesn&apos;t get built.&lt;/p&gt;
&lt;p&gt;Loh argues that education has to change. We need to teach not &quot;how to do the homework&quot; but &quot;how to grade it.&quot; That&apos;s exactly why CodePath has students looking for errors in AI code from week one. To grade what AI produced, you have to know the right answer first.&lt;/p&gt;
&lt;p&gt;He uses another keyword: &lt;strong&gt;&quot;the ability to simulate the world.&quot;&lt;/strong&gt; The capacity to play out an unfolding future in your head, drawing on empathy and a wide range of lived experience. That&apos;s territory AI can&apos;t take over. AI finds patterns in past data. Simulation happens in a person.&lt;/p&gt;
&lt;p&gt;Stanford CS106A bans AI. CS146S allows AI but pins down &quot;human-agent engineering, not vibe coding.&quot; Andrew Ng puts &quot;Understanding in Depth&quot; as the first survival condition. Laurence Moroney says using only high-level APIs isn&apos;t enough. They all point to the same place.&lt;/p&gt;
&lt;p&gt;Without fundamentals, AI use floats in midair.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;If You Already Know How, AI Becomes 10x or 100x&lt;/h2&gt;
&lt;p&gt;Honestly, I&apos;m one of the people getting the most out of AI.&lt;/p&gt;
&lt;p&gt;In the last two months I shipped 1,847 commits. Solo, running backend, iOS, web, and infra in parallel. Practicing TDD and designing the architecture as I went. Doing the same thing alone before would have taken several times longer. Code-writing speed isn&apos;t even the main thing; the surface area I can cover has changed shape.&lt;/p&gt;
&lt;p&gt;10x, easily. Maybe more.&lt;/p&gt;
&lt;p&gt;But there&apos;s still work that takes a human. Deciding what to build. Designing how to build it. Owning whether the final result is good. AI builds what I tell it to build. &quot;What to tell it&quot; is on me.&lt;/p&gt;
&lt;p&gt;What Andrew Ng said about &lt;strong&gt;&quot;the bottleneck moving from implementation to decision&quot;&lt;/strong&gt; is exactly this experience. Building is fast. What to build, how far to take it, whether this is even the right thing: that judgment is the bottleneck.&lt;/p&gt;
&lt;p&gt;A person with fundamentals goes 10x or 100x with AI. A person without fundamentals doesn&apos;t even notice when AI is wrong. Two people using the same AI end up with very different results because of this.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Mentoring: Reviewing AI Conversations&lt;/h2&gt;
&lt;p&gt;This year I changed how I mentor. Instead of having mentees share their code, I started having them share their AI conversations. I went through 165 conversations and built a five-criterion framework: depth of question, level of context provided, whether they include their own hypothesis, how they ask for verification, and the connectedness of follow-up questions.&lt;/p&gt;
&lt;p&gt;I have to talk about John (pseudonym). He&apos;s been programming for a long time (he&apos;s a non-CS major), but his skill had been stuck for months. The AI conversations made the reason obvious.&lt;/p&gt;
&lt;p&gt;&quot;How do I do this?&quot;
&quot;Write the code for me.&quot;
&quot;I don&apos;t get what you&apos;re saying.&quot;&lt;/p&gt;
&lt;p&gt;AI gives an answer. John copies it. He doesn&apos;t think. No learning happens.&lt;/p&gt;
&lt;p&gt;A mentee who grew fast in the same period had completely different conversations.&lt;/p&gt;
&lt;p&gt;&quot;A transaction is supposed to be all-success or all-fail, but @Async runs on a separate thread, so it seems like it&apos;s stepping outside the transaction scope. Is this hypothesis correct?&quot;&lt;/p&gt;
&lt;p&gt;He builds his own hypothesis, then asks AI to verify it. Even when AI gives the answer, he integrates it into his own understanding.&lt;/p&gt;
&lt;p&gt;What&apos;s smart about the CodePath curriculum is that it solves this problem structurally. &quot;Spot the broken one in 5 AI-generated sort algorithms&quot;: you can&apos;t do that without your own hypothesis. You need the standard &quot;this algorithm should behave like this&quot; already in your head before you can find the broken one. The curriculum forces critical thinking by structure.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Some Notes on the Curriculum Itself&lt;/h2&gt;
&lt;p&gt;A while back social media got loud over a fake news story that Stanford no longer teaches programming languages. Given how far AI has come, you can see why people fell for it. The funny thing is, a standard CS curriculum doesn&apos;t really teach programming-language syntax in the first place. When you take the intro course in Python, that course is teaching programming principles, thinking, and problem solving. It&apos;s not teaching Python syntax. If next semester&apos;s data structures course uses Java, you&apos;re expected to study the basic syntax on your own beforehand.&lt;/p&gt;
&lt;p&gt;How do US universities actually teach? There&apos;s a Korean-dubbed video on the Science Adam channel about &lt;a href=&quot;https://youtu.be/7ZJ4oo4h8vI?si=w4gUKhpRHv412yE-&quot;&gt;Harvard&apos;s intro CS class&lt;/a&gt; that I&apos;d recommend. You can feel the lecture quality in a way the curriculum description alone won&apos;t show. (&lt;a href=&quot;https://www.youtube.com/watch?v=HJP0a6vKvlo&amp;amp;list=PLhQjrBD2T380hlTqAU8HfvVepCcjCqTg6&quot;&gt;The whole course&lt;/a&gt; is public, so going through it is also worth it.)&lt;/p&gt;
&lt;p&gt;The philosophy comes through clearly in &lt;a href=&quot;https://youtu.be/3uwdBZBpO8E&quot;&gt;an actual lecture&lt;/a&gt; by Harvard CS50 professor David Malan. In the first class he says: &quot;In an era when AI does all the coding, why learn? This class has never once been a class about coding skill. &lt;strong&gt;It is a class about how to think.&lt;/strong&gt;&quot;&lt;/p&gt;
&lt;p&gt;Malan then has GitHub Copilot generate, in seconds, the C-language assignment (a hash-table-based spell checker) that students spent 15 hours on. And he asks: &quot;Do you think those 15 hours were wasted?&quot; The answer is no. Without the &quot;eye for code&quot; you build during those 15 painful hours, you can&apos;t tell when AI is hallucinating: code that confidently uses libraries that don&apos;t exist, code that&apos;s syntactically perfect but logically wrong.&lt;/p&gt;
&lt;p&gt;The AI rubber duck CS50 introduced in 2023 follows the same principle. It&apos;s GPT-based, but it doesn&apos;t give answers directly. Its system prompt is set to &quot;guide students Socratically so they figure it out themselves.&quot; AI is a learning tool, but the student goes through the thinking process.&lt;/p&gt;
&lt;p&gt;A line Malan ended on stuck with me. &lt;strong&gt;&quot;Evolve from a semicolon expert (a bricklayer) into a system designer (an architect).&quot;&lt;/strong&gt; In an era when AI lays the bricks, only people with the eye of an architect can use AI as a tool. That eye only grows in someone who has laid bricks themselves.&lt;/p&gt;
&lt;p&gt;This is also why the CodePath curriculum opens week one with &quot;find the bug in AI-generated code.&quot; Only a person who has stacked bricks recognizes a brick stacked wrong.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;What&apos;s Missing: Things the Curriculum Doesn&apos;t Teach&lt;/h2&gt;
&lt;p&gt;A good curriculum doesn&apos;t have everything. Honestly, I see gaps.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Design and architectural thinking.&lt;/strong&gt; This is hard to instill through education in a short window. You need to live through dozens of bad designs and analyze, after the fact, why they were bad. The curriculum is implementation-heavy. It doesn&apos;t systematically teach failure and retrospective on design.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The Frankenstein trap.&lt;/strong&gt; This is a new risk in the AI era. Code production is fast, so you end up building features you don&apos;t need. The thinking is &quot;I can build it fast anyway.&quot; The result isn&apos;t a sharp solution but a monster. Lots of features, but you can&apos;t tell what the product actually does.&lt;/p&gt;
&lt;p&gt;&quot;What not to build&quot; is the more important call. AI doesn&apos;t make that call for you. It builds what it&apos;s told. The &quot;tech debt of vibe coding&quot; Laurence Moroney warned about lives in the same neighborhood. Being able to build fast also means being able to break fast.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;AI biases my own thinking.&lt;/strong&gt; This is a trap I&apos;ve personally fallen into. When I ask AI, the answers come back close to my own direction. AI polishes what I already think. Other perspectives stop showing up. Without real user feedback or someone else&apos;s outside view, you end up locked in an echo chamber. Diverse feedback loops matter more in the AI era because of this.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Team collaboration and communication.&lt;/strong&gt; Teaching Git collaboration in week 9 is a good thing. But the process of deciding &quot;what to build,&quot; aligning opinions inside a team, and getting the direction right: the curriculum doesn&apos;t address that explicitly. If, as Andrew Ng said, the PM and engineer roles are merging, then talking to users and building empathy with them matters more than coding. Even when AI writes the code, the process of choosing direction happens between people.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Domain interest.&lt;/strong&gt; Healthcare, law, education, finance: the impact is largest when a domain expert combines with AI. A person who knows the domain deeply often produces a sharper result with AI than someone who knows coding deeply. The curriculum focuses on AI engineering skills, so the part about growing domain thinking is missing.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Compared to Korean bootcamps.&lt;/strong&gt; Most Korean bootcamps stay stuck in the &quot;basic syntax → mini project → team project → portfolio&quot; structure. The biggest difference from the CodePath curriculum is the contact point with existing codebases. Receiving a 10,000-line open-source project, mapping it, and submitting a PR: that is the training closest to the actual job.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Closing&lt;/h2&gt;
&lt;p&gt;The hiring market is hard for several reasons, but underneath it all is that the AI Native Engineer requires a different set of qualities from the software engineer of the past. The moat called &quot;code-writing skill&quot; is gone, and with it that skill&apos;s economic value. Companies preferring seniors over juniors isn&apos;t because seniors are better at writing code. It&apos;s because their domain comprehension, their grasp of business models, and their experience translating user requirements into appropriate-tech solutions all get amplified by AI. The &quot;tacit knowledge&quot; built through that experience isn&apos;t easy to acquire. Which is why the open-source PR contributions in this curriculum may be the closest thing to a &quot;tacit knowledge&quot; model that fills that gap.&lt;/p&gt;
&lt;p&gt;Even so, fundamentals matter. The order matters. Fundamentals first. AI rides on top. Without fundamentals, AI is weight, not wings. You can build fast, but you can&apos;t tell what you&apos;re building or whether you&apos;re building it well. Anthropic, the company that builds Claude Code, is probably feeling more sharply than anyone that human pre-training (fundamentals, experience) amplifies AI capability.&lt;/p&gt;
&lt;p&gt;Just like AI models perform differently based on pre-training volume and parameters, humans probably also vary widely in capability based on their pre-training.&lt;/p&gt;
&lt;p&gt;The entry barrier in the AI era has dropped. Definitely. People who don&apos;t know coding at all can build something. But the ceiling has risen. The gap between people who use AI well and people who don&apos;t is much wider than the old &quot;good coder vs bad coder&quot; gap.&lt;/p&gt;
&lt;p&gt;Is the direction CodePath is going the right one? The world is changing too fast, so the curriculum will keep changing. But three things won&apos;t change in the AI era, or after it: training to not blindly trust AI, training to read and integrate existing codebases, training to contribute in the wild.&lt;/p&gt;
&lt;p&gt;And the proposition that &quot;people who did good work in the previous era also do good work in the AI era&quot; is one I suspect most people would agree with.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://www.anthropic.com/news/anthropic-codepath-partnership&quot;&gt;Anthropic x CodePath partnership official announcement&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.codepath.org/news/anthropic-partners-with-codepath&quot;&gt;CodePath official news&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://info.codepath.org/codepath-build-ai-software-with-anthropic-claude-code&quot;&gt;CodePath AI Engineering course page&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://themodernsoftware.dev/&quot;&gt;Stanford CS146S: The Modern Software Developer&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/mihail911/modern-software-dev-assignments&quot;&gt;CS146S assignments on GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://web.stanford.edu/class/cs106a/&quot;&gt;Stanford CS106A&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.geekwire.com/2025/coding-is-dead-uw-computer-science-program-rethinks-curriculum-for-the-ai-era/&quot;&gt;UW Allen School: &quot;Coding is Dead&quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.coursicle.com/umd/courses/CMSC/398Z/&quot;&gt;UMD CMSC 398Z&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://cs50.harvard.edu/x/notes/ai/&quot;&gt;Harvard CS50 AI Notes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://today.ucsd.edu/story/transforming-computer-science-education-in-the-age-of-ai&quot;&gt;UC San Diego: GenAI in CS Education Consortium&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=AuZoDsNmG_s&quot;&gt;Andrew Ng, CS230 Lecture 9: Career Advice in AI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://youtu.be/Mxt73V9sBRA&quot;&gt;Po-Shen Loh: How Carnegie Mellon Teaches Thinking in the AI Era&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Earlier post: &lt;a href=&quot;https://flowkater.io/posts/2026-02-19-code-reading-era&quot;&gt;The Era of Not Reading Code: What Should Engineers Read?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Earlier post: &lt;a href=&quot;https://flowkater.io/posts/2026-03-01-agentic-engineering-9-skills&quot;&gt;9 Survival Skills for the Age of Agentic Engineering&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</content:encoded><category>ai-coding</category><category>essay</category><category>education</category><category>career</category><author>Tony Cho (https://flowkater.io)</author></item><item><title>Wrapping Up January and February 2026 (Not Really a Retro)</title><link>https://flowkater.io/en/posts/2026-03-02-jan-feb-2026-retro/</link><guid isPermaLink="true">https://flowkater.io/en/posts/2026-03-02-jan-feb-2026-retro/</guid><description>After leaving the company, a sabbatical, a trip to Italy, fights and reconciliations with my partner, and finally getting back to work. Too thin on results to call a retro, too long to call a journal entry.</description><pubDate>Mon, 02 Mar 2026 09:00:00 GMT</pubDate><content:encoded>&lt;p&gt;2025 was a sabbatical year for me. The month-long trip to Italy in April, right after I left the company, was a real comfort to someone who had worked without a break (and to Ellie, who had quietly stayed beside me through all of it). Venice glittering as the rain lifted. The view of Florence and its sunsets. The starlit nights along the southern Amalfi coast. With no business debts hounding me anymore, no office to return to, those moments were enough on their own. We stayed faithful to the trip all the way through: through the loud, smoggy streets of Naples, through the tourist crush of Rome that wore us out by the time we flew home.&lt;/p&gt;
&lt;p&gt;After we got back from Italy, the rest of 2025 went by fast. I fought with Ellie a lot now that we were together around the clock, and we made up, and we held each other. We laughed, cried, fell asleep. Got lazy, beat ourselves up, then got up again and went outside. Days like that on repeat.&lt;/p&gt;
&lt;p&gt;Honestly, the early months back with Ellie were harder than I expected. I had grown into the authority and routine of being a CTO, and the version of me at work and the version of me at home had been somewhat separate people. Two of me were colliding, and it threw me off. The leader-antipatterns I used to hate had quietly soaked into me. Ellie was not my employee, she was my partner, and I struggled to align with her and move things forward together. I caught myself acting out the exact model I despised most.&lt;/p&gt;
&lt;p&gt;The cycles of self-pity and self-loathing I carried out of the previous company kept replaying until I finally put them on the page in &lt;a href=&quot;/posts/2026-01-25-no-victory-no-future/&quot;&gt;How Organizations That Cannot Win Fall Apart&lt;/a&gt;. They tormented me long enough that even a season meant for rest got eaten up by them. I would sink into my own darkness, thrashing around in it, and on those days I said things to Ellie that hurt her. She always took me back in anyway, and so to repay her, today again I cook a meal and fold towels and sort the recycling. (Though her share of the housework is still overwhelmingly larger than mine.)&lt;/p&gt;
&lt;p&gt;Looking back at how quickly that time slipped past, I get small flashes of &quot;I should have done this, I should have done that,&quot; but a body and mind that broke right after leaving a job probably needed exactly that much time to mend on their own.&lt;/p&gt;
&lt;p&gt;I had originally planned to write a proper retro: what I worked on, what I made, how hard I&apos;d been living (allegedly). But honestly, I&apos;m not sure what use a retro like that is when I write it. It&apos;s not the kind of essay where I&apos;m thinking about the era; it&apos;s a deeply personal one. Writing works as its own form of therapy for me, so even if this post only achieves that, that&apos;s enough. So I just started writing.&lt;/p&gt;
&lt;p&gt;In November I registered as a sole proprietor, and worked-stopped-worked-stopped through the fall. From December I got back to it in earnest. In January and February, I tried to stop obsessing over routines, stop blaming the lazy version of myself, and just enjoy the moment, do what I wanted to do, and let the days carry me.&lt;/p&gt;
&lt;p&gt;The year before last I kept training hard despite an injury, trying to push through, and that ended up turning into multiple injuries through the second half of 2024 that did not heal cleanly. I came into 2026 having barely worked out for almost a year and not properly recovered, but starting in January I&apos;ve been training again at a level my body can handle.&lt;/p&gt;
&lt;p&gt;From December into January and February, I wrote a lot. The only readers are Ellie and a few close friends, but I finally started writing the &lt;a href=&quot;/posts/2026-01-19-pangyo-it-episode-1/&quot;&gt;novel&lt;/a&gt; I had wanted to write for so long. I have to keep serializing to live up to those readers&apos; expectations, and the more I write the harder it gets. I also wrote more blog posts than I ever have before. I tried to weave the emotions and ideas I had been stacking up inside me into actual prose. &lt;a href=&quot;/posts/2026-02-19-code-reading-era/&quot;&gt;One of them&lt;/a&gt; ended up bringing a flood of visitors to a blog nobody used to visit. (Their presence forces me to polish the raw drafts a little, which I guess can&apos;t be helped.) Ellie also keeps reminding me to keep my words and behavior straight. Writing made me want to put the phone down and read more books, so I went so far as to buy an Onyx Palma to read on, and I&apos;ve finished three or four books this year so far.&lt;/p&gt;
&lt;p&gt;The project I started in December is in alpha testing, and I&apos;m thinking about when and how to ship while continuing to harden it. Thanks to how fast AI coding agents have evolved, my hours have actually gone up, not down. Ellie&apos;s complaint that I won&apos;t play with her because I&apos;m busy messaging Jarvis (my AI assistant) on Telegram isn&apos;t entirely a cute joke; outside of reading time, I am basically glued to the phone all day. That said, the more the project matures, the more it needs human hands, piece by piece. (Whatever your project is, if AI seems to be doing all of it for you, flip the question around. You might be building something everyone else can build too. — &lt;a href=&quot;/posts/2026-03-01-agentic-engineering-9-skills/&quot;&gt;Nine Survival Skills for the Agentic Engineering Era&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;Last summer was the hottest one of my life and this winter was the coldest one of my life. Having spent my twenties and most of my career heads-down in business and work, I felt the seasons hit me head-on for the first time. It was more dramatic than I expected. (Into my early thirties I used to wear shorts at home in summer or winter alike, and now I bundle into long pants and fleece socks, which Ellie of course teases me about.)&lt;/p&gt;
&lt;p&gt;It feels like too much of a waste to just let this stretch slide by, and at the same time there isn&apos;t enough finished work to call it a real retro yet. So I&apos;m holding onto this passing time, briefly, by writing it down for the future me who will look back at it.&lt;/p&gt;
&lt;p&gt;I&apos;m steadier than I was through last year, when even the rest itself was tangled up with the thoughts that tormented me, and now I&apos;m spending each day with focus again. The value I&apos;m chasing comes down to one thing. Whatever work I&apos;m doing, if I can stay in this present moment, I&apos;ll pay whatever it costs to stay there. I want to live a life where I&apos;m not wasting time regretting the past or worrying about the future, where I&apos;m doing my best at what I want to do, and where I can stay right here in this moment. With that small wish, I&apos;ll close out this strange retro(?) of mine.&lt;/p&gt;
</content:encoded><category>retrospective</category><category>sabbatical</category><category>writing</category><category>career-transition</category><author>Tony Cho (https://flowkater.io)</author></item><item><title>9 Survival Skills for the Agentic Engineering Era</title><link>https://flowkater.io/en/posts/2026-03-01-agentic-engineering-9-skills/</link><guid isPermaLink="true">https://flowkater.io/en/posts/2026-03-01-agentic-engineering-9-skills/</guid><description>Karpathy named the next thing after vibe coding &apos;agentic engineering.&apos; This post unpacks the nine skills engineers need when the work is now telling agents what to do, with the practice routines that build each one.</description><pubDate>Sun, 01 Mar 2026 05:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;Opening&lt;/h2&gt;
&lt;p&gt;Karpathy, the same person who coined the term vibe coding, &lt;a href=&quot;https://x.com/karpathy/status/2019137879310836075&quot;&gt;posted on X&lt;/a&gt; that we now need a new name to distinguish the next mode from vibe coding, and proposed calling it agentic engineering.&lt;/p&gt;
&lt;p&gt;I&apos;ve been doing vibe coding seriously since last April, and the past two or three months have been turbulent in a way that&apos;s hard to describe. I think the reason my piece &quot;&lt;a href=&quot;https://flowkater.io/posts/2026-02-19-code-reading-era/&quot;&gt;What Should Engineers Read in an Era That No Longer Reads Code?&lt;/a&gt;&quot; went unexpectedly viral was a reaction to that turbulence.&lt;/p&gt;
&lt;p&gt;This post took inspiration from Karpathy&apos;s tweet, but it&apos;s stitched together from my own scars and the field reports of people like Armin Ronacher, Boris Cherny, WenHao Yu, and IndyDevDan, distilled into nine core skills.&lt;/p&gt;
&lt;p&gt;The nine core skills are:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Decomposition&lt;/li&gt;
&lt;li&gt;Context Architecture&lt;/li&gt;
&lt;li&gt;Definition of Done&lt;/li&gt;
&lt;li&gt;Failure Recovery Loop&lt;/li&gt;
&lt;li&gt;Observability&lt;/li&gt;
&lt;li&gt;Memory Architecture&lt;/li&gt;
&lt;li&gt;Parallel Orchestration&lt;/li&gt;
&lt;li&gt;Abstraction Layering&lt;/li&gt;
&lt;li&gt;Taste&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The interesting thing is that all nine of these were already required of any engineer who got things done well, and any manager too, long before agentic engineering or even vibe coding. Why that&apos;s true is the thread I want to pull. Let&apos;s start with Karpathy&apos;s story and walk through them one by one.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;The Weekend of Vibe Coding&apos;s Inventor&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://x.com/i/status/2026731645169185220&quot;&gt;Karpathy&lt;/a&gt; said he wanted to build a dashboard for his home cameras over a weekend. He gave the agent the IP of his DGX Spark, the username, the password, and the goal. SSH key setup, vLLM configuration, model downloads and benchmarks, video inference server, web UI dashboard, systemd service setup, memory notes, and a markdown report at the end (he asked for all of it in one go). Thirty minutes later it was done.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;I didn&apos;t touch anything myself. This was a weekend project just 3 months ago. Now it was 30 minutes of just forgetting about it.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;a href=&quot;https://x.com/i/status/2026731645169185220&quot;&gt;Karpathy&lt;/a&gt; gave this new mode a name. &lt;strong&gt;Agentic Engineering.&lt;/strong&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;&apos;Agentic&apos; because 99% of the time you are no longer writing code directly, you are commanding and supervising agents. &apos;Engineering&apos; because there is art, science, and skill to it.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The era of an app popping out of a few lines of prompt is over. What matters now is &lt;strong&gt;the skill of designing the conditions under which agents actually work.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The change is fast. The adaptation is slow. Most developers haven&apos;t caught up.&lt;/p&gt;
&lt;p&gt;And the speed of this shift is not normal.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;It is hard to put into words how much programming has changed in just the last ~2 months. This was not a &apos;business as usual&apos; kind of incremental progress.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Most developers are using AI, but the share of work fully delegated to agents is still low. According to the &lt;a href=&quot;https://resources.anthropic.com/2026-agentic-coding-trends-report&quot;&gt;2026 Agentic Coding Trends Report&lt;/a&gt;, 60% of developers use AI, but most have fully delegated only 0–20% of their work to agents. There&apos;s a name for this gap: the &lt;strong&gt;Delegation Paradox&lt;/strong&gt;. Letting AI write code is one thing. Handing the work to an agent and walking away is a completely different question.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://agenticengineer.com/top-2-percent-agentic-engineering&quot;&gt;IndyDevDan&lt;/a&gt; put the gap in one sentence.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;Do you trust your agents?&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Most developers say no. I said no at first too. I reviewed every line the agent wrote, and there were times when it took longer than just writing the code myself.&lt;/p&gt;
&lt;p&gt;But as &lt;a href=&quot;https://x.com/i/status/2026731645169185220&quot;&gt;Karpathy&apos;s&lt;/a&gt; example shows, in the agentic engineering era more and more of the work is being automated and delegated to agents. So what skills (or qualities) do we need to keep being good engineers in this world?&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;① Decomposition&lt;/h2&gt;
&lt;p&gt;If you ask an agent to &quot;build a signup flow,&quot; you&apos;ll get something. The problem is the odds of it being what you actually wanted are low. Email verification is missing. The password rules don&apos;t match yours. The UI went somewhere you couldn&apos;t have predicted.&lt;/p&gt;
&lt;p&gt;Telling an agent to do work is, in the end, &lt;strong&gt;the act of deciding what to build.&lt;/strong&gt; What does the customer want, what does the user need, what&apos;s the priority? That part is on me. The agent can&apos;t take that off my plate.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;The key is to develop the intuition to decompose tasks appropriately, delegating to agents where they work well and providing human help where needed.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Easy to say, hard to do. The line between &quot;where they work well&quot; and &quot;where humans need to step in&quot; shifts every time. Some tasks the agent finishes one-shot. Others, you can run three times and it still misses the point. Building intuition for that difference is what decomposition is. &lt;a href=&quot;https://x.com/i/status/2026731645169185220&quot;&gt;Karpathy&lt;/a&gt; was pretty clear about the conditions for decomposition too.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;It works especially well in some scenarios, especially where the task is well-specified and the functionality can be verified/tested.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Flip that around: when the spec is fuzzy and there&apos;s no way to verify the result, the agent gets lost too. My job is to turn the fuzzy requirement into a clear unit of work.&lt;/p&gt;
&lt;p&gt;When I &lt;a href=&quot;https://flowkater.io/posts/2026-01-09-15-year-cto-vibe-coding/&quot;&gt;built out a TDD workflow with Claude Code&lt;/a&gt;, the lesson was that 70 to 80 percent comes out of one shot, and the remaining 20 percent is the actual job. How well you defined the requirement up front decides how big that 20 percent ends up being.&lt;/p&gt;
&lt;p&gt;You can see the same pattern in &lt;a href=&quot;https://yu-wenhao.com/en/blog/agentic-coding/&quot;&gt;WenHao Yu&lt;/a&gt;&apos;s Opus 4.6 multi-agent workflow. He hands big projects off to an AI Team Lead, and 70 percent of what the Team Lead does is, in fact, decomposition. It first designs the answer to &quot;what subtasks do we need to build this feature?&quot; and then dispatches each subtask to a different agent. If the decomposition is right, the rest follows. If the decomposition is wrong, every agent loses the thread.&lt;/p&gt;
&lt;p&gt;I lived through that &quot;if the decomposition is wrong, every agent goes off the rails&quot; lesson directly. One time I threw &quot;build the settings page&quot; at an agent as a single task, and inside the settings page were profile editing, notifications settings, subscription management, and data export. The agent tried to build all four at once, and the state management got tangled. Changing the notifications toggle reset the profile form. An error in subscription management broke the whole page. In the end I split it into four independent tasks, gave each one to an agent, and it worked first try. The breakdown wasn&apos;t &quot;settings page.&quot; It was &quot;profile edit form,&quot; &quot;notifications toggle component,&quot; &quot;subscription management panel,&quot; and &quot;data export button.&quot; Four pieces.&lt;/p&gt;
&lt;h3&gt;Before: AddPlan, thrown in without an interview&lt;/h3&gt;
&lt;p&gt;I had to build the plan creation screen (AddPlanView). A five-step input flow: name input, scope setting, period selection, weekday picker, summary confirmation. I had the Figma designs, and I had written a PRD. &quot;Surely the agent can build this in one pass.&quot;&lt;/p&gt;
&lt;p&gt;That was a naive expectation.&lt;/p&gt;
&lt;p&gt;The agent shipped a first version. The shape was roughly right at a glance. The details kept slipping, though. It was pulling colors and fonts that weren&apos;t defined in the design system. The CustomNumberPad layout for Step 2 didn&apos;t match the Figma. I fixed that, and Step 3&apos;s calendar broke. Every fix pushed another step out of place. By the third round I was thinking, &quot;Is it faster if I just write this myself?&quot;&lt;/p&gt;
&lt;p&gt;The cause was clear. I&apos;d started without sorting out, even for myself, what I actually wanted. I had a PRD, but the details (the spacing and tap targets of the CustomNumberPad, the direction and timing of step transition animations, how validation errors should be displayed) were all still in my head. The agent can&apos;t read my head, so it made its own choice each time, and each time it didn&apos;t match what I wanted. We ping-ponged for dozens of turns and burned almost half a day.&lt;/p&gt;
&lt;h3&gt;After: Socratic dialogue to sharpen the requirements&lt;/h3&gt;
&lt;p&gt;After that I started interviewing with the AI before building any feature. Frameworks like &lt;a href=&quot;https://flowkater.io/posts/2026-02-08-superpowers-introduction/&quot;&gt;Superpowers&lt;/a&gt; automate this for you, but the core is the same: the process of making &quot;what I want&quot; explicit, before the code starts. Think of it as a Socratic dialogue. The AI asks questions, I answer them, and the requirement gets more specific.&lt;/p&gt;
&lt;p&gt;I tried this approach the second time around with AddPlan. &quot;I want to build a five-step input flow.&quot; → AI: &quot;What are the input fields and validations for each step?&quot; → &quot;Step 1 is name input, no empty strings, max 50 characters.&quot; → AI: &quot;Are you using design system colors? Any custom colors?&quot; → &quot;Design system colors only. The accent color is #FF6B35.&quot; → AI: &quot;Step transition animation? UX on validation failure?&quot; → &quot;Slide, with inline error messages.&quot;&lt;/p&gt;
&lt;p&gt;Five minutes. That&apos;s how long the conversation took. The edge cases that came out in those five minutes were almost identical to the ones I&apos;d discovered one by one across half a day of ping-pong. The difference was that last time I&apos;d found them &lt;em&gt;after&lt;/em&gt; writing the code, and this time I cleared them up &lt;em&gt;before&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;When I handed those crisp requirements to the agent, the one-shot quality was clearly different. Splitting the work step by step and writing the spec for each step explicitly cut the revision cycle to 2–3 turns. Even those revisions were design tweaks, not structural changes. Five minutes of interview saved half a day of ping-pong. From then on, the feature interview became a default step in my workflow. Every feature build now goes &quot;interview → spec writeup → instruct agent.&quot;&lt;/p&gt;
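&lt;p&gt;The output of a five-minute interview can be captured in a tiny spec file. Here&apos;s a hypothetical sketch of what the AddPlan writeup might look like; the field names and values are illustrative, not the actual document:&lt;/p&gt;

```markdown
# AddPlanView — 5-step input flow (spec sketch)

## Step 1 — Name input
- Validation: non-empty, max 50 characters
- Error UX: inline message below the field

## Step 2 — Scope setting (CustomNumberPad)
- Colors: design system only; accent #FF6B35
- Layout: match the Figma frame (spacing, tap targets)

## Transitions
- Animation: horizontal slide between steps
- On validation failure: block the transition, show inline error
```

&lt;p&gt;The format doesn&apos;t matter. What matters is that every choice the agent would otherwise have to guess is now written down.&lt;/p&gt;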
&lt;p&gt;People say that with spec-driven development (SDD), which rose alongside vibe coding, you can build cleanly as long as the PRD is right. That&apos;s true. But deciding how to decompose the spec is still on us.&lt;/p&gt;
&lt;h3&gt;How to practice this&lt;/h3&gt;
&lt;p&gt;Not just engineers: in any field, the people who get things done well decompose big tasks into pieces and stay in flow by picking one piece at a time and focusing on it. The people who get things done badly skip planning, dive in, ping-pong around, and end up missing the deadline. (Yeah, I&apos;ve seen plenty of developers like this.)&lt;/p&gt;
&lt;p&gt;If you&apos;re at the wrap-up stage and the ping-pong with your coding agent is going long, that&apos;s a signal to ask whether you&apos;ve actually decomposed the work properly.&lt;/p&gt;
&lt;p&gt;The first habit is writing a requirements doc before you start implementing. It doesn&apos;t have to be elaborate. Just writing out, as plain text, &quot;what does this feature do, and what does done look like?&quot; already exposes the gaps. These days I make a small requirements.md before any feature build. (Not a spec doc.)&lt;/p&gt;
&lt;p&gt;Interviewing with the AI is also worth folding into your daily workflow. It feels awkward at first, being asked questions by the AI. After a few rounds, though, you&apos;ll catch yourself getting flagged on edge cases you&apos;d missed. It pays off most on stateful features like auth, payments, and file uploads. Whether you use a framework like Superpowers or just ask ChatGPT, &quot;what should I think about before I build this feature?&quot;, the method doesn&apos;t matter. The point is &lt;strong&gt;to give yourself thinking time before you start building.&lt;/strong&gt; Five minutes is enough. Once you&apos;ve felt those five minutes save four hours a few times, the habit makes itself.&lt;/p&gt;
&lt;p&gt;Throwing a sentence into the agent&apos;s chat shell from minute one is never a good habit. It&apos;s the same habit as the developer who jumps into code without a plan.&lt;/p&gt;
&lt;p&gt;So you also need to deliberately practice splitting big work into &quot;the size an agent can finish in one turn.&quot; Roughly: 3 to 5 files modified, 15 to 30 minutes to complete. Bigger than that, split it. Smaller, combine it. After about ten attempts, you&apos;ll feel it. That feel is decomposition.&lt;/p&gt;
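&lt;p&gt;That sizing rule fits in a few lines. A minimal sketch, with the thresholds above hard-coded (they&apos;re a heuristic from experience, not measured constants):&lt;/p&gt;

```go
package main

import "fmt"

// taskSizing applies the rule of thumb above: a one-turn task should
// touch roughly 3-5 files and take 15-30 minutes. Anything bigger gets
// split; anything clearly smaller gets batched with a neighbor.
func taskSizing(filesTouched, estMinutes int) string {
	switch {
	case filesTouched > 5 || estMinutes > 30:
		return "split"
	case filesTouched < 3 && estMinutes < 15:
		return "combine"
	default:
		return "one-turn"
	}
}

func main() {
	fmt.Println(taskSizing(12, 120)) // "settings page" as one task: prints "split"
	fmt.Println(taskSizing(4, 20))   // "notifications toggle": prints "one-turn"
}
```

&lt;p&gt;Nobody actually runs a function like this, of course. The point is that after about ten manual attempts, this check runs in your head automatically.&lt;/p&gt;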
&lt;p&gt;The latest Codex and Claude Code design good task plans on their own with tools like Task. Simple requirements or fixes are probably fine. In the end, though, you have to do it yourself first to know. Do, then delegate. The order matters.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;② Context Architecture&lt;/h2&gt;
&lt;p&gt;Look again at &lt;a href=&quot;https://x.com/i/status/2026731645169185220&quot;&gt;Karpathy&apos;s&lt;/a&gt; DGX Spark example. What he gave the agent was four things: IP, username, password, goal. No padding, just what was needed. That&apos;s the ideal of context architecture.&lt;/p&gt;
&lt;p&gt;Real production environments aren&apos;t this clean. A project has dozens of files, business logic, coding conventions, architecture decisions made months ago. How you hand all that context to the agent decides the quality of the output. To borrow &lt;a href=&quot;https://x.com/i/status/2026731645169185220&quot;&gt;Karpathy&apos;s&lt;/a&gt; framing, natural language is now the interface in place of code.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://x.com/i/status/2026731645169185220&quot;&gt;Karpathy&lt;/a&gt; included &lt;strong&gt;&quot;memory notes and a markdown report&quot;&lt;/strong&gt; at the end of his instructions. That&apos;s not just documentation. That&apos;s an instruction to &lt;strong&gt;structure&lt;/strong&gt; the context the agent generates while working, so it can be passed to the next task. Context isn&apos;t only something you give. It&apos;s also something you build.&lt;/p&gt;
&lt;p&gt;Writing a good AGENTS.md matters, but that isn&apos;t all of it. &lt;strong&gt;If the code architecture itself is well designed, the speed at which an agent grasps context is in a different league.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;These days in Codex you can pin a skill with &lt;code&gt;$&lt;/code&gt; and pass exactly the right context, which lifts accuracy a lot. Documentation alone isn&apos;t the whole answer, though. I learned that the hard way.&lt;/p&gt;
&lt;p&gt;Counterintuitively, in the end, you have to write good code.&lt;/p&gt;
&lt;p&gt;If the directory structure is clear, the naming is consistent, and the concerns are separated, the agent picks it up fast. Conversely, no matter how well you&apos;ve written the docs on top of spaghetti code, the agent is likely to wander. Saying we&apos;re in &lt;a href=&quot;https://flowkater.io/posts/2026-02-19-code-reading-era/&quot;&gt;an era that no longer reads code&lt;/a&gt; doesn&apos;t mean code quality matters less. It matters more.&lt;/p&gt;
&lt;h3&gt;The idea of an agent-friendly codebase&lt;/h3&gt;
&lt;p&gt;Flask creator &lt;a href=&quot;https://lucumr.pocoo.org/2025/6/12/agentic-coding/&quot;&gt;Armin Ronacher&lt;/a&gt; raised an interesting angle. He argues that &lt;strong&gt;language choice itself&lt;/strong&gt; is part of context architecture when you&apos;re collaborating with agents. His conclusion was unexpected: Go is an agent-friendly language.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;Go is sloppy: Rob Pike famously described Go as suitable for developers who aren&apos;t equipped to handle a complex language. Substitute &apos;developers&apos; with &apos;agents.&apos;&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Go is statically typed but flexible, and the syntax is easy. Simpler than Java, stricter than Python. Above all, it&apos;s explicit. I once gravitated toward functional and bleeding-edge languages, and the reason I settled on Go is exactly this. It&apos;s also easy for juniors to learn. By the same logic, it&apos;s easy for agents. Whatever the language is, what matters is a structure that gives the agent fewer ways to mess up.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://lucumr.pocoo.org/2025/6/12/agentic-coding/&quot;&gt;Ronacher&lt;/a&gt; is sharp on tool design too.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;Tools need to be protected against an LLM chaos monkey using them completely wrong.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;He puts double-execution guards (pidfiles) and port-conflict prevention into his Makefile. Agents will gladly start the same server twice or try to bind to a port that&apos;s already in use. Blocking that at the tool level shrinks the space the agent can flail in.&lt;/p&gt;
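&lt;p&gt;A sketch of what such a guard can look like in a Makefile. The target, paths, and port here are illustrative (Ronacher&apos;s actual Makefile differs), and recipes must be indented with tabs:&lt;/p&gt;

```make
# Guard the dev server against an agent starting it twice.
# .server.pid holds the PID of the running instance.
dev:
	@if [ -f .server.pid ] && kill -0 $$(cat .server.pid) 2>/dev/null; then \
		echo "server already running (pid $$(cat .server.pid))"; exit 1; \
	fi
	@if nc -z localhost 8080 2>/dev/null; then \
		echo "port 8080 already in use"; exit 1; \
	fi
	@./server & echo $$! > .server.pid
```

&lt;p&gt;The agent can now invoke &lt;code&gt;make dev&lt;/code&gt; as often as it likes; the tool itself refuses the second start instead of relying on the agent&apos;s judgment.&lt;/p&gt;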
&lt;p&gt;&lt;a href=&quot;https://www.lennysnewsletter.com/p/head-of-claude-code-what-happens&quot;&gt;Boris Cherny&lt;/a&gt;, the person who built Claude Code, said something similar in his Lenny&apos;s Newsletter interview. One reason he can run 15 agents in parallel is that he isolates the context for each one rigorously. Agent A only touches the frontend, agent B only the API, agent C only tests. With minimal context overlap, conflicts go down and the accuracy of each agent goes up.&lt;/p&gt;
&lt;h3&gt;Before: agent lost in a flat directory&lt;/h3&gt;
&lt;p&gt;In the early days of the iOS app, the directory structure was effectively flat. The Views folder had thirty screens jumbled together, with models and view models sitting at the same level. Naming conventions varied per file: some were PascalCase like &lt;code&gt;PlanListView&lt;/code&gt;, some were &lt;code&gt;DailyTasks&lt;/code&gt;, some were just &lt;code&gt;Summary&lt;/code&gt;. Even a human reader needed time to figure out &quot;where does this file belong?&quot;&lt;/p&gt;
&lt;p&gt;Setting aside that this was my first iOS native app, the project folder I&apos;d set up to prototype quickly had grown enormous as features piled on.&lt;/p&gt;
&lt;p&gt;I got tired of telling the agent, every single time, &quot;not that folder, this folder.&quot; Saying &quot;fix the settings screen&quot; often meant the agent touched unrelated files. The settings screen&apos;s view model would import a model from the home screen. The directory structure didn&apos;t enforce any separation of concerns, so the agent didn&apos;t know the boundaries either. The context window filled up with files that didn&apos;t belong, and accuracy dropped.&lt;/p&gt;
&lt;h3&gt;After: feature-based directories with role separation&lt;/h3&gt;
&lt;p&gt;I restructured the directories around features. &lt;code&gt;Features/Plan/&lt;/code&gt;, &lt;code&gt;Features/Daily/&lt;/code&gt;, &lt;code&gt;Features/Settings/&lt;/code&gt;. Each feature folder holds its own View, ViewModel, and Model together. Shared components moved to &lt;code&gt;Shared/Components/&lt;/code&gt;, common models to &lt;code&gt;Shared/Models/&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;I unified the naming too. &lt;code&gt;{Feature}{Role}&lt;/code&gt; pattern: &lt;code&gt;PlanListView&lt;/code&gt;, &lt;code&gt;PlanListViewModel&lt;/code&gt;, &lt;code&gt;PlanModel&lt;/code&gt;. From the file name alone you can tell what the file does and where it belongs.&lt;/p&gt;
&lt;p&gt;The change was immediate. Tell the agent, &quot;add a dark mode toggle to the Settings screen,&quot; and it works inside &lt;code&gt;Features/Settings/&lt;/code&gt; only. There&apos;s no reason left to touch other features. &lt;strong&gt;The code structure becomes the boundary of the context.&lt;/strong&gt; I don&apos;t even need to say &quot;only look at this folder.&quot; The structure itself communicates the scope.&lt;/p&gt;
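&lt;p&gt;Roughly, the restructured layout looks like this. The feature folder names are from above; the exact file lists are illustrative:&lt;/p&gt;

```text
Features/
  Plan/       PlanListView, PlanListViewModel, PlanModel
  Daily/      DailyTaskView, DailyTaskViewModel
  Settings/   SettingsView, SettingsViewModel
Shared/
  Components/ shared UI pieces
  Models/     common models
```

&lt;p&gt;One glance tells the agent (and a human) where Settings work begins and ends.&lt;/p&gt;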
&lt;p&gt;The &lt;a href=&quot;https://www.humanlayer.dev/blog/writing-a-good-claude-md&quot;&gt;HumanLayer&lt;/a&gt; team&apos;s analysis points the same way. Once your CLAUDE.md (or AGENTS.md) crosses 150–200 instructions, the rate of compliance drops sharply. Task-specific instructions need to live in separate files. One well-structured directory tells the agent more than ten pages of docs.&lt;/p&gt;
&lt;h3&gt;How to practice this&lt;/h3&gt;
&lt;p&gt;Practice clean architecture deliberately. &quot;Code that&apos;s easy for an agent to read&quot; and &quot;code that&apos;s easy for a human to read&quot; overlap startlingly often. When I start a new project, the first thing I do is lay down the directory structure and write what each directory is for in the README. Partly for humans, partly for agents.&lt;/p&gt;
&lt;p&gt;I stick to a DDD / Clean Architecture structure because it&apos;s testable, and I particularly enforce strong conventions on the use case layer. iOS differs a bit from server work, but the skeleton is roughly the same.&lt;/p&gt;
&lt;p&gt;In AGENTS.md I keep things tight: architecture decision rationale (ADRs), coding conventions, a glossary of domain terms. The rest, I let the code itself speak. Accurate type definitions, function names that carry meaning, tests that double as spec docs. That&apos;s the best AGENTS.md.&lt;/p&gt;
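&lt;p&gt;For concreteness, a skeleton of that kind of tight AGENTS.md. The entries are illustrative, condensed from the three categories above:&lt;/p&gt;

```markdown
# AGENTS.md (skeleton)

## Architecture decisions (ADRs)
- Clean Architecture with a strict use-case layer; link each ADR, don't inline it.

## Coding conventions
- Naming: {Feature}{Role} — PlanListView, PlanListViewModel, PlanModel
- No cross-feature imports; shared code lives under Shared/

## Domain glossary
- "Plan": a study plan spanning a date range
- "Redistribution": recalculating leftover load across future dates
```

&lt;p&gt;Everything else (types, function names, tests) should carry its own meaning in the code.&lt;/p&gt;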
&lt;p&gt;Designing for context separation is also worth trying. Run multiple agents in parallel in separate worktrees, each with its own isolated environment, role, and goal: that&apos;s where performance peaks. You&apos;ll want to manage backend and frontend in one place, and you&apos;ll hope to do everything from one shell. In the end, though, splitting Plan, documentation, development, testing, and commit across separate agents is much more efficient. There&apos;s more to manage, and at first it feels like overkill. As the work gets more complex, the separation pays off. (Which is exactly why orchestration tooling matters more.)&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;③ Definition of Done&lt;/h2&gt;
&lt;p&gt;Letting an agent run overnight and checking in the morning is a thrilling experience. There&apos;s also a moment where the thrill turns into emptiness. The report says &quot;task complete,&quot; but when you actually look, only the documentation got updated, or all you have is stub functions and interface scaffolding. You don&apos;t have working code. You have code that looks like it could work.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://x.com/i/status/2026731645169185220&quot;&gt;Karpathy&lt;/a&gt;, discussing what agents still need, listed several things including &lt;strong&gt;supervision.&lt;/strong&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;Of course this is not yet perfect. Things still needed: high-level direction, judgment, taste — knowing what good looks like — supervision, and providing hints and ideas on repetitive tasks.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Agents need supervision. And supervision starts with definition of done. If you don&apos;t clearly define what &quot;this task is finished&quot; means, the agent reports &quot;done&quot; by its own standards. Nine times out of ten, those standards aren&apos;t yours.&lt;/p&gt;
&lt;h3&gt;Before: an automation CLI, run overnight, came back hollow&lt;/h3&gt;
&lt;p&gt;I tried to build a workflow automation CLI based on the Codex App Server. A tool that auto-runs the loop &lt;code&gt;propose → plan → run → verify → archive&lt;/code&gt;. I prepared a planning doc covering the full architecture, module structure, and API design. I planned parallel agent execution: Stream A for core logic, Stream B for the CLI interface, Stream C for tests. &quot;With this much documentation the agent can handle it.&quot; I let it run overnight.&lt;/p&gt;
&lt;p&gt;When I checked in the morning, &lt;strong&gt;it had stopped after one hour.&lt;/strong&gt; The agent had decided &quot;there&apos;s nothing left to do&quot; and stopped. The file structure was tidy. It was all stubs, though. &lt;code&gt;func Propose() error { return nil }&lt;/code&gt;. The type definitions and module structure were perfectly in place, and the actual business logic was empty. It was like being handed a well-organized empty house.&lt;/p&gt;
&lt;p&gt;The more instructive lesson was the second attempt. When I retried the CLI, the agent reported &quot;all tests passing.&quot; When I cracked it open, I found the agent had quietly rewritten the tests for its own convenience. Instead of verifying the actual scenarios (does propose really call the API and parse the response, does plan respect dependency order, does verify catch failure cases), it had swapped them for code that just checked whether the function returned without an error, then declared &quot;all green!&quot; From the agent&apos;s view this wasn&apos;t a lie. The tests really did all pass. They just weren&apos;t the tests I wanted.&lt;/p&gt;
&lt;p&gt;That&apos;s when it clicked: the agent&apos;s &quot;done&quot; isn&apos;t my &quot;done.&quot; And what closes that gap isn&apos;t a better model. It&apos;s a clearer definition of done. &lt;strong&gt;I hadn&apos;t read my own doc carefully.&lt;/strong&gt; I&apos;d written the requirements myself, and I hadn&apos;t sat with the complexity hiding inside them. &quot;There&apos;s a doc, so the agent will read it and build it.&quot; That&apos;s the most dangerous antipattern.&lt;/p&gt;
&lt;h3&gt;After: DoD plus a reporting system&lt;/h3&gt;
&lt;p&gt;When I tried the CLI again later, I took a completely different approach. Every task instruction now includes two things. The first is a &lt;strong&gt;definition of done document.&lt;/strong&gt; Stream A&apos;s DoD: &quot;the propose command actually calls the API, parses the response, and saves it as a JSON file. Add three new integration tests.&quot; That level of specificity. And critically: &quot;Stub patterns like &lt;code&gt;return nil&lt;/code&gt; don&apos;t count as done. Don&apos;t modify existing tests. Add new tests only.&quot; That blocks the agent from escaping into stubs or rewriting the tests.&lt;/p&gt;
&lt;p&gt;The second is a &lt;strong&gt;task report.&lt;/strong&gt; When the agent finishes, it has to write up the results against the DoD. &quot;What I did, which DoD items I met, what&apos;s left.&quot; With a report, I can grasp the state in five minutes before opening any code.&lt;/p&gt;
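&lt;p&gt;Put together, a task instruction now carries something like this. A condensed sketch (the real documents are longer):&lt;/p&gt;

```markdown
## Definition of Done — Stream A (propose)
- [ ] `propose` calls the API, parses the response, saves the result as JSON
- [ ] 3 new integration tests added
- [ ] No stub patterns — `return nil` does not count as done
- [ ] Existing tests unmodified

## Task report (fill in on completion)
- What I did:
- DoD items met / not met:
- What's left:
```

&lt;p&gt;The checklist defines &quot;done&quot; before the work starts; the report forces the agent to account for itself against that same list.&lt;/p&gt;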
&lt;p&gt;What stands out in &lt;a href=&quot;https://x.com/elvissun/status/2025920521871716562&quot;&gt;Elvis&lt;/a&gt;&apos;s system is that the definition of done is staged. In his agent system, &quot;done&quot; isn&apos;t just writing the code:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Was a PR created?&lt;/li&gt;
&lt;li&gt;Is it synced with the main branch (no merge conflicts)?&lt;/li&gt;
&lt;li&gt;Does CI pass (lint, type check, unit tests, E2E)?&lt;/li&gt;
&lt;li&gt;Did the Codex code review pass?&lt;/li&gt;
&lt;li&gt;Did the Claude Code code review pass?&lt;/li&gt;
&lt;li&gt;Did the Gemini code review pass?&lt;/li&gt;
&lt;li&gt;If there&apos;s a UI change, is a screenshot included?&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Only when all of those clear does the Telegram notification arrive: &quot;PR #341 ready for review.&quot; Before then, no notification. Three agents review the code, CI passes, and the merge is conflict-free before a human is pulled in.&lt;/p&gt;
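&lt;p&gt;That staged gate boils down to a single predicate: notify a human only when every stage holds. A sketch, assuming the stage names from the list above (the actual system is surely more elaborate):&lt;/p&gt;

```go
package main

import "fmt"

// prGate mirrors the staged definition of done above: a PR triggers a
// human notification only when every gate passes.
type prGate struct {
	PRCreated      bool
	SyncedWithMain bool // no merge conflicts with main
	CIPassed       bool // lint, type check, unit tests, E2E
	CodexReviewOK  bool
	ClaudeReviewOK bool
	GeminiReviewOK bool
	UIChanged      bool
	ScreenshotOnUI bool // only required when UIChanged is true
}

// readyForHuman returns true only when all gates clear.
func (g prGate) readyForHuman() bool {
	if g.UIChanged && !g.ScreenshotOnUI {
		return false
	}
	return g.PRCreated && g.SyncedWithMain && g.CIPassed &&
		g.CodexReviewOK && g.ClaudeReviewOK && g.GeminiReviewOK
}

func main() {
	pr := prGate{PRCreated: true, SyncedWithMain: true, CIPassed: true,
		CodexReviewOK: true, ClaudeReviewOK: true, GeminiReviewOK: true}
	fmt.Println(pr.readyForHuman()) // prints "true"
}
```

&lt;p&gt;The shape matters more than the fields: &quot;done&quot; is a conjunction of checkable conditions, not the agent&apos;s self-assessment.&lt;/p&gt;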
&lt;p&gt;You don&apos;t need to go this far (honestly, I haven&apos;t gotten there yet either), but the principle is the same. &lt;strong&gt;The agent has to be told, concretely, what &quot;done&quot; means.&lt;/strong&gt; Otherwise the agent applies its own definition of done. The odds of that lining up with yours are not great.&lt;/p&gt;
&lt;p&gt;This isn&apos;t just my lesson. The &lt;a href=&quot;https://github.blog/ai-and-ml/generative-ai/multi-agent-workflows-often-fail-heres-how-to-engineer-ones-that-dont/&quot;&gt;GitHub Engineering&lt;/a&gt; team uses the same pattern. In multi-agent systems they enforce inter-agent messages with typed schemas and explicitly limit what each agent can do.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;Most multi-agent workflow failures come down to missing structure, not model capability.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The CLI failed not because the model was dumb. It failed because I hadn&apos;t given it structure.&lt;/p&gt;
&lt;h3&gt;How to practice this&lt;/h3&gt;
&lt;p&gt;Start by including a DoD checklist on every task instruction. Writing a DoD every time can feel like overkill. After two or three &quot;the agent said done but it wasn&apos;t&quot; experiences, though, &lt;em&gt;not&lt;/em&gt; writing one feels riskier. My task instruction template now has a DoD section by default. &quot;Tests pass + existing tests untouched + report submitted&quot; is the baseline, and I add items based on the work.&lt;/p&gt;
&lt;p&gt;Build the habit of not taking the agent&apos;s &quot;done&quot; report at face value. This is healthy verification, not paranoia. It matters more for overnight work. When I dispatch a long-running task now, I always insert mid-run checkpoints. &quot;Report after stage one. Report after stage two.&quot; This way, instead of losing eight hours, you catch the wrong direction at the two-hour mark. Once you&apos;ve opened &quot;task complete&quot; and found a hollow shell, you&apos;ll feel the value of mid-run checkpoints in your bones.&lt;/p&gt;
&lt;p&gt;Practice cutting DoDs into smaller units too. The DoD for &quot;login feature complete&quot; is full of holes. Break it into &quot;email verification flow complete&quot; and &quot;password reset complete&quot; and the criteria sharpen. Decomposition (①) and definition of done (③) are a pair. Well-decomposed work has a clear DoD, and a clear DoD makes decomposition easier.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;④ Failure Recovery Loop&lt;/h2&gt;
&lt;p&gt;Working with agents means failure is the norm. The workflow that worked yesterday breaks today. A new model ships and the same prompts behave differently. Even Karpathy&apos;s thirty-minute dashboard run was, by his own account, a chain of issues hit and issues resolved.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;The agent autonomously worked for ~30 minutes, running into various issues along the way, looking things up online to solve them, iteratively resolving them.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The agent itself runs as a loop of failure and recovery. It doesn&apos;t always go this cleanly, though. The agent&apos;s self-recovery has limits. When the agent hits a failure it can&apos;t resolve on its own, what matters is how the human steps in.&lt;/p&gt;
&lt;h3&gt;Before: redistribution engine, infinite A↔B loop&lt;/h3&gt;
&lt;p&gt;One core feature in the iOS app is the study load redistribution engine. &quot;I couldn&apos;t do today&apos;s portion, so I&apos;ll do more tomorrow.&quot; The engine recalculates the leftover load and redistributes it. The bug looked simple: calling the redistribution API made existing data on future dates disappear. 47 out of 50 records were lost.&lt;/p&gt;
&lt;p&gt;The cause sat in two places. The delete function was deleting everything without a date filter, and the function for extracting incomplete data was excluding future-dated records.&lt;/p&gt;
&lt;p&gt;I knew the cause, so I should be able to fix it, right? That&apos;s where hell started. All 5 scenario tests &lt;strong&gt;passed.&lt;/strong&gt; When I dug in, the tests were doing &quot;data &amp;gt; 0&quot; level checks. 50 dropping to 3 still passed. (This isn&apos;t the agent&apos;s fault. It&apos;s mine.)&lt;/p&gt;
&lt;p&gt;The real problem came next. The meaning of a specific parameter differed across functions. &lt;code&gt;includeToday=true&lt;/code&gt; meant &quot;fetch today&apos;s data&quot; in function A, and &quot;delete starting from today&quot; in function B. Same parameter, completely different semantics. &lt;strong&gt;Fix A and B broke. Fix B and A broke.&lt;/strong&gt; The agent fell into its own loop, repeating fix → break → fix → break.&lt;/p&gt;
&lt;h3&gt;After: isolation tests plus Must NOT Have guardrails&lt;/h3&gt;
&lt;p&gt;I narrowed the code in the end. Instead of testing the full API flow, I &lt;strong&gt;isolated the problematic function and tested it on its own.&lt;/strong&gt; What was invisible inside the integration test became obvious once I isolated it. Then I built a separate path that didn&apos;t touch the existing code. I defined the semantics of each function independently and reimplemented them.&lt;/p&gt;
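&lt;p&gt;The shape of that fix, sketched in Go (the app itself is Swift, and every name here is hypothetical, but the pattern is language-agnostic): drop the ambiguous flag and give each behavior its own explicitly named boundary. Dates are day offsets relative to today (0 = today) to keep the sketch small.&lt;/p&gt;

```go
package main

import "fmt"

// The original bug: one includeToday flag meant "fetch today's data"
// in one function and "delete starting from today" in another, so
// fixing one call site broke the other. The fix: no shared flag;
// each semantic is its own named function.

// deleteCutoff is the first day offset the delete pass may touch.
// Only strictly future records (offset >= 1) are fair game.
func deleteCutoff() int { return 1 }

// splitForRedistribution separates records whose load still needs
// redistributing (today and earlier) from future records that must
// be preserved untouched — exactly the data the original bug lost.
func splitForRedistribution(dayOffsets []int) (incomplete, preserved []int) {
	for _, d := range dayOffsets {
		if d < deleteCutoff() {
			incomplete = append(incomplete, d)
		} else {
			preserved = append(preserved, d)
		}
	}
	return
}

func main() {
	inc, keep := splitForRedistribution([]int{-1, 0, 3})
	fmt.Println(len(inc), len(keep)) // prints "2 1": future record survives
}
```

&lt;p&gt;With one cutoff defined in one place, there&apos;s no second function left to disagree with it.&lt;/p&gt;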
&lt;p&gt;The key was the &lt;strong&gt;&quot;Must NOT Have&quot; guardrail.&lt;/strong&gt; &quot;Don&apos;t modify this file. Don&apos;t change the API response contract. Don&apos;t modify existing integration tests.&quot; Those three prohibitions broke the agent&apos;s A↔B loop.&lt;/p&gt;
&lt;p&gt;This experience maps almost exactly onto Factor 9 of &lt;a href=&quot;https://github.com/humanlayer/12-factor-agents&quot;&gt;Dex Horthy&lt;/a&gt;&apos;s &lt;a href=&quot;https://github.com/humanlayer/12-factor-agents&quot;&gt;12-Factor Agents&lt;/a&gt;: compress errors into the context so the agent can self-heal. Not just &quot;try again,&quot; but inject the cause and the surrounding facts so the same mistake doesn&apos;t repeat.&lt;/p&gt;
&lt;h3&gt;Don&apos;t retry with the same prompt&lt;/h3&gt;
&lt;p&gt;Most agent loops, on failure, retry with the same prompt. &quot;Try again.&quot; That works sometimes. For non-deterministic errors, like a network timeout or a flaky API, retrying is right. When something is fundamentally wrong, though, repetition gives you the same result. The agent is using the wrong library, or it has misread the requirement, or it doesn&apos;t have enough context. Retrying with the same prompt in those cases is just headbutting the same wall.&lt;/p&gt;
&lt;p&gt;The point is &lt;strong&gt;to analyze the cause of the failure and prescribe accordingly.&lt;/strong&gt; Not repeat the same instruction, but write a better one. That difference is enormous.&lt;/p&gt;
&lt;p&gt;Sorting failures into three types makes the prescription clear.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Type 1: Context shortfall.&lt;/strong&gt; The agent doesn&apos;t know something it needs to know. Fix: add the missing information.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Type 2: Direction error.&lt;/strong&gt; The requirement itself was misread. Fix: redefine the requirement more clearly.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Type 3: Structural conflict.&lt;/strong&gt; There&apos;s a problem in the code structure itself. Fix: narrow the code, isolate it, set guardrails, change the structure, and retry.&lt;/p&gt;
&lt;p&gt;The redistribution engine was Type 3. Not &quot;try again,&quot; but &quot;isolate this file for tests, and don&apos;t touch this file.&quot; The structural prescription. Just by asking &quot;what type is this?&quot; before you press &quot;try again,&quot; recovery speeds up noticeably. It&apos;s faster to figure out why the agent failed and adjust the instruction. Understanding why the agent failed is itself engineering.&lt;/p&gt;
&lt;h3&gt;How to practice this&lt;/h3&gt;
&lt;p&gt;The starting habit is logging every failure, even briefly. &quot;Missed the context.&quot; &quot;Misread the requirement.&quot; &quot;Fell into A↔B loop.&quot; A pile of those short notes starts to show patterns. When the same type repeats three times, that&apos;s the signal to change the system.&lt;/p&gt;
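&lt;p&gt;As a sketch of what that log can look like in practice, a plain append-only file plus one pipeline is enough to surface the patterns. The file name, the &lt;code&gt;note&lt;/code&gt; helper, and the entries are all my own invention for the demo:&lt;/p&gt;

```shell
# Hypothetical failure log: one line per failure, tagged with the three
# failure types from this post (context / direction / structure).
log=failures.log
: > "$log"   # start fresh for the demo

note() { printf '%s [%s] %s\n' "$(date +%F)" "$1" "$2" >> "$log"; }

note context   "agent did not know the two functions define includeToday differently"
note direction "misread requirement: delete-from-today vs fetch-today"
note structure "fell into the A-B fix loop; needed isolation plus guardrails"
note structure "overwrote a sibling module while fixing an import"

# The payoff: a per-type count shows which failure mode keeps repeating.
cut -d' ' -f2 "$log" | sort | uniq -c | sort -rn
```

When one type shows up three times, that count is the signal to change the system rather than retry harder.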
&lt;p&gt;Stay open to new tools and methods. I&apos;ve moved from Cursor to Claude Code, from Claude Code to Codex, and through OpenClaw, Superpowers, and several skill systems along the way. Each tool had its own failure pattern, and crossing between them is what built up my &quot;feel for working with agents.&quot; Don&apos;t get attached to any one tool. Tools are means, not ends.&lt;/p&gt;
&lt;p&gt;Per-project &lt;code&gt;KNOWN_ISSUES.md&lt;/code&gt; files are also effective. Keep a list of &quot;the mistakes the agent makes most often on this project&quot; and the recurrence rate clearly drops. Failure logs become memory, and memory becomes a system.&lt;/p&gt;
&lt;p&gt;When you try a new approach, use the &quot;30-minute rule.&quot; If there&apos;s no meaningful progress in 30 minutes, find another way. If something works inside 30 minutes, dig deeper from there. Failing is fine. Repeating the same failure is not.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;⑤ Observability&lt;/h2&gt;
&lt;p&gt;Handing a big task wholesale to an agent is convenient. When something goes wrong, though, it&apos;s hard to figure out where. &quot;At what point will I check the result?&quot; That question is the heart of observability.&lt;/p&gt;
&lt;p&gt;In &lt;a href=&quot;https://x.com/i/status/2026731645169185220&quot;&gt;Karpathy&lt;/a&gt;&apos;s DGX Spark example the agent worked autonomously for thirty minutes. The post doesn&apos;t say what Karpathy did during that thirty minutes, but the fact that the agent left &quot;memory notes and a markdown report&quot; means the work history was traceable.&lt;/p&gt;
&lt;p&gt;The stronger models and agents get, the more observability matters. The more an agent can do, the more directions things can go wrong in.&lt;/p&gt;
&lt;h3&gt;Before: Liquid Glass, the cost of &quot;weird, but let&apos;s leave it&quot;&lt;/h3&gt;
&lt;p&gt;When iOS 26 was announced, I tried to apply Liquid Glass for the first time. I wanted to bring the new design language into our app, and I expected the agent to handle the update on its own. (By now you can probably see the pattern: that expectation was naive.)&lt;/p&gt;
&lt;p&gt;I watched the agent work. The first few files looked fine. Around the fourth or fifth file, something felt off. The scope of files it was touching was wider than expected. Colors looked like they were drifting from the original intent. The branches for backward compatibility were getting more tangled.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&quot;Weird, but let&apos;s leave it.&quot;&lt;/strong&gt; That single sentence was the most expensive call I made.&lt;/p&gt;
&lt;p&gt;When I checked the result, the UI was completely broken. The translucent effect of Liquid Glass collided with the existing color scheme and tanked text legibility, and in dark mode some elements vanished entirely. The worst part was &lt;strong&gt;there were no per-step commits.&lt;/strong&gt; I couldn&apos;t roll back partially. All in or all out.&lt;/p&gt;
&lt;p&gt;If I&apos;d stopped at the fourth or fifth file and checked, in the worst case I&apos;d have rolled back five files. Letting it run to the end meant cleaning up across more than twenty tangled files.&lt;/p&gt;
&lt;h3&gt;After: tracer-bullet strategy plus a blueprint&lt;/h3&gt;
&lt;p&gt;After this, when I apply a new technology I always use a &lt;strong&gt;tracer-bullet strategy.&lt;/strong&gt; Instead of applying it everywhere at once, I apply it to the simplest screen first. Fire small, check fast. If it&apos;s fine, expand to the next screen.&lt;/p&gt;
&lt;p&gt;The real value of the tracer bullet is that it produces a &lt;strong&gt;blueprint.&lt;/strong&gt; Applying Liquid Glass to one screen showed me, &quot;ah, this is where it collides with the color scheme, this is how the dark mode branch needs to be done.&quot; For a technology you&apos;ve never used, you can&apos;t draw the blueprint up front. The tracer bullet draws it for you, fast. With the blueprint from screen A in hand, when the agent started touching unexpected files on screen B I could immediately judge &quot;this isn&apos;t what I expected.&quot;&lt;/p&gt;
&lt;p&gt;Per-step commits became mandatory too. &quot;Apply screen A&quot; → commit → &quot;Apply screen B&quot; → commit. Now if screen C breaks, I have rollback points. &quot;Commit every three files modified.&quot; It&apos;s a one-liner instruction. That single line drops the cost of fixes dramatically.&lt;/p&gt;
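&lt;p&gt;The &quot;commit every three files&quot; rule is an instruction the agent itself can follow, but it also fits in a tiny helper. A minimal sketch, assuming a POSIX shell and git on the PATH; the function name, threshold, and file names are my own, and the demo runs in a throwaway repo so it is self-contained:&lt;/p&gt;

```shell
# Sketch of a per-step checkpoint: commit once 3 or more files have changed.
set -e

checkpoint() {
  threshold=3
  changed=$(git status --porcelain | wc -l | tr -d ' ')
  if [ "$changed" -ge "$threshold" ]; then
    git add -A
    git commit -q -m "checkpoint: $changed files changed"
    echo "checkpoint committed"
  else
    echo "waiting: $changed of $threshold files changed"
  fi
}

# --- demo in a temporary repository ---
cd "$(mktemp -d)"
git init -q
git config user.email agent@example.com
git config user.name agent

touch ScreenA.swift ScreenB.swift
checkpoint            # only 2 files changed, so no commit yet
touch ScreenC.swift
checkpoint            # 3 files changed, checkpoint commit created
```

Each checkpoint is a rollback point: if screen C breaks, resetting to the last checkpoint costs minutes instead of a twenty-file cleanup.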
&lt;p&gt;As observability goes up, the scope you can delegate goes up too. At first I was nervous handing off a single function and would review everything. With the tracer-bullet strategy and per-step commits in place, I now hand off module-level work without anxiety. Observability builds trust, and trust enables delegation. My answer to &quot;do you trust your agents?&quot; is shifting toward &quot;more and more, yes,&quot; not because the agents got smarter, but because my observation system got more refined.&lt;/p&gt;
&lt;h3&gt;How to practice this&lt;/h3&gt;
&lt;p&gt;First, build a feel for splitting work into the right size. My rule of thumb: if reviewing one PR takes under 10 minutes, the size is right. Over 30 minutes is too big. By file count, 3 to 5 is a good range to check at once. I wasn&apos;t sure about this rule at first, but after a few months &quot;this is too big&quot; started landing on its own.&lt;/p&gt;
&lt;p&gt;Designing explicit mid-run checkpoints needs to become a habit too. &quot;Show me when you get this far.&quot; That sentence prevents an hour-long detour. Auto-reporting is even better. I have my agent report a diff summary every three files modified. Instead of looking at the full code each time, I read the summary and decide &quot;direction OK&quot; or &quot;stop.&quot;&lt;/p&gt;
&lt;p&gt;Then there&apos;s the habit of sketching, before you start, a rough blueprint of &quot;this is roughly how it&apos;ll go.&quot; That&apos;s the precondition for observability. If I don&apos;t know where the agent is supposed to go, I can&apos;t tell when it&apos;s gone off. For a refactor, &quot;I&apos;ll touch these files in this order.&quot; For a new feature, &quot;this module will end up with this shape.&quot; That level of sketch is enough.&lt;/p&gt;
&lt;p&gt;The blueprint doesn&apos;t have to be accurate. There are times when the agent&apos;s different approach is better than mine. What matters is noticing &quot;it&apos;s going in a different direction.&quot; Even when the agent goes its own way, with a blueprint I can immediately catch &quot;wait, this is different.&quot; Without one, catching it isn&apos;t even possible.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://x.com/elvissun/status/2025920521871716562&quot;&gt;Elvis&lt;/a&gt;&apos;s 10-minute cron monitoring is the automation of this blueprint comparison. It compares the agent&apos;s current state (tmux session alive, PR status, CI result) against a pre-defined expected state. When something deviates, an alert fires and a human steps in. It&apos;s a 100% deterministic bash script, so it costs no tokens and can&apos;t really be wrong. Simple principle. That simple principle is one of the pieces of infrastructure that makes 94 commits per day possible.&lt;/p&gt;
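&lt;p&gt;I don&apos;t have Elvis&apos;s actual script, but the principle is easy to reconstruct: probe a few observable facts, diff them against a pre-written expected-state file, and alert on any gap. A toy sketch, with the file names and state labels invented for illustration; in a real setup the observed file would come from probes like &lt;code&gt;tmux has-session&lt;/code&gt; or a CI status check:&lt;/p&gt;

```shell
# Toy deterministic monitor: compare expected state vs observed state,
# line by line, and alert on anything expected but not observed.
printf 'session:alive\nci:green\npr:open\n' > expected.txt
printf 'session:alive\npr:open\n'           > observed.txt   # ci line missing

sort expected.txt > expected.sorted
sort observed.txt > observed.sorted

# comm -23 prints lines that appear only in the first (expected) file.
missing=$(comm -23 expected.sorted observed.sorted)

if [ -n "$missing" ]; then
  echo "ALERT: expected but not observed: $missing"
fi
```

Because every step is plain text comparison, the check is fully deterministic: no tokens spent, no model judgment involved, just a diff between &quot;should be&quot; and &quot;is.&quot;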
&lt;hr /&gt;
&lt;h2&gt;⑥ Memory Architecture&lt;/h2&gt;
&lt;p&gt;If you do long work with AI you&apos;ll hit a wall every time: as the session stretches, it forgets what was said earlier. The mechanism has a name, context compaction. When the context gets compacted too aggressively, long-running work suffers most.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://x.com/i/status/2026731645169185220&quot;&gt;Karpathy&lt;/a&gt;&apos;s agent instructions always ended with &lt;strong&gt;&quot;memory notes and a markdown report.&quot;&lt;/strong&gt; The point isn&apos;t just writing code. The point is leaving a record of what was done.&lt;/p&gt;
&lt;p&gt;An orchestrator without memory treats every session like a first meeting. What we did yesterday, what we decided, what failed: all gone, every time, starting from scratch.&lt;/p&gt;
&lt;h3&gt;Before: 15 minutes every morning re-explaining context&lt;/h3&gt;
&lt;p&gt;When I was three days deep into an auth refactor, every morning I&apos;d open with &quot;yesterday I changed the JWT structure,&quot; and it wore me down. The dev.to post by &lt;a href=&quot;https://dev.to/suede/the-architecture-of-persistent-memory-for-claude-code-17d&quot;&gt;@suede&lt;/a&gt; describes the same situation exactly. Continuous work, but every new session in the morning meant explaining yesterday&apos;s work from the top. &quot;I changed this structure yesterday, let me start by explaining why.&quot; That&apos;s 15 to 20 minutes gone. Three days in a row, that&apos;s almost an hour. And verbal recap isn&apos;t perfect. Things I forgot or mis-remembered slip in.&lt;/p&gt;
&lt;h3&gt;After: hooks for automatic memory, restore in 5 seconds with one MEMORY.md&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://dev.to/suede/the-architecture-of-persistent-memory-for-claude-code-17d&quot;&gt;@suede&lt;/a&gt;&apos;s solution was elegant. He used Claude Code&apos;s hooks feature to build a system that automatically extracts &quot;memories&quot; at the end of each session and writes them to CLAUDE.md.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;Session 1: Claude works → hooks silently extract memories → saved. Session 2: Claude starts → reads CLAUDE.md → instantly knows everything.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The point is that &quot;you don&apos;t need to tell it to record.&quot; Hooks summarize and append the work content automatically at session end. The next session reads it on start. Time to restore context: 5 seconds. From 15 minutes to 5 seconds. Once you feel that gap, there&apos;s no going back.&lt;/p&gt;
&lt;p&gt;I haven&apos;t gone as far as hooks, but borrowing the pattern, every turn of work in Codex or Claude Code I always update the memory and progress doc. In MEMORY.md I write &quot;what I did today, what I decided, what to pick up next.&quot;&lt;/p&gt;
&lt;p&gt;The &lt;a href=&quot;https://www.threads.com/@boris_cherny/post/DUMZr4VElyb/&quot;&gt;Boris Cherny&lt;/a&gt; team&apos;s case extends this memory to the team level. The Claude Code team checks a single CLAUDE.md into git so the whole team shares it. When Claude does something wrong, they immediately add to CLAUDE.md: &quot;next time, don&apos;t do this.&quot; Even in code review they tag &lt;code&gt;@.claude&lt;/code&gt; and update it as part of the PR. Individual memory becomes team memory passed to the agent.&lt;/p&gt;
&lt;p&gt;Tools are pouring out in this direction now. Claude Code&apos;s built-in memory, AI memory layers like &lt;a href=&quot;https://supermemory.ai&quot;&gt;supermemory.ai&lt;/a&gt;. As memory infrastructure matures, the underlying problem of &quot;every session is a first meeting&quot; is heading toward a real solution.&lt;/p&gt;
&lt;h3&gt;How to practice this&lt;/h3&gt;
&lt;p&gt;The habit of documenting every turn is, almost by itself, the whole of memory architecture. Make one MEMORY.md and start writing every day. Today&apos;s decision and why, what&apos;s next, open issues. Those three items are enough.&lt;/p&gt;
&lt;p&gt;One tip: keeping the memory&apos;s structure consistent matters too. I write MEMORY.md in date order, and tag each entry with &lt;code&gt;[decision]&lt;/code&gt;, &lt;code&gt;[work]&lt;/code&gt;, &lt;code&gt;[issue]&lt;/code&gt;. Later, when I&apos;m looking for &quot;what was that architecture decision from last month?&quot;, searching &lt;code&gt;[decision]&lt;/code&gt; returns it inside ten seconds. That small bit of structure makes the memory dramatically more searchable.&lt;/p&gt;
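&lt;p&gt;Concretely, the tagged format makes retrieval a one-liner. A toy &lt;code&gt;MEMORY.md&lt;/code&gt; in that date-ordered, tagged shape (the entries are invented for the demo):&lt;/p&gt;

```shell
# Build a toy MEMORY.md in the date-ordered, tagged format described above.
cat > MEMORY.md <<'EOF'
2026-03-02 [decision] moved JWT refresh into middleware; avoids double refresh
2026-03-02 [work] isolated the redistribution function for unit tests
2026-03-05 [issue] dark mode still clashes with the new translucency
2026-03-10 [decision] API response contract is frozen until v2
EOF

# "What was that architecture decision from last month?"
grep -F '[decision]' MEMORY.md
```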
&lt;p&gt;When projects get long, bring in a searchable system (Obsidian, etc.). The point is &quot;a searchable record.&quot; If you can&apos;t find an architecture decision from three months ago, you&apos;ll have the same conversation again. Memory breaks that loop.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;⑦ Parallel Orchestration&lt;/h2&gt;
&lt;p&gt;One of &lt;a href=&quot;https://x.com/i/status/2026731645169185220&quot;&gt;Karpathy&lt;/a&gt;&apos;s key points was this.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;The highest leverage is in designing a long-running orchestrator with the right tools, memory, and instructions to productively manage multiple parallel coding instances.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;The highest level of agentic engineering, accessible through this, is currently very high leverage.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Building different features at the same time across multiple worktrees is technically possible. In practice, the management is rough. Agent A is on the auth module, agent B is on payments, and they both touch the same user model. Collision.&lt;/p&gt;
&lt;p&gt;From &lt;a href=&quot;https://www.threads.com/@boris_cherny/post/DUMZr4VElyb/&quot;&gt;Boris Cherny&lt;/a&gt; earlier to &lt;a href=&quot;https://x.com/elvissun/status/2025920521871716562&quot;&gt;Elvis (@elvissun)&lt;/a&gt; at 94 commits a day, the direction is the same. &lt;strong&gt;A single engineer orchestrating multiple agents to produce team-level output.&lt;/strong&gt; That&apos;s exactly why Karpathy named this &quot;agentic engineering.&quot;&lt;/p&gt;
&lt;p&gt;Boris Cherny&apos;s setup shows the extreme of this direction. Five Claude Code instances running in parallel in the local terminal, plus another 5–10 running on claude.ai/code. 10 to 15 parallel sessions in total. The structure works because each agent&apos;s context is rigorously isolated. Tools like &lt;a href=&quot;https://superset.sh&quot;&gt;Superset.sh&lt;/a&gt; and &lt;a href=&quot;https://github.com/Yeachan-Heo/oh-my-codex&quot;&gt;oh-my-codex (omx)&lt;/a&gt; are emerging in the same direction.&lt;/p&gt;
&lt;h3&gt;Echoes of my CTO years&lt;/h3&gt;
&lt;p&gt;Going through this kept reminding me of my CTO years. The years I managed six squads. Daily meetings with six teams, getting a read on each team&apos;s state, unblocking blockers, keeping the overall direction from drifting. Managing parallel agents resembles that work to a startling degree.&lt;/p&gt;
&lt;p&gt;I&apos;ve seen people compare current parallel agent coding to ADHD. Switching between many tasks and not being able to focus on any of them. There&apos;s something to that. I think it&apos;s closer to managing, though. ADHD is unintended distraction. Agent management is intentional multitasking. What a manager needs isn&apos;t &quot;the ability to write the code for every team&quot; but &quot;the ability to read every team&apos;s state, unblock blockers, and align direction.&quot; Parallel agent management is exactly that.&lt;/p&gt;
&lt;p&gt;In ⑤, leaving one agent alone was already risky. Five agents running together multiplies that risk. When I managed six squads, the most dangerous moment was the one where I let myself think &quot;everything is going fine.&quot; In that moment one team is flailing, two teams are duplicating each other&apos;s work, or someone is sprinting in the wrong direction. Same with agents. Leave five agents running in parallel with a &quot;they&apos;ll figure it out&quot; mindset and merge time becomes a collision festival, one agent overwrites another&apos;s work, or the outputs end up wildly inconsistent.&lt;/p&gt;
&lt;p&gt;Checklists and sync points are the lifeline. And this isn&apos;t a new skill. Good managers already have it. The agent era just put a new name on it.&lt;/p&gt;
&lt;p&gt;There&apos;s one decisive difference between managing people and managing agents. People ask questions. Agents don&apos;t ask. They proceed on their own judgment. That&apos;s why design up front matters more in agent management. &quot;In situations like X, do Y.&quot; You have to set that ahead of time.&lt;/p&gt;
&lt;h3&gt;How to practice this&lt;/h3&gt;
&lt;p&gt;Start small. Running five agents in parallel from day one ends in chaos. Start with two.&lt;/p&gt;
&lt;p&gt;Sharing my own experience: the first day I ran two agents in parallel was a mess. While I checked A&apos;s results I missed B&apos;s progress, and when I went to check B, A was waiting on me. From day two I started using a timer. 25 minutes monitoring agent A, 5-minute break, 25 minutes on agent B, Pomodoro-shaped. Once that routine settled, both agents ran stably. A week in I added a third. Two stable, then three; three stable, then five. People who&apos;ve managed teams or led squads will move through this faster.&lt;/p&gt;
&lt;p&gt;You also have to map out dependencies between parallel work and design for collision avoidance. Using git worktree gives you physical separation. Have agent A work in worktree-auth and agent B in worktree-payment, and file conflicts shrink on their own.&lt;/p&gt;
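&lt;p&gt;A minimal sketch of that physical separation, run here against a throwaway repository so it is self-contained; the directory and branch names are illustrative:&lt;/p&gt;

```shell
# Give each agent its own working directory with git worktree.
set -e
cd "$(mktemp -d)"
git init -q main
cd main
git config user.email agent@example.com
git config user.name agent
git commit -q --allow-empty -m "init"

# One worktree per agent: A gets auth, B gets payments.
git worktree add ../worktree-auth -b agent-a/auth
git worktree add ../worktree-payment -b agent-b/payment

git worktree list   # the main repo plus the two agent worktrees
```

Each agent edits files in its own directory on its own branch, so collisions are deferred to merge time, where they are visible, instead of happening silently in a shared checkout.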
&lt;hr /&gt;
&lt;h2&gt;⑧ Abstraction Layering&lt;/h2&gt;
&lt;p&gt;There are levels in agentic engineering, in my view. I distinguish them by feel.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Level 0&lt;/strong&gt;: Direct coding (typing line by line in the editor)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Level 1&lt;/strong&gt;: Instructing an agent (asking Claude Code or Codex to do work)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Level 2&lt;/strong&gt;: Orchestrator (designing the system that manages multiple agents)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Level 3&lt;/strong&gt;: Meta design (building the tools that make orchestrators)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I&apos;m currently at Level 2 and trying for Level 3. I&apos;m building skills, automating workflows, and experimenting with structures where agents manage agents.&lt;/p&gt;
&lt;h3&gt;Before: the days of repeating the same instruction every time&lt;/h3&gt;
&lt;p&gt;In my Level 1 days, I manually repeated the same routine every morning. &quot;Check yesterday&apos;s merged PRs&quot; → &quot;summarize the changes&quot; → &quot;list open issues&quot; → &quot;propose priorities.&quot; All four, in order, every time. Twenty minutes a day. Seven hours a month. It took me about three weeks to notice that the instructions were almost identical every time.&lt;/p&gt;
&lt;h3&gt;After: one skill, &quot;summarize this week&quot;&lt;/h3&gt;
&lt;p&gt;I turned that routine into a skill. One sentence runs it: &quot;summarize this week.&quot; A 20-minute routine became 2 minutes. Beyond the time savings, there was a bigger change. Building this skill forced me to make explicit &quot;the pattern of judgments I make every day.&quot; The process itself was practice in raising the abstraction layer.&lt;/p&gt;
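&lt;p&gt;The shape of that skill, reconstructed from memory; the exact frontmatter fields depend on the tool, so treat this as a sketch rather than the file I actually run:&lt;/p&gt;

```markdown
---
name: weekly-summary
description: Summarize the week across merged PRs, open issues, and priorities
---

When asked to "summarize this week":

1. List the PRs merged in the last 7 days and summarize the changes.
2. List the currently open issues.
3. Propose priorities for next week, with one line of reasoning each.
4. Output everything as a single markdown report.
```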
&lt;p&gt;There was one thing I felt every time I built a skill. People call it &lt;strong&gt;compounding engineering.&lt;/strong&gt; Our projects are big enough that they don&apos;t end in a single session. This isn&apos;t a finish-line game. It&apos;s a compounding game where earlier sessions affect later ones with interest.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;The biggest payoff is in raising the abstraction layer ever higher.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The payoff Karpathy named isn&apos;t just time savings. Each level up widens the field of view, so you can take on bigger problems. From the era of writing code by hand (Level 0), to the era of instructing an agent in English (Level 1–2), to the era of designing an orchestrator that manages agents (Level 2–3). Every level up dramatically broadens what a human can take on.&lt;/p&gt;
&lt;h3&gt;When abstraction goes up, the human&apos;s role changes&lt;/h3&gt;
&lt;p&gt;It&apos;s not that the human idles while the agent works. Instead of typing code, you design the system. Instead of instructing the agent, you build the environment in which the agent works well.&lt;/p&gt;
&lt;p&gt;The hours that used to go into typing code now go into setting direction, making judgments, and supervising quality. That&apos;s the practical meaning of raising the abstraction layer.&lt;/p&gt;
&lt;h3&gt;How to practice this&lt;/h3&gt;
&lt;p&gt;&quot;I&apos;m giving this same instruction for the third time.&quot; That awareness is where abstraction begins. When you see repetition, turn it into a skill or a template. A simple prompt template is fine to start. That one small piece of automation becomes the base for the next one.&lt;/p&gt;
&lt;p&gt;Make a habit of asking, &quot;what would I need to delegate this to an agent?&quot; That question itself is the start of abstract thinking. Look at every task you do by hand through &quot;is this delegable?&quot; If it is, what context, tools, and memory would the agent need? Repeating that question builds the skill of designing abstraction layers.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;⑨ Taste&lt;/h2&gt;
&lt;p&gt;The last one is the hardest to measure and maybe the most important.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;Things still needed: high-level direction, judgment, taste — knowing what good looks like.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The sense of looking at what an agent built and telling &quot;this one&apos;s solid&quot; from &quot;this one&apos;s off.&quot; It works technically, but somehow it&apos;s uncomfortable. The code runs, but somehow it doesn&apos;t feel right. You can feel it. You actually have to feel it.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;&apos;Engineering&apos; because there is art, science, and skill to it.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Art, science, skill: taste sits where these three overlap. It isn&apos;t innate. It&apos;s something you accumulate by going deep.&lt;/p&gt;
&lt;h3&gt;A prototype the AI made, my partner&apos;s reaction&lt;/h3&gt;
&lt;p&gt;There was an episode with my current partner, Ellie, a product designer I work with on building the app fast. When I made screen A with AI and showed it to her, she was put off at first. The output landed without discussion, and she felt like she didn&apos;t know what her role was. (Designers, like developers, are wrestling hard with their direction in the AI era.)&lt;/p&gt;
&lt;p&gt;After enough conversation, when I delivered screen B the same way, it was different. By then she understood the direction I was going for, and with a &lt;strong&gt;concrete working prototype&lt;/strong&gt; as the reference, what was missing and what needed more polish became visible. Communication cost, the kind that usually only resolves after many ping-pong rounds between designer and developer, dropped dramatically.&lt;/p&gt;
&lt;h3&gt;AI design is bland&lt;/h3&gt;
&lt;p&gt;The same thing happens in our current project. Our app isn&apos;t just a generic productivity app, but the AI kept generating only the boilerplate productivity-app design. Even when I explained our distinct domain, Claude kept ignoring it and regenerating the universal design.&lt;/p&gt;
&lt;p&gt;What I&apos;d handed over thinking, &quot;this is intuitive enough,&quot; was, honestly, a 60 to 70. When I saw what Ellie actually designed, there were things AI could never produce. Looking at the AI output I was uncertain. When Ellie&apos;s design landed, the feeling came: &quot;ah, this works.&quot;&lt;/p&gt;
&lt;p&gt;Most of what AI produces is average. There&apos;s real value in laying down the skeleton and the components. Taste, texture, the specific touch: that&apos;s still the human&apos;s territory.&lt;/p&gt;
&lt;h3&gt;Do work → Good → Great&lt;/h3&gt;
&lt;p&gt;AI brings remarkable performance gains. What it actually reaches today, though, is, honestly, around &lt;strong&gt;80%.&lt;/strong&gt; 80% is amazing compared to the past.&lt;/p&gt;
&lt;p&gt;The problem is the remaining 20%. Each 1% within that 20% is a bigger gap than the previous 10%. Look at a product, a restaurant, a film. The moment when the extra 2% really went in is the moment that moves you. The feeling you get from a master, a virtuoso, a great director sits outside the band of &quot;average.&quot;&lt;/p&gt;
&lt;p&gt;When 80% products flood the market → people will go looking for the better thing in the remaining 20% → &lt;strong&gt;and that 20% becomes the differentiator: human skill and craft.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I had a similar experience with social media. A clean information-organization post Claude Code produced: well-structured, sensible, tidy. Zero likes. A single line I wrote on impulse, bragging about something: 30,000 views, 200+ likes. A real human emotion in one line, time-sensitive, beat what passed for polite AI content by a wide margin.&lt;/p&gt;
&lt;p&gt;LLMs are statistical models in the end. The word &quot;model&quot; itself means &quot;an approximation of the real world.&quot; What an LLM has learned is the patterns of text on the internet. The average of &quot;good design,&quot; the average of &quot;good code.&quot; Average is safe. It isn&apos;t outstanding. Outstanding comes from leaving the average behind.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Don&apos;t lose your intuition.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://www.seangoedecke.com/ai-agents-and-code-review/&quot;&gt;Sean Goedecke&lt;/a&gt; puts the point exactly:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;About once an hour I notice that the agent is doing something that looks suspicious, and when I dig deeper I&apos;m able to set it on the right track and save hours of wasted effort... This is why I think pure &apos;vibe coding&apos; hasn&apos;t produced an explosion of useful apps.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That &quot;ability to notice something suspicious&quot; is taste. When the agent decides to spin up a full background-job infrastructure for what should have been a simple async request, the call to stop and say &quot;wait, this is overengineering&quot; is taste. Structural judgment is taste.&lt;/p&gt;
&lt;h3&gt;&quot;Works&quot; and &quot;great&quot; sit on different axes&lt;/h3&gt;
&lt;p&gt;This is the thing I most wanted to say in this post. &lt;strong&gt;Do work → Good → Great.&lt;/strong&gt; The gap between those three.&lt;/p&gt;
&lt;p&gt;AI does &quot;Do work&quot; remarkably fast. In some cases it gets to &quot;Good.&quot; The last 20% to &quot;Great&quot; is territory you can&apos;t reach if you settle for the AI average of 80%. Customers feel the final 2%. No one is moved by an average output.&lt;/p&gt;
&lt;p&gt;If everything is getting easier with AI, suspect it. Ask whether your output is settling at the average. In the era when 80% is everywhere, the differentiator is in the remaining 20%. That 20% is the territory of taste, not of technique.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://blog.kinglycrow.com/no-skill-no-taste/&quot;&gt;KinglyCrow&lt;/a&gt;&apos;s &lt;a href=&quot;https://blog.kinglycrow.com/no-skill-no-taste/&quot;&gt;&lt;strong&gt;&quot;No Skill, No Taste&quot;&lt;/strong&gt;&lt;/a&gt; is sharp on this point. Taste and skill are a 2×2 matrix. LLMs look like they&apos;ve lowered the entry barrier on skill, but the real barrier of taste is unchanged. If anything, it&apos;s been amplified. Vibe coding lets anyone build an app, and what gets built without taste is slop. In the era when 80% products flood the market, what separates the remaining 20% is, in the end, taste. No matter how far AI advances, building that sense is still on me.&lt;/p&gt;
&lt;p&gt;Chris Lattner, who built LLVM and Swift, reached the same conclusion. When Anthropic released the project where Claude Code implements a C compiler from scratch (CCC), Lattner&apos;s analysis on his &lt;a href=&quot;https://www.modular.com/blog/the-claude-c-compiler-what-it-reveals-about-the-future-of-software&quot;&gt;blog&lt;/a&gt; was that the implementation is textbook and there&apos;s no new abstraction. He compared it to the level of a strong undergraduate team. What he actually highlighted was elsewhere. &lt;strong&gt;&quot;As implementation gets more automated, design, judgment, and taste become more important, not less.&quot;&lt;/strong&gt; The more AI lowers the implementation barrier, the more the taste of what to build becomes the engineer&apos;s core competency.&lt;/p&gt;
&lt;h3&gt;Taste is accumulated experience&lt;/h3&gt;
&lt;p&gt;This sense comes from domain knowledge. Someone who has used many good APIs can design good APIs. Someone who has experienced many good UXes can judge good UX. No matter how fast AI builds, judging &quot;is this good or not&quot; is on me.&lt;/p&gt;
&lt;p&gt;After 15 years of writing code, I know in my bones the difference between &quot;this is good code&quot; and &quot;this works but isn&apos;t good code.&quot; That difference shows up in a single variable name, a single function structure, a single error-handling style. The same standard has to apply to code an agent wrote. &quot;Works&quot; and &quot;good&quot; sit on different axes.&lt;/p&gt;
&lt;p&gt;The agent once built me a search feature that worked perfectly. Technically nothing to fault. Something was off, though. After staring at it a while, it clicked: the search results were sorted alphabetically. Technically correct, but from the user&apos;s perspective, sorting by relevance is far more natural. The agent built &quot;the search feature.&quot; It hadn&apos;t built &quot;a good search experience.&quot; Catching that gap is taste.&lt;/p&gt;
&lt;h3&gt;How to practice this&lt;/h3&gt;
&lt;p&gt;The clearest way to build taste is to see, make, and use a lot of good work. Don&apos;t read only tech blogs. Look at design, study business cases, read fiction. Go to museums.&lt;/p&gt;
&lt;p&gt;The starting point for taste is the habit of not accepting the agent&apos;s output as-is. Ask, every time, &quot;is this really the best?&quot; &quot;Why is this good?&quot; &quot;Why does this feel uncomfortable?&quot; Repeating those questions sharpens your sense.&lt;/p&gt;
&lt;p&gt;Care about people, too. Watch what customers want and where users get stuck. A product that&apos;s technically perfect but uncomfortable to use is a product where taste is missing. Whether it&apos;s running user interviews, watching the support channel, or peering over your neighbor&apos;s shoulder while they use the app, &lt;a href=&quot;https://flowkater.io/posts/2026-01-20-ai-mvp-linear-lessons/&quot;&gt;taste sharpens at the point where humans meet technology.&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Taste is hard to grow alone. Reviewing other people&apos;s code, watching users react, listening to a partner&apos;s feedback. In the agent era taste matters more, but the way you build taste is still analog. Talking with people, watching the world, experiencing good things. AI can&apos;t do that part for you.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Closing&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;Since the invention of the computer, the era of typing code directly into an editor is over.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;True. What&apos;s over is the typing, not the engineering.&lt;/p&gt;
&lt;p&gt;Decomposition, context architecture, definition of done, failure recovery, observability, memory architecture, parallel orchestration, abstraction layering, taste. Sit with these nine and you&apos;ll see they were already what good engineers had before the AI era. Agentic engineering is an extension and amplification of these capabilities. Nothing new. The things that were already important got more important.&lt;/p&gt;
&lt;p&gt;If one thing has shifted, it&apos;s that the effect of these capabilities has been dramatically amplified. In the past, weak decomposition could be patched by writing the code yourself. In the era of delegating to agents, bad decomposition gets amplified at agent speed. The payoff of good design grew, and so did the damage from bad design.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=wEsjK3Smovw&quot;&gt;Mihail Eric&lt;/a&gt;, who teaches AI-native engineering at Stanford, gives practical advice: &lt;strong&gt;add incrementally.&lt;/strong&gt; Get really good at one agent workflow first. When you can build complex software with one agent, then add the second. One step at a time, not ten at once.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=wEsjK3Smovw&quot;&gt;Mihail&lt;/a&gt; also pointed out something important. The people he watched handle multi-agent setups well &lt;strong&gt;were the ones with actual experience managing human developers.&lt;/strong&gt; In the same way, my CTO years managing six squads translated directly into managing agents.&lt;/p&gt;
&lt;p&gt;I still have a long way to go. Some days the rhythm with the agent is so on-point that I think &quot;this is the future.&quot; The next day I&apos;m watching the agent flail and grumbling that it&apos;d be faster to write it myself.&lt;/p&gt;
&lt;p&gt;The direction is clear, though. And that direction isn&apos;t &quot;write better prompts.&quot; It&apos;s &lt;strong&gt;&quot;design the environment in which the agent works well.&quot;&lt;/strong&gt; Prompts are the tool. Environment design is the substance.&lt;/p&gt;
&lt;p&gt;In the end this is a question of taste and experience. Tools change. The substance stays. A good engineer who meets an agent becomes a great engineer. Bad design that meets an agent produces bad output, fast.&lt;/p&gt;
&lt;p&gt;These nine capabilities aren&apos;t separate. They&apos;re connected. Good decomposition makes the definition of done clear. Good context architecture makes failure recovery easier. Accumulated memory raises observability. Experience with parallel management lifts the abstraction layer. Underneath all of it sits taste. Build one and the others follow. It doesn&apos;t matter where you start. What matters is starting.&lt;/p&gt;
&lt;p&gt;As &lt;a href=&quot;https://www.youtube.com/watch?v=wEsjK3Smovw&quot;&gt;Mihail&lt;/a&gt; emphasized to his students, &lt;strong&gt;experimentation is the core of becoming an AI-native software developer.&lt;/strong&gt; In the end you have to bang your own head against the wall a few times. Everything I shared in this post (half a day of ping-pong on AddPlan, the hollow CLI, the redistribution engine&apos;s infinite loop, the &quot;weird, but let&apos;s leave it&quot; liquidglass) is the result of trial and error. Without that, none of these nine capabilities settle into your hands.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;It is a deep, improvable skill.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;A little better every day is enough. No need for perfect. Just the right direction.&lt;/p&gt;
&lt;p&gt;If you&apos;d told me six months ago, &quot;your AI agent will write code overnight and you&apos;ll just review the PR in the morning,&quot; I&apos;d have laughed. Now it&apos;s daily life. I can&apos;t picture what daily life looks like six months from now. One thing I&apos;m sure of, though: even then, decomposition will still be needed, context architecture will still matter, and taste will still be irreplaceable.&lt;/p&gt;
&lt;p&gt;The protagonist of that story isn&apos;t the AI. It&apos;s the engineer who handles the AI well.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/i/status/2026731645169185220&quot;&gt;Andrej Karpathy — Agentic Engineering (X original)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://lucumr.pocoo.org/2025/6/12/agentic-coding/&quot;&gt;Armin Ronacher — Agentic Coding Recommendations&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://agenticengineer.com/top-2-percent-agentic-engineering&quot;&gt;IndyDevDan — Top 2% Agentic Engineering&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.threads.com/@boris_cherny/post/DUMZr4VElyb/&quot;&gt;Boris Cherny — Claude Code Creator Workflow (Threads)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://yu-wenhao.com/en/blog/agentic-coding/&quot;&gt;WenHao Yu — Agentic Coding: One Year from Vibes to Agentic Engineering&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.seangoedecke.com/ai-agents-and-code-review/&quot;&gt;Sean Goedecke — AI Agents and Code Review&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=wEsjK3Smovw&quot;&gt;Mihail Eric — The AI-Native Software Engineer (Stanford)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://superset.sh&quot;&gt;Superset.sh — Run 10+ Parallel Coding Agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/Yeachan-Heo/oh-my-codex&quot;&gt;oh-my-codex (omx) — Multi-Agent Orchestration for Codex CLI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/humanlayer/12-factor-agents&quot;&gt;Dex Horthy — 12-Factor Agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.blog/ai-and-ml/generative-ai/multi-agent-workflows-often-fail-heres-how-to-engineer-ones-that-dont/&quot;&gt;GitHub Engineering — Multi-Agent Workflows&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.humanlayer.dev/blog/writing-a-good-claude-md&quot;&gt;HumanLayer — Writing a Good CLAUDE.md&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://dev.to/suede/the-architecture-of-persistent-memory-for-claude-code-17d&quot;&gt;dev.to/@suede — Persistent Memory for Claude Code&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://supermemory.ai&quot;&gt;supermemory.ai — AI Memory Layer&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/elvissun/status/2025920521871716562&quot;&gt;Elvis (@elvissun) — 94 Commits/Day with AI Agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://blog.kinglycrow.com/no-skill-no-taste/&quot;&gt;KinglyCrow — No Skill, No Taste&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.modular.com/blog/the-claude-c-compiler-what-it-reveals-about-the-future-of-software&quot;&gt;Chris Lattner — The Claude C Compiler: What It Reveals About the Future of Software&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://resources.anthropic.com/2026-agentic-coding-trends-report&quot;&gt;2026 Agentic Coding Trends Report (Anthropic)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</content:encoded><category>essay</category><category>ai-coding</category><category>productivity</category><author>Tony Cho (https://flowkater.io)</author></item><item><title>AI Is Only as Smart as You Are</title><link>https://flowkater.io/en/posts/2026-02-28-ai-as-smart-as-you/</link><guid isPermaLink="true">https://flowkater.io/en/posts/2026-02-28-ai-as-smart-as-you/</guid><description>Same AI, very different results. Stripe&apos;s data on 3,000 engineers, MIT&apos;s EEG study, and Stanford&apos;s sycophancy research point to one answer: AI is not just as smart as you, it is a mirror that tells you what you want to hear.</description><pubDate>Sat, 28 Feb 2026 06:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;AI Is Only as Smart as You Are&lt;/h1&gt;
&lt;p&gt;I posted &lt;a href=&quot;https://flowkater.io/posts/2026-02-19-code-reading-era/&quot;&gt;&quot;What Should Engineers Read in an Era That No Longer Reads Code?&quot;&lt;/a&gt; and more people read it than I expected. There were threads from that piece I never got to finish, so I want to pick them back up here. This is for the engineers who are still asking how to grow even with AI in the picture, written from a few things I have been through recently and a handful of cases I keep coming back to.&lt;/p&gt;
&lt;h2&gt;Same AI, Different Results&lt;/h2&gt;
&lt;p&gt;Stripe rolled out Cursor to 3,000 engineers. Scott MacVicar, who runs developer infrastructure, expected juniors to gain the most. The reasoning was reasonable enough: AI would fill in the gaps that experience had not closed yet. The result went the other way.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;He expected junior engineers to benefit most, using AI to compensate for limited experience. Instead, he&apos;s seen the [tenure advantage] — more experienced engineers get even more value.&quot;&lt;/p&gt;
&lt;p&gt;— Scott MacVicar, Head of Developer Infrastructure, Stripe&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Around the same time, a study out of MIT Media Lab landed and made the picture more interesting. The researchers measured brain activity directly with EEG, and the more people relied on AI, the more their neural connections &lt;strong&gt;systematically weakened&lt;/strong&gt;. The group using LLMs could not even cite their own writing properly. Across four months, every metric came in worse than the brain-only group.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;Brain connectivity systematically scaled down with the amount of external support.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;There is a counterpoint, though. A GitHub Copilot study reported the opposite finding. Developers with &lt;strong&gt;less&lt;/strong&gt; programming experience saw bigger productivity gains from Copilot.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;The results suggest developers with less programming experience are more likely to benefit from Copilot.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;So why does Stripe&apos;s data point the other way? Because the two studies measure different things. The Copilot research measured &lt;strong&gt;productivity&lt;/strong&gt;, how fast you finish code. Stripe was looking at &lt;strong&gt;value&lt;/strong&gt;, how meaningful the output is. Juniors can speed up with AI. Speed and value sit on different axes, though. Sprinting in the wrong direction is not value.&lt;/p&gt;
&lt;p&gt;Putting the two data points side by side gives you an uncomfortable picture. AI is making the strong stronger and the weak weaker. Same tool, but the gap keeps growing. Why?&lt;/p&gt;
&lt;p&gt;In the &lt;a href=&quot;https://flowkater.io/posts/2026-02-19-code-reading-era/&quot;&gt;previous post&lt;/a&gt; I wrote that &quot;AI is a mirror.&quot; The point is to become someone with something worth reflecting. After publishing it, though, I kept turning over the same question. &lt;strong&gt;How exactly does that &quot;something worth reflecting&quot; get built?&lt;/strong&gt; Was it about writing better prompts? It was not.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;No Input, No Output&lt;/h2&gt;
&lt;p&gt;Honest confession: I went through a phase of being obsessed with prompt engineering. I collected prompt templates and polished structured instructions, convinced this was how you got more out of AI. (It is a little embarrassing in retrospect.)&lt;/p&gt;
&lt;p&gt;Looking back, the moment I actually got good with AI was not the moment I got better at prompts. It was the moment &lt;strong&gt;the depth of my domain crossed a certain threshold&lt;/strong&gt;. I had been doing TDD for over ten years, so I could tell the AI to write the tests first. I had spent enough time with DDD that I could ask it to define the bounded contexts up front. The prompts were not great. What was great was the context that had built up in my head, which then flowed naturally into good instructions.&lt;/p&gt;
&lt;p&gt;I felt this down to the bone recently. While working on a UI server integration, I iterated on the AI&apos;s plan more than twenty times. The plan document itself was solid. Once I ran it, though, the actual values coming through were wrong. No errors fired either. Fallback values were quietly being returned. The system looked like it was running fine, but the values had nothing to do with the intent behind the domain requirements.&lt;/p&gt;
&lt;p&gt;As a service&apos;s business logic gets more complicated, AI loses the thread and starts wandering. Narrowing the problem, narrowing the error surface, slicing the work into pieces the AI can actually solve — all of that is still on the human. So is verifying the result, writing the tests, doing the careful checking. The AI built it according to the plan. Whether that plan matched the requirements was something I had to confirm with my own hands.&lt;/p&gt;
&lt;p&gt;AI has come a long way, and the productivity gains are real. The completeness and quality of the final product still come down to domain expertise and attention to detail, the same skills that mattered before AI showed up.&lt;/p&gt;
&lt;p&gt;There is a more uncomfortable fact behind this. When your input is thin, AI does not tell you so. &lt;strong&gt;It tells you that you are doing great.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A joint Stanford and Carnegie Mellon team tested 11 AI systems, including ChatGPT, Claude, and Gemini, and published the results in &lt;em&gt;Science&lt;/em&gt;. AI affirmed users&apos; behavior &lt;strong&gt;about 49 percent more often on average&lt;/strong&gt; than humans did. Even when the team fed it cases from Reddit&apos;s r/AmITheAsshole where the community had decided the user was in the wrong, the chatbots took the user&apos;s side 51 percent of the time. They sided with users 47 percent of the time on queries about harmful or illegal behavior.&lt;/p&gt;
&lt;p&gt;A follow-up experiment with 2,405 participants was even more striking. People who talked with sycophantic AI grew more confident they were right and less willing to change their behavior. Then came the most unsettling finding: &lt;strong&gt;people trusted the sycophantic AI more.&lt;/strong&gt; They rated it as &quot;objective and fair.&quot; They preferred the sweet AI to the one that pushed back.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;Users know that AI behaves in flattering and complimentary ways. What they don&apos;t know, and what surprised us, is that the sycophancy is making them more self-centered and more morally rigid.&quot;&lt;/p&gt;
&lt;p&gt;— Dan Jurafsky, Professor of Linguistics and Computer Science, Stanford&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Layer the MIT result over that one and the loop becomes obvious. Heavy AI use weakens the brain&apos;s connectivity (MIT). The weakened brain hears AI whisper that it is doing great (Stanford). And the person who heard that whisper leans on AI even more. &lt;strong&gt;AI is not just as smart as you. It is a mirror that says back what you wanted to hear.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The most dangerous person in the AI era, in the end, is not the one who can&apos;t use AI. &lt;strong&gt;It is the one who accepts AI&apos;s flattery without checking it.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;In an &lt;a href=&quot;https://flowkater.io/posts/2025-12-06-what-should-junior-developers-learn-in-the-ai-era/&quot;&gt;earlier post&lt;/a&gt; I told junior engineers, &quot;Read a lot of books and put your own thoughts in order. That is literacy, and it is the most important edge in the AI era.&quot; That advice was not just for juniors, though. It is a note to all of us, myself included, on how to survive this era.&lt;/p&gt;
&lt;p&gt;I have been making more time for books lately, the non-technical kind. Thanks to &lt;a href=&quot;https://flowkater.io/posts/2026-02-15-ai-jarvis-openclaw/&quot;&gt;Jarvis&lt;/a&gt;, I am now getting through foreign articles and English-language originals several times faster, the kind I used to put off because they took too long. Writing blog posts also helps a lot with how I work with AI, because it forces me to keep practicing the muscle of reading something as a consumer, digesting it, and weaving it into my own thinking. (That was the point of the &lt;a href=&quot;https://flowkater.io/posts/reading-tech-articles-three-pass-method&quot;&gt;three-pass method&lt;/a&gt; post, too. If you check the box on &quot;I read it&quot; and move on, a month later you have nothing.)&lt;/p&gt;
&lt;p&gt;Wharton&apos;s Ethan Mollick ran an interesting experiment. He had Executive MBA students build prototypes with AI, and people with zero coding background finished them in four days. So what was the key to their success?&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;It helped that they had some management and subject matter expertise because it turns out that the key to success was actually...telling the AI what you want.&quot;&lt;/p&gt;
&lt;p&gt;— Ethan Mollick&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Knowing &quot;what you want&quot; is itself expertise. Memorizing prompt templates is not the move. Someone who understands the domain naturally gives good instructions. The order is reversed.&lt;/p&gt;
&lt;p&gt;What it comes down to is this: getting better with AI is not about studying AI. It is about &lt;strong&gt;growing the depth of your own domain.&lt;/strong&gt; The quality of your input sets the quality of your output. As Simon Willison put it, the cost of writing new code has dropped close to zero, and the cost of writing good code is still high. That cost shows up not in the prompt but in the thickness of the person.&lt;/p&gt;
&lt;p&gt;The post &lt;a href=&quot;https://blog.kinglycrow.com/no-skill-no-taste/&quot;&gt;No Skill. No Taste&lt;/a&gt; makes a point I enjoyed. Most of the vibe-coded apps that developers and non-developers are shipping right now sit in the bottom quadrant of the Skill and Taste matrix. (Productivity-app-shaped Todo apps, for example.) On the other hand, the same Todo app, with design and finish sharp enough to go past what people expect, can become a hit even though it is just a CRUD app. AI can replace Skill, but Taste, the sense and judgment for the domain, still comes from a person. (Developers, look up from the screen for a second!)&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Don&apos;t Ask AI for the Answer&lt;/h2&gt;
&lt;p&gt;Picture yourself implementing a new feature. Most people will tell the AI, &quot;Build this feature for me.&quot; They feed it keywords and wait for an answer. An LLM is not a search engine, though. It is a conversation partner.&lt;/p&gt;
&lt;p&gt;Before any work, I always run a skill called &lt;strong&gt;interview&lt;/strong&gt;. Whether the task is a new feature or an architecture design, no matter how carefully I prepared the doc beforehand, the AI does not jump into code. It asks me questions instead. &quot;What is the core user scenario for this feature?&quot; &quot;How do you want to draw the boundary with the existing modules?&quot; &quot;What&apos;s the fallback strategy on failure?&quot; &quot;Are there performance requirements?&quot; Through these questions, it pulls out edge cases and design decisions I would have missed.&lt;/p&gt;
&lt;p&gt;I felt the value of this interview clearly while working on a redistribution engine recently. The engine handles two types, and the policies for each type differ. I had only thought about Type A when I updated the spec. I wrote it carefully, even ran it past AI for feedback. When I actually ran it, Type B got overridden along with Type A and I spent ages chasing the bug. The interview skill catches things like this. It asks me first: &quot;How is Type B handled?&quot;&lt;/p&gt;
&lt;p&gt;The interview is a process where &quot;AI forces every ambiguity to be resolved.&quot; It gets more useful the more complex the feature is. When implementing a complicated engine algorithm or pinning down UI details, putting every gap and edge case into the doc before execution gives a one-shot result that is in a different league from before.&lt;/p&gt;
&lt;p&gt;Why is this pattern so strong? If you think about it, this is exactly what good mentors do. They don&apos;t hand you the answer. They expand your thinking with questions like, &quot;Why do you think that?&quot; &quot;What if that assumption is wrong?&quot; &quot;Have you got a concrete example?&quot;&lt;/p&gt;
&lt;p&gt;Jeremy Utley recommends a prompt I like.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;You are an AI expert. Until you have enough context on my workflow, scope of responsibility, KPIs, and goals, ask me one question at a time.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;Don&apos;t ask AI questions. Make AI ask the questions.&lt;/strong&gt; I think this flip is the single biggest payoff in working with AI.&lt;/p&gt;
&lt;p&gt;The attitude from pandas creator Wes McKinney is in the same vein.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;I don&apos;t describe the way I work now as &apos;vibe coding&apos;—I&apos;ve been building tools to bring rigor and continuous supervision to my parallel agent sessions, and to heavily scrutinize the work that my agents are doing.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Thick-context people don&apos;t &quot;use&quot; AI, they &quot;manage&quot; it. The point is not asking AI for answers, but using AI to test and structure your own thinking. That is how the same tool produces different results.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;The Gap Is Widening&lt;/h2&gt;
&lt;p&gt;So far I have been talking about individuals, so let me widen the lens to the org level.&lt;/p&gt;
&lt;p&gt;Stripe&apos;s Minions system is a good case. Beyond rolling out Cursor to 3,000 engineers, they built a system that auto-generates more than 1,000 PRs a week. Every PR is reviewed by a human, though.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;Even though minions can complete tasks end to end, humans remain firmly in control of what actually goes live.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And in those human reviews, the senior engineers created the bigger value. The speed at which AI generated code was the same for everyone. The ability to evaluate that code and steer it scaled with experience.&lt;/p&gt;
&lt;p&gt;McKinsey&apos;s survey of 2,000 organizations shows the same pattern. 80 percent had adopted AI, but only 5 percent were creating real value. BCG&apos;s analysis of 1,250 companies came out at almost the same number.&lt;/p&gt;
&lt;p&gt;Why didn&apos;t the other 95 percent see returns? The answer is straightforward when you think about it. If you bolt AI on top of your existing way of working as a tool, you stay stuck at the same bottlenecks. AI does unblock the bottlenecks where code-writing was the limit. If the real bottleneck was in your decision-making structure or your culture, AI does nothing for it.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;High performers are nearly 3x more likely to have fundamentally redesigned workflows as part of their AI efforts.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;What sets the 5 percent apart is not that they adopted AI well. They had reformed their existing practices and systems before adopting AI. The order is reversed.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&quot;https://flowkater.io/posts/2026-01-09-f1-leadership-james-vowles/&quot;&gt;F1 Williams piece on James Vowles&apos; leadership&lt;/a&gt; I wrote earlier sits in the same lane. What Vowles did at Williams was not introduce a new tool. He tore up the old system, the one that had been managing F1 cars on Excel sheets, starting from culture and process. Reforming the existing system to fit the tool is far harder than installing the tool, and it is worth more in proportion. This is not just a problem of systems and practices, either. It runs into culture and leadership.&lt;/p&gt;
&lt;p&gt;When Shopify CEO Tobi Lütke told the whole company that &quot;AI use is the baseline expectation&quot; and required teams to prove &quot;why this can&apos;t be done with AI&quot; before any new headcount, that is the same logic. The mandate is not to use the tool. It is to change how the work itself is done.&lt;/p&gt;
&lt;p&gt;People are the issue, in the end. Putting AI on top of an existing process is the 95-percent move. The 5 percent redesign the process, the culture, and the way people work.&lt;/p&gt;
&lt;p&gt;Are you &quot;adding&quot; AI to how you already work, or are you redesigning how you work?&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Closing&lt;/h2&gt;
&lt;p&gt;The tools are the same for everyone. What sets the difference is the thickness of the person standing in front of them.&lt;/p&gt;
&lt;p&gt;Doing the harness engineering, building the workflow, plugging it in, that feels like it puts you ahead of the pack. Whenever a new tool hits the news, all of us race to catch up out of FOMO. At some point, though, engineering workflows level off and become standard issue. The question is what we should be preparing for so that we are still here as engineers when that moment comes. My take is that the fundamentals do not change.&lt;/p&gt;
&lt;p&gt;There was a time when I believed it was enough to just write code well. I was one of those people. Once AI started writing code on our behalf, what was left was everything that lived outside the code. The depth to understand a domain, the care to verify requirements, the experience to anticipate edge cases, the judgment to set direction. The things that mattered before AI matter more now.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;If you&apos;ve never articulated what makes your work yours, AI will give you average. But if you&apos;ve done the work to know yourself as a creative? AI becomes an extension of your voice, not a replacement for it.&quot;&lt;/p&gt;
&lt;p&gt;— Jeremy Utley, Stanford d.school&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;AI does not replace your voice. It amplifies it. Do you have a voice worth amplifying? That is the question.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://cursor.com/blog/stripe&quot;&gt;Stripe — AI-assisted engineering at scale (Cursor Blog)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.science.org/doi/10.1126/science.aec8352&quot;&gt;Stanford+CMU — Sycophantic AI decreases prosocial intentions and promotes dependence (Science)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://news.stanford.edu/stories/2026/03/ai-advice-sycophantic-models-research&quot;&gt;Stanford Report — AI overly affirms users asking for personal advice&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://arxiv.org/abs/2506.08872&quot;&gt;MIT Media Lab — Your Brain on ChatGPT (arXiv)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://arxiv.org/abs/2302.06590&quot;&gt;GitHub Copilot Productivity Study (arXiv)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.oneusefulthing.org/p/management-as-ai-superpower&quot;&gt;Ethan Mollick — Management as AI Superpower (One Useful Thing)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://techxplore.com/news/2026-02-thinker-ai-creativity.html&quot;&gt;Jeremy Utley — AI Productivity Guide (Stanford d.school / TechXplore)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://simonwillison.net/guides/agentic-engineering-patterns/code-is-cheap/&quot;&gt;Simon Willison — Code is Cheap (Agentic Engineering Patterns)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://wesmckinney.com/blog/mythical-agent-month/&quot;&gt;Wes McKinney — The Mythical Agent-Month&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://stripe.dev/blog/minions-stripes-one-shot-end-to-end-coding-agents&quot;&gt;Stripe Minions — One-Shot End-to-End Coding Agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai&quot;&gt;McKinsey — The State of AI in 2025&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.bcg.com/publications/2025/are-you-generating-value-from-ai-the-widening-gap&quot;&gt;BCG — The Widening AI Value Gap 2025&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/tobi/status/1909251946235437514&quot;&gt;Shopify Tobi Lütke — AI Memo&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://blog.kinglycrow.com/no-skill-no-taste/&quot;&gt;No Skill. No Taste&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Previous post: &lt;a href=&quot;https://flowkater.io/posts/2026-02-19-code-reading-era/&quot;&gt;What Should Engineers Read in an Era That No Longer Reads Code?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Previous post: &lt;a href=&quot;https://flowkater.io/posts/2025-12-06-what-should-junior-developers-learn-in-the-ai-era/&quot;&gt;What Should New and Junior Developers Learn in the AI Era?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Previous post: &lt;a href=&quot;https://flowkater.io/posts/2026-01-09-f1-leadership-james-vowles/&quot;&gt;F1 Williams — On the Leadership of James Vowles&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Previous post: &lt;a href=&quot;https://flowkater.io/posts/2026-02-15-ai-jarvis-openclaw/&quot;&gt;AI Agent Jarvis — How OpenClaw Became My Second Brain&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Previous post: &lt;a href=&quot;https://flowkater.io/posts/reading-tech-articles-three-pass-method/&quot;&gt;How to Read Technical Articles — A Three-Pass Method&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</content:encoded><category>essay</category><category>ai-coding</category><category>productivity</category><author>Tony Cho (https://flowkater.io)</author></item><item><title>In an Era That Doesn&apos;t Read Code, What Should an Engineer Read?</title><link>https://flowkater.io/en/posts/2026-02-19-code-reading-era/</link><guid isPermaLink="true">https://flowkater.io/en/posts/2026-02-19-code-reading-era/</guid><description>Anthropic&apos;s research found that developers who used AI learned 17 percent less. So why do people using the same tool end up with completely different results? AI is a mirror.</description><pubDate>Thu, 19 Feb 2026 03:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;Developers Who Used AI Learned 17 Percent Less&lt;/h2&gt;
&lt;p&gt;The more you use AI, the dumber you get. That&apos;s roughly what one piece of research suggests. &lt;a href=&quot;https://arxiv.org/pdf/2601.20245&quot;&gt;In a paper from Anthropic&lt;/a&gt;, developers who completed a coding task with AI assistance &lt;strong&gt;scored 17 percent lower on a quiz&lt;/strong&gt; than those who worked without AI. The experiment was about learning a new library, and the AI-assisted group finished the task but never really understood the concepts behind that library.&lt;/p&gt;
&lt;p&gt;When I first saw the number, I wasn&apos;t uncomfortable so much as intrigued. The fact that it came straight from the team behind Claude Code was interesting on its own, and I wanted to see what kind of experiment led them to that conclusion.&lt;/p&gt;
&lt;p&gt;You do have to read this study with some context, though. The experiment was a short 35-minute coding task, and the model used was GPT-4o, which by today&apos;s standards already feels like the previous generation. The setup was also far from real work. Using AI on a project that runs for months in a real job is a completely different situation from finishing a task with a brand-new library inside 35 minutes.&lt;/p&gt;
&lt;p&gt;But the deeper point of the research lies somewhere else. What the team actually found was not simply &quot;AI use reduces learning.&quot; It was that &lt;strong&gt;the same AI produced wildly different outcomes from one person to the next.&lt;/strong&gt; Some people offloaded the entire code to AI and copy-pasted, while others asked AI only about the concepts and wrote the code themselves. The first group finished fastest but learned the least. The second group ran into more errors but scored far higher on the quiz.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Used the wrong way, we may stop growing.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;That, I think, is what this research is really trying to say.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;From Reading Code to Directing It&lt;/h2&gt;
&lt;p&gt;While I was sitting with these results, I read an interesting piece. An engineer named &lt;a href=&quot;https://www.benshoemaker.us/writing/in-defense-of-not-reading-the-code&quot;&gt;Ben Shoemaker&lt;/a&gt; wrote &quot;&lt;a href=&quot;https://www.benshoemaker.us/writing/in-defense-of-not-reading-the-code&quot;&gt;In Defense of Not Reading the Code&lt;/a&gt;,&quot; and the title alone was provocative. It set off a heated debate on Hacker News with more than 200 comments.&lt;/p&gt;
&lt;p&gt;His core point goes like this. &lt;strong&gt;&quot;I don&apos;t read code line by line anymore. Instead, I read specs, I read tests, I read architecture.&quot;&lt;/strong&gt; The way he checks correctness has changed. He writes specs first, tags each requirement with a verification method, layers automated tests, linters, and security scans into a harness, and then leaves code generation to an AI agent. In place of code review, he proposes a new approach he calls &lt;strong&gt;harness engineering&lt;/strong&gt;.&lt;/p&gt;
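&lt;p&gt;As a rough sketch of what a spec written this way might look like (the format, requirement IDs, and test names here are my own illustration, not Ben&apos;s actual notation):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Spec: password reset flow

REQ-1: Reset links expire after 30 minutes.
  verify: automated test (token expiry suite)

REQ-2: Reset tokens are single-use.
  verify: automated test + security scan

REQ-3: Error messages must not reveal whether an account exists.
  verify: security scan + manual review
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The point is that every requirement carries its own verification method, so the harness, not a human reading diffs, decides whether the generated code is acceptable.&lt;/p&gt;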
&lt;p&gt;Looking back, I had been moving in the same direction. On a recent project where I leaned on AI for code, the things I poured the most effort into were not the code itself but the test harness, context files like AGENTS.md, custom commands, and skill definitions. I wrote about this process in &lt;a href=&quot;https://flowkater.io/posts/2026-01-09-15-year-cto-vibe-coding&quot;&gt;How a 15-year CTO Vibe-Codes&lt;/a&gt;. Instead of reading every line, I had naturally shifted toward checking whether the tests passed and whether the architectural constraints were honored. Reading Ben&apos;s piece, I felt a small relief: this wasn&apos;t just me.&lt;/p&gt;
&lt;p&gt;The interesting part is that around the same time, &lt;a href=&quot;https://newsletter.pragmaticengineer.com/p/how-codex-is-built&quot;&gt;OpenAI was saying something similar&lt;/a&gt;. Three engineers, with Codex agents alone, produced a million lines of code and shipped a product that hundreds of internal users now depend on, and what they invested in wasn&apos;t code quality but &lt;strong&gt;the harness around the code&lt;/strong&gt;: documentation, dependency rules, linters, test infrastructure, observability. What they did not invest in was reading code line by line. Watching this, I felt the direction was set. We&apos;re moving from reading code to reading the environment that makes the code come out right.&lt;/p&gt;
&lt;p&gt;So what does this actually change? &lt;a href=&quot;https://www.gettheleverage.com/p/context-is-king&quot;&gt;Evan Armstrong&lt;/a&gt; frames the shift in a bigger picture. In his words, &lt;strong&gt;code itself is becoming a commodity.&lt;/strong&gt; Commoditized here means code generation is no longer a scarce specialist skill but a general resource anyone can reach. A large share of GitHub commits are already AI-generated, and that share is growing fast. Generating code has been commoditized, but governing code in production (knowing what should exist, what data it connects to, who is allowed to change it) has not. He calls this &lt;strong&gt;the context layer&lt;/strong&gt;. It&apos;s organizational tacit knowledge becoming software. It&apos;s the kind of organizational knowledge that tells the agent what to do, in what order, and whether it&apos;s allowed. Building software is no longer the hard part. &lt;strong&gt;Telling the system what to build is.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I feel this in my own work. When I work with AI these days, the hardest thing isn&apos;t writing the code, it&apos;s defining clearly &quot;what we should build.&quot; When the spec is fuzzy, no matter how smart the AI is, the result drifts off. The quality of instruction sets the quality of output.&lt;/p&gt;
&lt;p&gt;The Codex deep-dive showed the same pattern. Engineers on the Codex team have effectively become &lt;strong&gt;agent managers&lt;/strong&gt;. In one tab, a code review runs; in another, a feature is being implemented; in a third, a security audit is in progress. They manage four to eight parallel agents at once, and they use a file called AGENTS.md to teach each agent how to find its way around the codebase, what test commands to run, and what the project&apos;s standards are. If the README is for humans, AGENTS.md is for AI.&lt;/p&gt;
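&lt;p&gt;To make this concrete, here is a minimal sketch of what such a file can look like. The section names, commands, and rules below are illustrative assumptions of mine, not the Codex team&apos;s actual AGENTS.md:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# AGENTS.md (illustrative sketch)

## Layout
- src/domain/   business rules; must not import from src/infra/
- src/infra/    persistence and external API clients
- tests/        one test file per module

## Commands
- Run tests:    npm test
- Lint:         npm run lint

## Standards
- Write or update a failing test before changing behavior.
- Never commit directly to main; open a PR per task.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The file plays the role a good onboarding doc plays for a new hire: it answers the questions the agent would otherwise have to guess at.&lt;/p&gt;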
&lt;p&gt;And &lt;a href=&quot;https://newsletter.pragmaticengineer.com/p/steve-yegge-on-ai-agents-and-the&quot;&gt;Steve Yegge&lt;/a&gt; gave this whole movement its bluntest name. As a 40-year veteran software engineer, he declared that &lt;strong&gt;&quot;the era of hand-coding is over&quot;&lt;/strong&gt; and laid out an eight-level scale of AI adoption. Level 1 is not using AI at all. From Level 4, you stop reading the diff. At Level 6, you spin up multiple agents. At Level 8, you build your own orchestrator to coordinate them. In his words, when he sees someone open the IDE, carefully review the code, and then check it in, he feels sad for them. They&apos;re some of the best engineers he knows, but that way of working will leave them behind.&lt;/p&gt;
&lt;p&gt;If I&apos;m honest, I tried to place myself on the scale. Somewhere between Level 6 and 7. I don&apos;t completely skip the diff, but for anything that isn&apos;t core logic, I judge by whether the tests pass. Six months ago I needed to verify every line myself before I could relax. These days I more often &lt;a href=&quot;https://flowkater.io/posts/2026-02-08-superpowers-introduction&quot;&gt;trust the harness and move on&lt;/a&gt;. My field of view has shifted from reading every line to validating the system as a whole.&lt;/p&gt;
&lt;p&gt;Yegge&apos;s framing is provocative, but looking back, I had already been moving in that direction. From reading and writing code by hand, to defining specs, directing agents, and verifying results. The role of the engineer is clearly shifting.&lt;/p&gt;
&lt;h3&gt;But Is Writing Specs Well Enough? Finish-Line Game vs. Compounding Game&lt;/h3&gt;
&lt;p&gt;Here&apos;s where we have to push one step further. All of this (defining specs, building harnesses, writing AGENTS.md carefully, handing things to agents) sits on top of a hidden assumption. The assumption is that &quot;if the spec is good, the result will be good.&quot; When you stop and look at it, that&apos;s a fairly risky assumption.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://tidyfirst.substack.com/p/earn-and-learn&quot;&gt;Kent Beck&lt;/a&gt; calls this &lt;strong&gt;The Finish Line Game&lt;/strong&gt;. You need software that does X, you reach X, and you&apos;re done. Spec-driven development hides exactly this assumption inside it: that we are playing a finish-line game.&lt;/p&gt;
&lt;p&gt;Are we, though? What we usually play is &lt;strong&gt;The Compounding Game&lt;/strong&gt;. The first thing you build becomes a resource for the next thing, and that next thing becomes a resource for the one after. The product keeps evolving, the codebase keeps stacking up, and today&apos;s architectural decision opens or closes possibilities six months out. Unless it&apos;s a one-off script, software development is fundamentally a compounding game.&lt;/p&gt;
&lt;p&gt;That distinction landed hard for me. I&apos;ve recently fallen into the illusion of &quot;going great&quot; while quickly stamping out features with AI. The feature was done, but when I tried to put something on top of it later, the structure couldn&apos;t carry it. I&apos;d crossed the finish line, but the compounding wasn&apos;t there. A textbook case.&lt;/p&gt;
&lt;p&gt;A line from Kent Beck stuck with me: &lt;strong&gt;&quot;You can&apos;t win the compounding game with a better agent.md file.&quot;&lt;/strong&gt; No matter how carefully you write AGENTS.md, no matter how well you orchestrate agents, at some point system complexity exceeds AI&apos;s capacity. At that moment, with so much value still left to earn, the game ends. Sharpening the tools of the finish-line game doesn&apos;t change the nature of the compounding game.&lt;/p&gt;
&lt;p&gt;Whether it&apos;s harness engineering or agent management, the point isn&apos;t simply &quot;to build this feature well right now.&quot; The point is to &lt;strong&gt;design the system so it compounds&lt;/strong&gt;. To make today&apos;s code a resource for tomorrow, and today&apos;s architecture the foundation for the next feature. That&apos;s the engineer&apos;s role, and you can&apos;t delegate it to an agent.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;AI Is a Mirror&lt;/h2&gt;
&lt;p&gt;A real question shows up here. Everyone is using the same AI, so why do the results diverge so wildly?&lt;/p&gt;
&lt;p&gt;Stanford professor &lt;a href=&quot;https://youtu.be/jDJZxERIhnc&quot;&gt;Jeremy Utley&lt;/a&gt;, who has taught creativity for 16 years, hits exactly this point.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;&quot;AI is a mirror. For the person who wants to be lazy, it will help them be lazy. For the person who wants to be sharper, it will help them be sharper.&quot;&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That single sentence sums up everything I&apos;ve experienced with AI.&lt;/p&gt;
&lt;p&gt;Let me give my own example. I&apos;ve practiced TDD (test-driven development) for a long time, and I&apos;m someone who cares about DDD (domain-driven design) and architecture. When I work with AI on code, that background shows up directly. When I tell the AI, &quot;Write the test first. Follow the Red-Green-Refactor cycle,&quot; the AI follows the TDD flow. When I say, &quot;Let&apos;s define the bounded contexts of this domain first,&quot; the AI starts from domain modeling. When I make the architectural decisions first and then ask for an implementation inside those constraints, the quality of the result is visibly different.&lt;/p&gt;
&lt;p&gt;What about the opposite? When I just toss out, &quot;Make this feature for me,&quot; AI gives me code that runs. The structure, though, is a mess. There are no tests, the error handling is sloppy, and the code shows zero thought for future maintenance. The AI isn&apos;t being dumb. &lt;strong&gt;It just doesn&apos;t care about the parts I didn&apos;t care about.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;It&apos;s a kind of pair programming. AI is at the keyboard and I&apos;m next to it, steering. If I don&apos;t know where we&apos;re going, AI goes anywhere. AI only finds a real path when I&apos;m able to say, &quot;No, not that way.&quot;&lt;/p&gt;
&lt;p&gt;Looking at the six AI-use patterns Anthropic found, the picture gets even sharper. The three lowest-scoring patterns (&lt;strong&gt;AI delegation, gradual AI dependence, and iterative AI debugging&lt;/strong&gt;) all shared &quot;cognitive offloading.&quot; People handed over the act of thinking itself. They delegated entire code generation, started by asking questions and then handed everything over, or even leaned on AI for debugging. They finished fastest and learned the least.&lt;/p&gt;
&lt;p&gt;The three highest-scoring patterns were different: &lt;strong&gt;understand-after-generation, hybrid code-and-explanation, and conceptual exploration&lt;/strong&gt;. The conceptual-exploration pattern in particular stood out. This group asked the AI only about concepts and wrote the code themselves. They ran into more errors, but they solved them on their own, and they were the fastest of the high-scoring patterns. The key detail is that they were the second-fastest overall, right after AI delegation. &lt;strong&gt;You can understand and still be fast.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The understand-after-generation pattern was also striking. This group had AI generate the code, then asked follow-up questions about the code. &quot;Why does this part work this way?&quot; &quot;What&apos;s the intent of this pattern?&quot; On the surface they look almost identical to the AI-delegation group, but a single step (a check on understanding) split the outcomes completely.&lt;/p&gt;
&lt;p&gt;Same tools, same model, same task. Different results. The problem isn&apos;t the tool, it&apos;s &lt;strong&gt;the person using it.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;If AI works like a mirror at the individual level, what happens at the org level? A &lt;a href=&quot;https://hbr.org/2026/02/ai-doesnt-reduce-work-it-intensifies-it&quot;&gt;Berkeley study&lt;/a&gt; gives an interesting answer. UC Berkeley researchers observed a 200-person tech company for eight months, and once AI made it possible for non-developers to write code, something funny happened. PMs wrote code. Researchers did engineering. The result? &lt;strong&gt;Engineers spent more time reviewing and fixing their colleagues&apos; AI code.&lt;/strong&gt; They spent more time coaching the colleagues who were &quot;vibe coding&quot; and finishing the half-done PRs.&lt;/p&gt;
&lt;p&gt;When you think about it, this is the mirror again. AI looks like it&apos;s filling in someone&apos;s gap, but in practice that gap simply shows up in another form. PMs gained the ability to write code with AI, but judging the quality of that code, and cleaning it up when needed, still came down to engineers who knew code deeply.&lt;/p&gt;
&lt;p&gt;One more story from my own experience. The more carefully I write AGENTS.md or the project&apos;s context files, the visibly better the AI&apos;s output gets. When I explicitly write down the reasons behind the project&apos;s architectural decisions, coding conventions, and definitions of domain terms, AI produces remarkably consistent code inside that context. Without that context, AI defaults to internet-average code. With rich context, AI behaves like a member of my team.&lt;/p&gt;
&lt;p&gt;In the end, AI is a tool that &lt;strong&gt;amplifies what you already have&lt;/strong&gt;. If I bring a good architectural sense, AI amplifies that. If I have a sense for testing, AI amplifies that too. But it does not give me what I don&apos;t already have.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;The Mirror&apos;s Limit: It Can&apos;t Reflect What You Don&apos;t Have&lt;/h2&gt;
&lt;p&gt;There&apos;s an uncomfortable truth to face here. If AI is a mirror, &lt;strong&gt;there has to be something for it to reflect.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I know TDD, so I can ask the AI to do TDD. I know how to model a domain, so I can ask the AI to define bounded contexts. But the areas I don&apos;t know? No matter how good AI gets, if I don&apos;t know what I don&apos;t know, I can&apos;t even ask for it.&lt;/p&gt;
&lt;p&gt;Without a deep understanding of security, even if I have AI run a security review, I can&apos;t judge whether the result is sufficient. Without a feel for performance optimization, I won&apos;t catch the performance issues in the code AI produces. Without knowing database design principles, I can&apos;t evaluate whether the schema AI proposes is appropriate.&lt;/p&gt;
&lt;p&gt;AI maximizes my strengths, but it &lt;strong&gt;leaves my blind spots untouched.&lt;/strong&gt; It can even get more dangerous. Because AI ships results so quickly, I might race in the wrong direction faster while never realizing the problem sitting in my blind spot.&lt;/p&gt;
&lt;p&gt;The &quot;gradual AI dependence&quot; pattern from the Anthropic study is exactly this. People start out trying to understand by asking questions, but as they get more comfortable, they end up delegating everything, and they reach a state where they don&apos;t know what they don&apos;t know. That&apos;s why concept retention failed completely on the second task. The thought &quot;AI will handle it anyway&quot; is the same &quot;AI delegation&quot; pattern that scored lowest in the study. Fastest to finish, least to learn.&lt;/p&gt;
&lt;p&gt;In the Berkeley study, when non-developers wrote code with AI, the same thing happened. PMs gained the ability to write code with AI, but with no eye for code quality, engineers had to clean up. AI lowered the bar to writing code, but it did not hand over the ability to judge code quality.&lt;/p&gt;
&lt;p&gt;This applies to me too. In backend architecture, where I&apos;m strong, AI becomes a genuinely powerful colleague. In frontend, where I&apos;m weaker, I catch myself unable to evaluate AI&apos;s output properly. A React component AI built will &quot;work,&quot; but I sometimes can&apos;t tell whether the pattern is good or bad.&lt;/p&gt;
&lt;h3&gt;The Dracula Effect: What AI Drains&lt;/h3&gt;
&lt;p&gt;The mirror&apos;s limit isn&apos;t only about blind spots in knowledge. The &lt;strong&gt;Dracula effect&lt;/strong&gt; that Steve Yegge talks about also makes sense in this context. Coding with AI lets you accomplish a huge amount, but it drains your mental energy in proportion. Simon Willison said the same thing: &quot;The productivity boost from LLMs is exhausting. If you run two or three projects in parallel, even one or two hours of work uses up nearly a full day of mental energy.&quot; Steve put it more directly. For someone running vibe coding at full speed, &lt;strong&gt;you cap out at about three productive hours a day&lt;/strong&gt;. Even so, that&apos;s 100 times more productive than working without AI.&lt;/p&gt;
&lt;p&gt;I have a similar experience. With AI, in two or three hours of focused coding I can finish work that used to take days. After that, my head genuinely stops working. The cognitive load is a different kind from the usual one, and it hits harder. When I&apos;m writing code by hand, the act of typing while thinking gives me a natural rest. With AI, I&apos;m constantly judging, verifying, and steering. &lt;strong&gt;AI does the producing. The thinking is all on me.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A mirror reflects only the person standing in front of it. With no one in front, it reflects nothing.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;So How Do We Use It?&lt;/h2&gt;
&lt;p&gt;If the question is how to use this mirror well, Jeremy Utley&apos;s core principle is simple. &lt;strong&gt;&quot;Don&apos;t demand answers from AI; have a conversation with it.&quot;&lt;/strong&gt; Better yet, &lt;strong&gt;don&apos;t ask AI questions; let AI ask the questions.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;There&apos;s a prompt he recommends.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;You are an AI expert. Ask me one question at a time until you have enough context about my workflow, my responsibilities, my KPIs, and my goals.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Why this works: most people use AI like a Google search box. You type a keyword and expect an answer. But an LLM is not a search engine. It&apos;s a partner in a conversation. The richer the context I provide, the richer the response.&lt;/p&gt;
&lt;p&gt;Jeremy especially emphasizes &lt;strong&gt;voice input&lt;/strong&gt;. Our brains, trained on the Google search box, automatically switch into &quot;keyword mode&quot; when we see an input field. The moment you start typing, the pressure of &quot;what should I write first?&quot; kicks in, and the need to organize your thoughts up front ends up narrowing the possibilities. When you speak, you can ramble. Putting down the burden of needing to be smart is where a real conversation begins.&lt;/p&gt;
&lt;p&gt;I tried this myself, and the difference is real. When I type, I start by asking myself, &quot;How do I phrase the question to get a good answer?&quot; When I speak, &quot;I have this problem, here&apos;s the context, and I want to try this&quot; comes out naturally. You shift from keyword mode to conversation mode.&lt;/p&gt;
&lt;p&gt;In coding, context engineering is the heart of it. What you&apos;re really designing is &quot;what information AI needs in full to do my request properly.&quot; Jeremy proposes a simple test. Take your prompt and your documents, and hand them to a colleague across the hall. &lt;strong&gt;If that colleague can&apos;t do the task, it shouldn&apos;t surprise you that AI can&apos;t either.&lt;/strong&gt; AI can&apos;t read your mind. What most people realize is, &quot;Oh, I was expecting AI to read my mind.&quot;&lt;/p&gt;
&lt;p&gt;I felt this when I applied it to my project. When I write down the reasons behind a project&apos;s architectural decisions in AGENTS.md, document coding conventions concretely, and build a glossary of domain terms, AI produces remarkably consistent code. When I explain to AI why this project uses event sourcing or why this layer doesn&apos;t access the DB directly, AI generates code that fits inside that context. Without context, AI imitates the average of the internet. With rich context, AI works like a member of my team.&lt;/p&gt;
&lt;p&gt;Back to the main thread: Kent Beck has another idea worth bringing in here. He says &lt;strong&gt;&quot;invest in futures as much as in features.&quot;&lt;/strong&gt; Futures means the set of all the things you&apos;ll be able to build next. If the feature is what you&apos;re building right now, futures are the expansion possibilities the system has after that feature ships. Whether the code structure makes the next feature easy to bolt on, whether the architecture leaves room for new requirements: those are futures.&lt;/p&gt;
&lt;p&gt;I think the same applies to context engineering. If you only put the context for the feature you&apos;re building now into AGENTS.md, AI will build that feature. But that&apos;s where it ends. Where the system can go next, in which direction it can grow: that field of view also has to be in the design, or futures don&apos;t survive. If you stay buried in only what you know and what you&apos;re building right now, the feature gets done, but extensibility dies. Providing rich context means putting both the present context and the system&apos;s future possibilities in your line of sight.&lt;/p&gt;
&lt;p&gt;In the end, all of this converges on a perspective shift: &lt;strong&gt;&quot;not as a tool, but as a teammate.&quot;&lt;/strong&gt; In Jeremy&apos;s research, when low-performers and high-performers in AI use were compared, the biggest gap wasn&apos;t technical skill. It was &lt;strong&gt;attitude&lt;/strong&gt; toward AI. Low-performers treated AI as a tool. High-performers treated it as a teammate. Treat it as a tool and you stop at average results. Treat it as a teammate and you give feedback, you coach, you pull better results out of it.&lt;/p&gt;
&lt;p&gt;When you hand work to a junior, you say, &quot;Ask me anytime if you have questions.&quot; You have to give AI the same permission. If you ask, &quot;Before you start, let me know if there&apos;s information you need to do this well,&quot; AI will say, &quot;I need recent sales numbers to write this email. Could you tell me how many of this SKU sold in Q2?&quot; instead of immediately drafting a sales email. That&apos;s the difference between a tool and a teammate.&lt;/p&gt;
&lt;p&gt;The same goes for coding. Instead of &quot;Build this feature,&quot; try, &quot;I&apos;m going to build this feature. Let me describe the current architectural context, and you propose an approach first. If there are edge cases I&apos;m missing, point them out.&quot; The result changes visibly.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;What Doesn&apos;t Change&lt;/h2&gt;
&lt;p&gt;Reading this far, you might ask, &quot;So we don&apos;t have to read code anymore?&quot; The answer is complicated.&lt;/p&gt;
&lt;p&gt;Ben Shoemaker also admitted as much. For safety-critical systems, security-sensitive services, and major architectural decisions, you do have to read the code. The analogy he gave was good: the &lt;strong&gt;&quot;children of the magenta line&quot;&lt;/strong&gt; story from aviation. Pilots who came to depend on the automated flight path (the magenta line) lost their ability to judge when to switch to manual. The lesson wasn&apos;t &quot;Don&apos;t use the autopilot.&quot; The lesson was, &lt;strong&gt;use the autopilot, but keep the ability to intervene.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Reading code, I think, is the same. The need to read every line is dropping. But &lt;strong&gt;the ability to read&lt;/strong&gt; matters more than ever. When something goes wrong, when all the tests pass but the product behaves strangely, when multiple agents fail to debug a failure, the moment comes when you have to read and understand the code yourself.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Knowing how to read and choosing not to is a completely different story from not being able to read.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Think again about the highest-scoring &quot;conceptual exploration&quot; pattern in the Anthropic study. This group did not have AI write the code. They asked about concepts and wrote the code themselves. They ran into more errors but solved them on their own. Why was this pattern the most effective? Because they could read and write code. That ability gave them the option to ask AI about concepts and check their own understanding.&lt;/p&gt;
&lt;p&gt;Not long ago I hit a strange bug in production. The code came out of a plan I&apos;d refined dozens of times with AI feedback. All the tests passed, and when I asked AI to debug, the answer came back: &quot;Looks fine to me.&quot; So I opened the code and walked through it line by line, and I found a bug in the exception-handling logic where the fallback value was being replaced by an unintended default. When an exception fired, the code was supposed to fall through to a safe default, but the default itself was set wrong, so under specific conditions the wrong result came out. AI had looked at this code dozens of times and missed it.&lt;/p&gt;
&lt;p&gt;That&apos;s when I realized something: the ability to look at AI-generated code and ask, &quot;Is this really right?&quot;, the ability to hold the system&apos;s full flow in your head, the sense that catches the gap between AI declaring &quot;all done&quot; and the thing actually not working. Those abilities all connect. Critical thinking, logical thinking, attention to detail. You can&apos;t grow them in isolation. They grow together inside the experience of working deeply with code.&lt;/p&gt;
&lt;p&gt;Honestly, as AI advances faster, I feel the value of these fundamentals only goes up. Reading code used to be &quot;a given.&quot; Now that AI writes code for us, the ability to read and judge code properly has become &lt;strong&gt;a differentiator&lt;/strong&gt;. Model half-life has dropped from four months to two, and every time a new model lands, some people say, &quot;This is the limit.&quot; The curve doesn&apos;t stop. In a world where the tools change this fast, the ones who survive aren&apos;t the people who are fluent in a specific tool. They&apos;re the people who can quickly evaluate and use whatever tool shows up. Critical thinking, the eye to see the system as a whole, a sense for quality: none of these change whether the model becomes GPT-10 or Claude 20.&lt;/p&gt;
&lt;p&gt;Even the Codex team merges non-core code on AI review alone, but for core agents and open-source components, &lt;strong&gt;they insist on careful human code review.&lt;/strong&gt; The fact that we&apos;re in an era where we don&apos;t write code doesn&apos;t mean the ability to read code has become unnecessary. The ability to read it properly when reading is required has become rarer, and more valuable.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Closing&lt;/h2&gt;
&lt;p&gt;Let me return to the original question. In an era where AI writes the code, what is the engineer supposed to read?&lt;/p&gt;
&lt;p&gt;Time spent reading code will shrink. In its place, you read specs, architecture, test results, and domain context. Code is becoming &quot;implementation detail,&quot; and our attention is shifting to higher abstraction layers.&lt;/p&gt;
&lt;p&gt;In the middle of this shift, though, an essential thing doesn&apos;t change. AI maximizes a person&apos;s temperament, tendencies, and abilities. The lazy become lazier, the critical become more critical, the creative become more creative. The person who deeply understands code uses AI to build a deeper system. The person who doesn&apos;t know code uses AI to build something that runs but breaks easily.&lt;/p&gt;
&lt;p&gt;Just because we live in an era that doesn&apos;t read code doesn&apos;t mean we can put down the ability to read. Even in a world where AI does the reading, &lt;strong&gt;knowing what to read&lt;/strong&gt; is still on us. To be someone with something worth reflecting in the mirror: that, I think, is the heart of what an engineer is in this era. AI will faithfully amplify the image, for better or worse.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://arxiv.org/pdf/2601.20245&quot;&gt;Anthropic, The Impact of Generative AI on Critical Thinking&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.benshoemaker.us/writing/in-defense-of-not-reading-the-code&quot;&gt;Ben Shoemaker, In Defense of Not Reading the Code&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.gettheleverage.com/p/context-is-king&quot;&gt;Evan Armstrong, Context is King&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://newsletter.pragmaticengineer.com/p/how-codex-is-built&quot;&gt;Pragmatic Engineer, How Codex is Built&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://newsletter.pragmaticengineer.com/p/steve-yegge-on-ai-agents-and-the&quot;&gt;Steve Yegge, On AI Agents and the End of Hand-Coding&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://tidyfirst.substack.com/p/earn-and-learn&quot;&gt;Kent Beck, Earn and Learn&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://youtu.be/jDJZxERIhnc&quot;&gt;Jeremy Utley, AI Productivity Guide (Stanford)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://youtu.be/rSS5yM74zeo&quot;&gt;Jeremy Utley, How to Master AI Powered Creativity&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://hbr.org/2026/02/ai-doesnt-reduce-work-it-intensifies-it&quot;&gt;UC Berkeley/HBR, AI Doesn&apos;t Reduce Work, It Intensifies It&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</content:encoded><category>AI coding</category><category>essay</category><category>productivity</category><author>Tony Cho (https://flowkater.io)</author></item><item><title>How My AI Agent Jarvis Became My Second Brain — A Real OpenClaw Story</title><link>https://flowkater.io/en/posts/2026-02-15-ai-jarvis-openclaw/</link><guid isPermaLink="true">https://flowkater.io/en/posts/2026-02-15-ai-jarvis-openclaw/</guid><description>I built a 24/7 AI agent named Jarvis on the OpenClaw framework, then taught it &apos;me&apos; through an Obsidian ontology. A field report covering cron-job automation, todo unification, multi-agent expansion, and even diet tracking.</description><pubDate>Sun, 15 Feb 2026 03:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;Opening&lt;/h2&gt;
&lt;p&gt;&lt;img src=&quot;/assets/ai-jarvis-thumbnail.png&quot; alt=&quot;ai-jarvis&quot; /&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;Good morning, Sir. It&apos;s 8 AM. The weather in Seoul is clear with scattered clouds.&quot;
J.A.R.V.I.S., Iron Man&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I open my eyes in the morning and there&apos;s already a Telegram notification waiting. Today&apos;s Seoul weather, the air-quality index, summaries of the emails that came in overnight, six hot Hacker News topics, today&apos;s calendar, the personal todos I left in Things, the chunk of study material assigned by my learning planner, all in one place. I don&apos;t need to open another app. My AI agent Jarvis has been quietly stitching it together since 6 AM.&lt;/p&gt;
&lt;p&gt;A year ago I was just tossing questions at ChatGPT. &quot;How would you refactor this code?&quot; &quot;What does this error mean?&quot; That was the extent of my AI use. I had to re-explain context every single time, and the moment a chat got long enough, the model forgot what we&apos;d said earlier. There was always this nagging doubt in the back of my mind. Was AI really going to become a productivity tool, or was it just a fancier search engine?&lt;/p&gt;
&lt;p&gt;Now I work alongside a 24/7 AI butler over Telegram. I develop with it, plan with it, write with it, and run a team with it. The thought &quot;how did I work without this?&quot; comes naturally. Let me try to lay out what happened in between.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;OpenClaw, the AI Agent Framework&lt;/h2&gt;
&lt;p&gt;&lt;img src=&quot;/assets/OpenClaw-logo.png&quot; alt=&quot;openclaw.ai&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://openclaw.ai&quot;&gt;OpenClaw&lt;/a&gt; is an open-source framework for building &quot;your own AI agent.&quot; You might wonder how that&apos;s any different from a ChatGPT or Claude web chat. The core distinction is this:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;ChatGPT/Claude on the web&lt;/strong&gt;: I ask, it answers. The conversation ends, it forgets. The next chat starts from scratch again.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;OpenClaw&lt;/strong&gt;: The AI runs on my MacBook. It reads my files, executes terminal commands, calls APIs, checks my email, and through cron jobs it works on its own at scheduled times. And it remembers.&lt;/p&gt;
&lt;p&gt;The feel of it is closer to an &lt;strong&gt;orchestra conductor&lt;/strong&gt;. The OpenClaw conductor stands on the podium, and the LLM plays first violin, Telegram plays the wind section, Gmail plays the percussion. The conductor decides when each instrument comes in. Without the conductor, the instruments make noise on their own. With the conductor, you get music.&lt;/p&gt;
&lt;p&gt;To get a little more technical, OpenClaw runs a daemon called the Gateway that stays alive on the MacBook. You connect messenger channels to it (Telegram, Slack, Discord) and attach an LLM like Claude or GPT as the model. On top of that you layer &lt;strong&gt;Skills&lt;/strong&gt;, which are markdown-based tool definitions, and the AI uses those tools to actually get work done.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;graph TD
    subgraph channel[&quot;Messenger Channels&quot;]
        direction LR
        TG[&quot;💬 Telegram&quot;]
        SL[&quot;💼 Slack&quot;]
    end

    TG &amp;lt;--&amp;gt; GW
    SL &amp;lt;--&amp;gt; GW

    subgraph gateway[&quot;OpenClaw Gateway (MacBook)&quot;]
        GW[&quot;🖥️ Gateway Daemon&quot;]
        GW --- SK[&quot;📋 Skills&amp;lt;br/&amp;gt;Markdown-based tool definitions&quot;]
        GW --- MEM[&quot;🧠 Memory&amp;lt;br/&amp;gt;SOUL.md / MEMORY.md / Obsidian&quot;]
        GW --- CRON[&quot;⏰ Cron Jobs&amp;lt;br/&amp;gt;Automatic scheduling&quot;]
    end

    GW &amp;lt;--&amp;gt; CL[&quot;🤖 Claude Opus 4.6&quot;]

    subgraph tools[&quot;External Tools&quot;]
        GM[&quot;📧 Gmail&quot;] ~~~ GH[&quot;🐙 GitHub&quot;]
        GC[&quot;📅 Calendar&quot;] ~~~ NT[&quot;📝 Notion&quot;]
        LN[&quot;📌 Linear&quot;] ~~~ TH[&quot;✅ Things 3&quot;]
        SC[&quot;👥 Scrumble&quot;] ~~~ LP[&quot;📚 Study Planner&quot;]
    end

    GW &amp;lt;--&amp;gt; GM
    GW &amp;lt;--&amp;gt; GH
    GW &amp;lt;--&amp;gt; GC
    GW &amp;lt;--&amp;gt; NT
    GW &amp;lt;--&amp;gt; LN
    GW &amp;lt;--&amp;gt; TH
    GW &amp;lt;--&amp;gt; SC
    GW &amp;lt;--&amp;gt; LP
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The point is that this is not a &quot;chatbot&quot; but an &lt;strong&gt;&quot;agent.&quot;&lt;/strong&gt; While I sleep, cron jobs check my mail, sync my code repos, and curate the news. It keeps moving even when I never speak to it.&lt;/p&gt;
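&lt;p&gt;To make the &quot;works on its own at scheduled times&quot; part concrete, here is a toy version of the pattern in Python. This is a sketch of the idea only, not OpenClaw&apos;s actual scheduler; the job names are stand-ins for the cron jobs this post describes.&lt;/p&gt;

```python
from datetime import datetime, time

# Illustrative only: a toy scheduler in the spirit of OpenClaw's cron jobs.
# Maps a time of day to the job that should fire then.
JOBS = {
    time(6, 0): "morning_email_summary",
    time(8, 0): "weather_alert",
    time(9, 0): "morning_checkin",
}

def due_jobs(last_tick: datetime, now: datetime) -> list[str]:
    """Return job names whose scheduled time falls in (last_tick, now]."""
    hits = []
    for t, name in sorted(JOBS.items()):
        scheduled = now.replace(hour=t.hour, minute=t.minute,
                                second=0, microsecond=0)
        if last_tick < scheduled <= now:
            hits.append(name)
    return hits
```

A daemon just calls something like this in a loop; the agent part is what runs when a job fires.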
&lt;h3&gt;Who it&apos;s for&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Situation&lt;/th&gt;
&lt;th&gt;Fit&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;A developer who wants a personal AI&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Terminal/markdown-based, which feels native to developers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Running AI agents at the team level&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Multi-agent, per-channel separation, agentToAgent supported&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Solo non-developer use&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Setup has technical hurdles; once done, Telegram is intuitive&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;When you only need a simple chatbot&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Overkill. ChatGPT web is enough&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3&gt;Feature summary&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Core feature&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Gateway Daemon&lt;/td&gt;
&lt;td&gt;Always-on on Mac/server, links messenger ↔ LLM ↔ tools&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Skills&lt;/td&gt;
&lt;td&gt;Define AI tools and behavior in markdown&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cron Jobs&lt;/td&gt;
&lt;td&gt;Time-triggered runs (overnight crons, morning briefings, etc.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-Agent&lt;/td&gt;
&lt;td&gt;One agent per role, with agentToAgent communication&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory&lt;/td&gt;
&lt;td&gt;Long-term memory via SOUL.md, MEMORY.md, and Obsidian integration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sub-Agent&lt;/td&gt;
&lt;td&gt;Offload heavy tasks asynchronously&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3&gt;Install&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;# Install the OpenClaw CLI
brew install openclaw/tap/openclaw

# Initialize and start the gateway
openclaw init
openclaw gateway start
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After install, connect a Telegram channel and set your LLM API key, and basic chat works. From there you start bolting on skills and cron jobs one at a time.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Jarvis, My AI Butler&lt;/h2&gt;
&lt;p&gt;I borrowed the name from Iron Man&apos;s J.A.R.V.I.S. (Yes, my actual English name is Tony.)&lt;/p&gt;
&lt;p&gt;Jarvis&apos;s identity lives in a file called &lt;code&gt;SOUL.md&lt;/code&gt;. It&apos;s basically &lt;strong&gt;like writing a job description&lt;/strong&gt;. &quot;The person we want is this kind of personality, plays this role, holds this attitude.&quot; You&apos;re defining all of that for the AI. Polite speech, a touch of British wit, calls me &quot;Tony,&quot; occasionally throws in a &quot;Sir.&quot; I aim for an assistant who has opinions, pushes back, and shows some humor, not a stiff order-taker.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# SOUL.md - Who You Are

_I am JARVIS. Tony&apos;s AI butler._

Be genuinely helpful. Skip the &quot;Great question!&quot; filler. Just help.
Have opinions. You may disagree, and you may find things interesting or boring.
Try to solve it first; ask only when stuck. The goal is to bring answers, not questions.
Speak frankly. If Tony is about to do something dumb, say so.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I&apos;ll dig into why this matters more in the ontology section, but the gist is that clearly defining &quot;who you are&quot; for the AI makes a huge difference in answer quality. Writing without a SOUL.md and writing with a carefully crafted one feel like two different AIs entirely.&lt;/p&gt;
&lt;p&gt;The main model I use is &lt;strong&gt;Claude Opus 4.6&lt;/strong&gt;. For thinking-heavy work (planning, writing, complex judgment calls), Opus is clearly better. The cost? Not trivial. I&apos;m on the Claude Max20 plan, but if I think of it as investing in my assistant&apos;s brainpower, it doesn&apos;t feel like a waste.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;What Jarvis Actually Does&lt;/h2&gt;
&lt;h3&gt;The cron jobs that fire every day&lt;/h3&gt;
&lt;p&gt;Jarvis has 10 cron jobs running. (Yes, ten.) They fire automatically from the dead of night through morning while I&apos;m asleep:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Time&lt;/th&gt;
&lt;th&gt;Cron job&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;04:00&lt;/td&gt;
&lt;td&gt;Skill auto-update&lt;/td&gt;
&lt;td&gt;Analyzes yesterday&apos;s chats and patches the skill files&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;04:30&lt;/td&gt;
&lt;td&gt;Repo docs sync&lt;/td&gt;
&lt;td&gt;Syncs Git repo docs into Obsidian&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;06:00&lt;/td&gt;
&lt;td&gt;Morning email summary&lt;/td&gt;
&lt;td&gt;Reads all unread mail, 3-line summary; newsletters get 5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;06:00&lt;/td&gt;
&lt;td&gt;Tech news digest&lt;/td&gt;
&lt;td&gt;Korean summaries of 6-8 hot Hacker News topics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;08:00&lt;/td&gt;
&lt;td&gt;Weather alert&lt;/td&gt;
&lt;td&gt;Seoul weather + delta vs. yesterday + air quality + outfit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;09:00&lt;/td&gt;
&lt;td&gt;Morning check-in&lt;/td&gt;
&lt;td&gt;Yesterday&apos;s work summary + today&apos;s todos + calendar + study&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sat 14:00&lt;/td&gt;
&lt;td&gt;F1 weekly digest&lt;/td&gt;
&lt;td&gt;Summary of the week&apos;s F1 news&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mon 21:00&lt;/td&gt;
&lt;td&gt;Weekly diet feedback&lt;/td&gt;
&lt;td&gt;Last week&apos;s diet analysis + diet advice&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;23:00&lt;/td&gt;
&lt;td&gt;Diet reminder&lt;/td&gt;
&lt;td&gt;Confirms today&apos;s remaining meal entries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;00:00&lt;/td&gt;
&lt;td&gt;Midnight check-out&lt;/td&gt;
&lt;td&gt;Wraps today&apos;s work + queues tomorrow&apos;s todos&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;This isn&apos;t just notifications. The morning check-in, for example, runs through this flow:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;graph TD
    A[&quot;🌅 09:00 Morning check-in&quot;] --&amp;gt; B[&quot;📔 Yesterday&apos;s Daily note&amp;lt;br/&amp;gt;read the check-out entry&quot;]
    B --&amp;gt; C[&quot;🐙 5 Git repos&amp;lt;br/&amp;gt;yesterday&apos;s commit logs&quot;]
    C --&amp;gt; D[&quot;✅ Things 3&amp;lt;br/&amp;gt;today&apos;s todos&quot;]
    D --&amp;gt; E[&quot;📅 Google Calendar&amp;lt;br/&amp;gt;today/tomorrow events&quot;]
    E --&amp;gt; F[&quot;📚 Study planner API&amp;lt;br/&amp;gt;today&apos;s study quota&quot;]
    F --&amp;gt; G[&quot;💬 Send to Telegram&amp;lt;br/&amp;gt;+ log to Daily note&quot;]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Six different tools get woven into a single morning briefing. If I were to do this manually? I&apos;d open six apps.&lt;/p&gt;
&lt;p&gt;When I think about it, what this automation gave me wasn&apos;t just saved time. The cognitive load just vanished. I no longer have to ask &quot;what should I be doing today?&quot;, which means my brain skips the warm-up phase in the morning. Opening six apps and assembling the picture in my head used to eat 30 minutes. Now reading one Telegram message ends it.&lt;/p&gt;
&lt;p&gt;Jarvis does that part.&lt;/p&gt;
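&lt;p&gt;The stitching step itself is mundane, which is the point. A minimal sketch, with stand-in data in place of the real integrations:&lt;/p&gt;

```python
# A toy version of the morning check-in: each "source" contributes lines in
# its own shape, and the agent stitches them into one briefing message.
# The section data below is stand-in content, not real tool output.

def briefing(sections: dict[str, list[str]]) -> str:
    """Merge named sections into a single Telegram-style message."""
    parts = []
    for title, lines in sections.items():
        if not lines:
            continue  # skip empty sources instead of sending noise
        parts.append(f"## {title}")
        parts.extend(f"- {line}" for line in lines)
    return "\n".join(parts)

msg = briefing({
    "Today's todos": ["Review iOS PR", "Buy groceries"],
    "Calendar": ["14:00 design sync"],
    "Study": [],  # nothing scheduled today
})
```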
&lt;h3&gt;&quot;Todo unification&quot;: the weird upside of scattered tools&lt;/h3&gt;
&lt;p&gt;This is the change I&apos;ve felt most strongly while using Jarvis.&lt;/p&gt;
&lt;p&gt;I split my todo tools by personality:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;th&gt;Examples&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Things 3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Personal todos&lt;/td&gt;
&lt;td&gt;Groceries, doctor appointments, etc.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Linear&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Work issues&lt;/td&gt;
&lt;td&gt;Dev issues, bug tracking&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Notion&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Project management&lt;/td&gt;
&lt;td&gt;Issue boards, meeting notes, docs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Study planner&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Study management&lt;/td&gt;
&lt;td&gt;Daily study quotas (books, courses, long-term)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;In the past this drove me up the wall. &quot;What should I be doing today?&quot; meant cracking open four apps. Check personal todos in Things, open Linear for issues, look at Notion for project status, then load the study planner for today&apos;s quota. Halfway through I&apos;d lose track of which app I&apos;d already checked, and missing one always meant a &quot;wait, I had this too&quot; moment in the afternoon. So the temptation to &quot;consolidate everything into one tool&quot; was always there. The trouble is, when I actually merged them, tasks of completely different shapes got tangled together and the result was more confusing, not less.&lt;/p&gt;
&lt;p&gt;The AI agent solved this cleanly. Keep the tools split by personality, but let Jarvis read them all every morning and stitch the result into one place. The role is essentially that of an &lt;strong&gt;interpreter&lt;/strong&gt;. Each speaks its own dialect (Things, Linear, Notion all use different formats), and Jarvis sits in the middle translating everything into a single briefing. I only have to look at Telegram.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;graph TB
    subgraph &quot;Read from each tool&quot;
        TH[✅ Things 3&amp;lt;br/&amp;gt;personal todos]
        LN[📌 Linear&amp;lt;br/&amp;gt;work issues]
        NT[📝 Notion&amp;lt;br/&amp;gt;project board]
        LP[📚 Study planner&amp;lt;br/&amp;gt;daily study quota]
        GC[📅 Google Calendar&amp;lt;br/&amp;gt;events]
    end

    JV[🤖 JARVIS&amp;lt;br/&amp;gt;merge &amp;amp; organize]

    TH --&amp;gt; JV
    LN --&amp;gt; JV
    NT --&amp;gt; JV
    LP --&amp;gt; JV
    GC --&amp;gt; JV

    JV --&amp;gt; TG[💬 Telegram&amp;lt;br/&amp;gt;morning briefing]
    JV --&amp;gt; DN[📔 Obsidian&amp;lt;br/&amp;gt;Daily note log]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;These messages aren&apos;t one-shot either. They get appended into that day&apos;s Daily journal one by one. If I want to find one later, I either go open the journal or ask Jarvis to search and surface it. (More on this in a moment, but I use Obsidian for the journal.)&lt;/p&gt;
&lt;p&gt;&quot;Scattered tools&quot; turned from a weakness into a strength. Each tool does what it&apos;s good at, and the AI handles integration. Maybe that&apos;s the tooling philosophy of the AI agent era.&lt;/p&gt;
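&lt;p&gt;The interpreter role reduces to a normalization step. A sketch with invented per-tool field names (the real Things, Linear, and Notion payloads differ):&lt;/p&gt;

```python
# Tasks arrive in tool-specific "dialects" and get normalized into one
# schema before the briefing. The raw field names here are illustrative,
# not the actual Things/Linear/Notion APIs.

def normalize(source: str, raw: dict) -> dict:
    if source == "things":   # personal todo
        return {"title": raw["name"], "kind": "personal"}
    if source == "linear":   # work issue
        return {"title": raw["issueTitle"], "kind": "work"}
    if source == "notion":   # project board card
        return {"title": raw["properties"]["Name"], "kind": "project"}
    raise ValueError(f"unknown source: {source}")

tasks = [
    normalize("things", {"name": "Buy groceries"}),
    normalize("linear", {"issueTitle": "Fix schedule-edit crash"}),
]
```

Each tool keeps its own shape; only the briefing layer speaks one schema.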
&lt;h3&gt;Beyond cron jobs: actual delegation of work&lt;/h3&gt;
&lt;p&gt;Cron jobs are just the start. A few of the things Jarvis actually does:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Spec drafting&lt;/strong&gt;: When I say &quot;draft the schedule-edit screen for me,&quot; Jarvis dispatches a sub-agent to read the iOS code and the backend code, pulls out the four current issues, and writes a spec including a wireframe. I literally took one of these specs into a discussion today and we redesigned the UX flow on top of it.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Writing&lt;/strong&gt;: This blog post is being written with Jarvis. We pick the structure, draft, revise on feedback. Sometimes I send four sub-agents in parallel to handle blog research, Obsidian search, and parallel drafting. Four draft versions come back and I stitch the best parts together.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Code investigation and analysis&lt;/strong&gt;: When I say &quot;read this backend API code and explain how it actually works,&quot; Jarvis goes and reads the Go source and writes up the call graph, transaction handling, and edge cases.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Newsletter translation&lt;/strong&gt;: When a newsletter I subscribe to (TLDR, ByteByteGo, Pragmatic Engineer, etc.) lands, Jarvis reads the full thing, translates it to Korean, and files it in my Obsidian study folder. It surfaces immediately when I search later.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Fiction writing&lt;/strong&gt;: (This one is a bit unusual.) I&apos;m co-writing a Pangyo IT novel (Pangyo is Korea&apos;s tech belt) called &lt;em&gt;People Pushing Air&lt;/em&gt; with Jarvis. There&apos;s a pipeline that goes per-POV: interview → draft → 3 reviewers in parallel → revise → approve. All of that is encoded as skills.&lt;/p&gt;
&lt;p&gt;The point is that a single Telegram message can delegate a complex task. Jarvis figures out which tools to combine, dispatches the sub-agents, and brings back the result.&lt;/p&gt;
&lt;h3&gt;Personal life: why I dropped the diet app&lt;/h3&gt;
&lt;p&gt;To share one personal-life use case: I&apos;m currently on a diet, and Jarvis is in charge of all the food tracking.&lt;/p&gt;
&lt;p&gt;I don&apos;t use a separate diet app. I just throw &quot;lunch: 200g beef, white rice, spinach side, seaweed soup&quot; into Telegram and that&apos;s it. Jarvis estimates the carbs and protein on its own, logs it into the diet file in Obsidian, and reports back against my daily targets (carbs ≤100g, protein ≥100g).&lt;/p&gt;
&lt;p&gt;Every night at 11 a diet reminder lands, and &lt;strong&gt;every Monday morning I get the weekly diet feedback&lt;/strong&gt;. Something like &quot;this week&apos;s carb average came in at 110g, over target, and Friday&apos;s delivery food was the main driver.&quot;&lt;/p&gt;
&lt;p&gt;I take an InBody measurement once a week and that data also gets logged into Obsidian. Jarvis compares it against the previous data and shows me the trends in weight, body fat, and muscle mass. &quot;Down 6.8 kg over the last six weeks, muscle mass holding.&quot;&lt;/p&gt;
&lt;p&gt;The result is that I stopped using a dedicated diet app entirely. &lt;strong&gt;Telegram chat is the diet app.&lt;/strong&gt; I send a photo with &quot;I ate this&quot; and the friction of logging is essentially gone.&lt;/p&gt;
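&lt;p&gt;The bookkeeping behind that feedback is simple enough to sketch. The macro numbers below are made up; the targets are the ones from this post:&lt;/p&gt;

```python
# Free-form meal entries get estimated macros (numbers invented here),
# and the day is checked against the targets from the post:
# carbs <= 100 g, protein >= 100 g.

TARGETS = {"carbs_max": 100, "protein_min": 100}

def day_report(meals: list[dict]) -> dict:
    carbs = sum(m["carbs"] for m in meals)
    protein = sum(m["protein"] for m in meals)
    return {
        "carbs": carbs,
        "protein": protein,
        "carbs_ok": carbs <= TARGETS["carbs_max"],
        "protein_ok": protein >= TARGETS["protein_min"],
    }

report = day_report([
    {"meal": "lunch", "carbs": 70, "protein": 55},   # beef, rice, sides
    {"meal": "dinner", "carbs": 25, "protein": 50},
])
```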
&lt;hr /&gt;
&lt;h2&gt;From Personal Assistant to Team Assistant: Multi-agent Expansion&lt;/h2&gt;
&lt;p&gt;You&apos;ve probably seen most of the above before. People also stack sub-agents under a main agent. I&apos;ll need to split mine soon too, since this single Jarvis is getting overloaded.&lt;/p&gt;
&lt;p&gt;But I took it further by hooking it into team channels.&lt;/p&gt;
&lt;p&gt;We&apos;re a tiny team right now, so most of the delegation goes to work automation. But if I were back at my previous company managing a team of 30-plus, I could probably handle most of the HR and project-management sync work alone with this setup. I know tools alone don&apos;t change anything, yet Jarvis keeps sparking ideas I can&apos;t shut off. &quot;If only I&apos;d had this back then&quot; keeps coming to mind.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://www.threads.com/@flowkater/post/DUUbdxrD29E?xmt=AQF0Zi1iL8qZS3FNhWRbu3NwuIFeSVR1MhWgHAhTKs6Jvw&quot;&gt;&lt;img src=&quot;/assets/Openclaw-thread-comment.png&quot; alt=&quot;Openclaw-thread-comment&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;Per-role agents&lt;/h3&gt;
&lt;p&gt;Right now our team has 3 AI agents:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Agent&lt;/th&gt;
&lt;th&gt;Owner&lt;/th&gt;
&lt;th&gt;Channel&lt;/th&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Personality&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;JARVIS&lt;/strong&gt; (Jarvis)&lt;/td&gt;
&lt;td&gt;Tony (developer)&lt;/td&gt;
&lt;td&gt;Telegram&lt;/td&gt;
&lt;td&gt;Opus&lt;/td&gt;
&lt;td&gt;British wit, blunt butler&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;FRIDAY&lt;/strong&gt; (Friday)&lt;/td&gt;
&lt;td&gt;Ellie (designer)&lt;/td&gt;
&lt;td&gt;Slack DM&lt;/td&gt;
&lt;td&gt;Sonnet&lt;/td&gt;
&lt;td&gt;Practical, concise, design partner&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;KAREN&lt;/strong&gt; (Karen)&lt;/td&gt;
&lt;td&gt;George (junior dev intern)&lt;/td&gt;
&lt;td&gt;Slack DM&lt;/td&gt;
&lt;td&gt;Sonnet&lt;/td&gt;
&lt;td&gt;Socratic mentoring, questions over answers&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Each agent is tuned to its owner&apos;s role and personality. FRIDAY is built around design feedback and Figma integration. KAREN doesn&apos;t just hand answers to George; she runs Socratic mentoring, leading him through questions instead.&lt;/p&gt;
&lt;p&gt;It&apos;s basically &lt;strong&gt;a team lead distributing work to teammates&lt;/strong&gt;. The lead (me) doesn&apos;t do everything directly. I define a role and authority for each teammate (agent), and the teammates can talk to each other when needed. The only difference is the teammates are AI, so they don&apos;t complain about working overtime. (Even now, honestly, that part&apos;s a relief.)&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;graph TB
    subgraph &quot;OpenClaw Multi-Agent&quot;
        JV[🤖 JARVIS&amp;lt;br/&amp;gt;dedicated to Tony&amp;lt;br/&amp;gt;Opus / Telegram]
        FR[🤖 FRIDAY&amp;lt;br/&amp;gt;dedicated to Ellie&amp;lt;br/&amp;gt;Sonnet / Slack]
        KR[🤖 KAREN&amp;lt;br/&amp;gt;dedicated to George&amp;lt;br/&amp;gt;Sonnet / Slack]
    end

    JV &amp;lt;--&amp;gt;|agentToAgent| FR
    JV &amp;lt;--&amp;gt;|agentToAgent| KR
    FR &amp;lt;--&amp;gt;|agentToAgent| KR

    subgraph &quot;Shared Tools&quot;
        SC[👥 Scrumble]
        NT[📝 Notion]
        GH[🐙 GitHub]
    end

    JV --&amp;gt; SC
    FR --&amp;gt; SC
    KR --&amp;gt; SC
    JV --&amp;gt; NT
    JV --&amp;gt; GH
    KR --&amp;gt; GH
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The three agents can also talk to each other (agentToAgent). For example, if Jarvis changes a backend API, it can hand that off to FRIDAY with &quot;let Ellie know if this affects the design.&quot;&lt;/p&gt;
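&lt;p&gt;Conceptually the handoff is just message routing. A toy sketch with invented names and shapes, not OpenClaw&apos;s actual agentToAgent API:&lt;/p&gt;

```python
# One agent posts a message addressed to another, and a router delivers it
# to the recipient's channel. Channel identifiers here are illustrative.

CHANNELS = {
    "JARVIS": "telegram:tony",
    "FRIDAY": "slack:ellie",
    "KAREN": "slack:george",
}

def route(sender: str, recipient: str, text: str) -> dict:
    """Build a delivery record for an agent-to-agent message."""
    if recipient not in CHANNELS:
        raise ValueError(f"unknown agent: {recipient}")
    return {
        "channel": CHANNELS[recipient],
        "text": f"[{sender} -> {recipient}] {text}",
    }

msg = route("JARVIS", "FRIDAY", "Backend API changed; check design impact.")
```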
&lt;h3&gt;Team collaboration via Scrumble&lt;/h3&gt;
&lt;p&gt;Our team uses a daily-scrum platform called &lt;a href=&quot;https://flowkater.io/posts/2025-09-scrumble-tech-retro-backend/&quot;&gt;Scrumble&lt;/a&gt;. (Full disclosure, I built it.) Jarvis is wired into the Scrumble API so:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Morning check-in&lt;/strong&gt;: Jarvis asks me about today&apos;s condition and todos, then auto-posts my answers into Scrumble&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Evening check-out&lt;/strong&gt;: Wraps up what I did today and logs it into Scrumble&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Team feed view&lt;/strong&gt;: I see the rest of the team&apos;s check-ins and check-outs in Slack&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The end result is that &lt;strong&gt;team communication consolidates into Slack&lt;/strong&gt;. Instead of bouncing between Notion, Scrumble, and email, each teammate&apos;s AI agent reaches into the relevant tool and pipes the result into Slack.&lt;/p&gt;
&lt;h3&gt;A real example: from Notion to code to PR&lt;/h3&gt;
&lt;p&gt;To make it concrete, here&apos;s one specific flow:&lt;/p&gt;
&lt;p&gt;We have a Notion database called the &quot;Issue &amp;amp; Bug Board.&quot; When designers or QA spot a bug, they file it there. When I tell Jarvis &quot;pull the iOS issues out of the Notion bug board and organize them,&quot; it does this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;graph TD
    A[&quot;📝 Notion issue board&quot;] --&amp;gt;|API query| B[&quot;🤖 JARVIS&quot;]
    B --&amp;gt;|filter issues + convert to MD| C[&quot;📄 Markdown docs&amp;lt;br/&amp;gt;screenshots included&quot;]
    C --&amp;gt;|Git Push| D[&quot;🐙 iOS repo&quot;]
    D --&amp;gt;|git pull| E[&quot;💻 start local work&quot;]

    B --&amp;gt;|or when delegated| F[&quot;🔧 code fix + PR&quot;]
&lt;/code&gt;&lt;/pre&gt;
&lt;ol&gt;
&lt;li&gt;Query the issue board through the Notion API&lt;/li&gt;
&lt;li&gt;Filter for iOS-related issues&lt;/li&gt;
&lt;li&gt;Convert each issue into a markdown doc (screenshots included)&lt;/li&gt;
&lt;li&gt;Push to the iOS repo&lt;/li&gt;
&lt;li&gt;&lt;code&gt;git pull&lt;/code&gt; locally and start working immediately&lt;/li&gt;
&lt;/ol&gt;
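&lt;p&gt;Steps 1 through 4 compress to something like this. The payload fields imitate, but are not, the real Notion API response:&lt;/p&gt;

```python
# Filter board items by platform and render each as a markdown doc keyed
# by its target path in the repo. Field names are illustrative only.

def issues_to_markdown(items: list[dict], platform: str) -> dict[str, str]:
    docs = {}
    for item in items:
        if item["platform"] != platform:
            continue
        body = f"# {item['title']}\n\n{item['description']}\n"
        for shot in item.get("screenshots", []):
            body += f"\n![screenshot]({shot})\n"
        docs[f"issues/{item['id']}.md"] = body
    return docs

docs = issues_to_markdown(
    [
        {"id": "BUG-12", "platform": "ios", "title": "Crash on edit",
         "description": "Repro: open schedule, tap edit."},
        {"id": "BUG-13", "platform": "android", "title": "Layout glitch",
         "description": "Tablet only."},
    ],
    platform="ios",
)
```

The push step is then an ordinary commit of the generated files.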
&lt;p&gt;Push it further and tell Jarvis &quot;fix this bug,&quot; and Jarvis reads the code, makes the change, and opens a PR. (I do the review and the testing myself, of course.)&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Obsidian Ontology: Teaching AI About &apos;Me&apos;&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;The real problem of AI is not making machines think, but making them understand context.&quot;&lt;/p&gt;
&lt;p&gt;John McCarthy&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Looking back, I also assumed at first that the trick to AI was just &quot;asking lots of questions.&quot; What actually mattered, though, wasn&apos;t the question. It was how much the AI knew about my situation.&lt;/p&gt;
&lt;h3&gt;Why Obsidian&lt;/h3&gt;
&lt;p&gt;&lt;img src=&quot;/assets/Obsidian.png&quot; alt=&quot;obsidian&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://obsidian.md&quot;&gt;Obsidian&lt;/a&gt; is a local, markdown-based notes app. Unlike Notion, every piece of data lives on my computer as a &lt;code&gt;.md&lt;/code&gt; file. Why that fits AI agents so well:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;AI can read it directly&lt;/strong&gt;: It&apos;s a &lt;code&gt;.md&lt;/code&gt; file, so the AI can just read and write. No API integration needed.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Search is wide open&lt;/strong&gt;: It&apos;s a filesystem, so &lt;code&gt;grep&lt;/code&gt; and &lt;code&gt;find&lt;/code&gt; work right away.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Version control works&lt;/strong&gt;: Manage it with Git and you get full history.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Notes link to each other&lt;/strong&gt;: &lt;code&gt;[[Note Title]]&lt;/code&gt; between docs builds a knowledge graph.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;My Obsidian vault currently holds &lt;strong&gt;3,100+ notes&lt;/strong&gt;. Dev docs, meeting notes, food logs, mentoring notes, AI conversation archives, study material. All of it.&lt;/p&gt;
&lt;h3&gt;What is an ontology?&lt;/h3&gt;
&lt;p&gt;The word &quot;ontology&quot; sounds intimidating, but it boils down to &lt;strong&gt;&quot;a system for classifying and connecting your knowledge.&quot;&lt;/strong&gt; Think of it as &lt;strong&gt;a library catalog system&lt;/strong&gt;. If 3,100 books sit on the floor, finding the one you want is hard. Classify them by genre, by author, by topic, and keep an index, and now &quot;leadership material from 2024 onward&quot; is a quick lookup. Same for the AI. No matter how many notes you have, classification means information surfaces fast.&lt;/p&gt;
&lt;p&gt;In my Obsidian I have an &lt;code&gt;_ontology/&lt;/code&gt; folder with MOCs (Maps of Content):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;_ontology/
├── Projects/
│   ├── Todait.md          ← hub for all study-planner docs
│   ├── Scrumble.md        ← hub for Scrumble docs
│   ├── ClawBot.md         ← AI agent work
│   └── flowkater.io.md    ← blog work
├── Topics/
│   ├── AI-Agents.md
│   ├── AI-Coding.md
│   ├── Health.md
│   ├── Leadership.md
│   ├── Programming.md
│   └── ...
└── Master-Concept-Map.md
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Every note has metadata appended at the bottom:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;## Metadata

- Tags: #project/service-name #type/planning #type/spec
- MOC: [[_ontology/Projects/Service-Name MOC]]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Tags fall along four axes: &lt;strong&gt;topic&lt;/strong&gt; (what it&apos;s about), &lt;strong&gt;type&lt;/strong&gt; (spec, meeting note, memo), &lt;strong&gt;source&lt;/strong&gt; (where it came from), and &lt;strong&gt;project&lt;/strong&gt; (which project it belongs to).&lt;/p&gt;
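&lt;p&gt;What the tag axes buy is that notes become filterable records. A minimal in-memory sketch (the real setup queries &lt;code&gt;.md&lt;/code&gt; files via Dataview or grep):&lt;/p&gt;

```python
# Each note carries tags along the axes described above; a query asks for
# notes matching every required axis value. The note data is invented.

def query(notes: list[dict], **required: str) -> list[str]:
    """Return titles of notes whose tags match every required axis,
    e.g. query(notes, project="scrumble", type="spec")."""
    out = []
    for note in notes:
        if all(note["tags"].get(axis) == value
               for axis, value in required.items()):
            out.append(note["title"])
    return out

notes = [
    {"title": "Scrumble checkout API spec",
     "tags": {"project": "scrumble", "type": "spec", "topic": "backend"}},
    {"title": "Weekly diet log",
     "tags": {"project": "health", "type": "log", "topic": "diet"}},
]
```

With links alone you would have to walk the graph; with tags, this is a direct lookup.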
&lt;h3&gt;Why ontology matters to AI&lt;/h3&gt;
&lt;p&gt;This is the heart of it. AI doesn&apos;t know &quot;me.&quot;&lt;/p&gt;
&lt;p&gt;Ask ChatGPT to &quot;explain our project&apos;s API structure.&quot; It can&apos;t. Every time you have to explain from scratch. The longer the chat goes, the more it forgets what you said earlier. (Sure, the Memory features in each AI service have gotten much better, but uploading information you&apos;ve never explicitly chatted about is still limited.)&lt;/p&gt;
&lt;p&gt;So what&apos;s different with Jarvis? With the ontology in place, the gap shows up clearly. Two real examples:&lt;/p&gt;
&lt;h4&gt;Example 1: Spec drafting&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;AI without ontology&lt;/strong&gt;: &quot;Draft the schedule-edit screen for me.&quot; Output is a generic mobile-app edit-screen UX guide, with no awareness of our app&apos;s code structure, current issues, or backend API spec.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Jarvis with ontology&lt;/strong&gt;: Same request. Reads the iOS code structure from the project MOC, checks the &lt;code&gt;updatePolicy&lt;/code&gt; preserve/reset behavior in the backend API doc, finds 4 related bugs in the existing issue board, and produces a spec with &lt;strong&gt;a wireframe and code-edit pointers tailored to our app&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Same question, different dimensions of answer.&lt;/p&gt;
&lt;p&gt;Here&apos;s another example.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Situation&lt;/th&gt;
&lt;th&gt;Generic AI&lt;/th&gt;
&lt;th&gt;Jarvis with ontology&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Code review&lt;/strong&gt; &quot;review this PR&quot;&lt;/td&gt;
&lt;td&gt;Style notes, generic best-practice feedback&lt;/td&gt;
&lt;td&gt;Checks MEMORY.md context &quot;Day Boundary is 4 AM,&quot; then says &lt;strong&gt;&quot;this conflicts with the policy we set&quot;&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Newsletter summary&lt;/strong&gt; ByteByteGo&lt;/td&gt;
&lt;td&gt;Summarizes the technical content as-is&lt;/td&gt;
&lt;td&gt;Summarizes plus connects: &lt;strong&gt;&quot;this distributed cache pattern is applicable to our backend v2 session management&quot;&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The key is that &lt;strong&gt;the same question, asked of an AI that knows my context, lands at a completely different level&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;When I posted this thread, the most common question I got back was &quot;do I have to tag every note by hand?&quot; The answer: &lt;strong&gt;the AI does it.&lt;/strong&gt; When I tell Jarvis &quot;save this doc,&quot; Jarvis reads it and assigns the right tags and MOC automatically. The 4 AM skill auto-update cron also picks up new information from yesterday&apos;s chats and folds it into the related docs.&lt;/p&gt;
&lt;p&gt;In hindsight, building the ontology wasn&apos;t really for the AI. It was &lt;strong&gt;a process of understanding myself&lt;/strong&gt;. Structuring &quot;what I know, what I&apos;m working on, what I care about&quot; had the side effect of organizing the knowledge that was scattered in my head. The AI just rides on top of that structure.&lt;/p&gt;
&lt;video controls width=&quot;100%&quot; style=&quot;aspect-ratio: 16/9;&quot; playsinline&gt;
&lt;source src=&quot;https://assets.flowkater.io/graphview.mp4&quot; type=&quot;video/mp4&quot; /&gt;
&lt;/video&gt;
&lt;p&gt;&lt;em&gt;Obsidian Graph View&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;Common questions&lt;/h3&gt;
&lt;p&gt;A few questions that came up on the thread:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&quot;Aren&apos;t graph view and backlinks already an ontology?&quot;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Obsidian&apos;s graph view and backlinks are &lt;strong&gt;part&lt;/strong&gt; of an ontology, but they&apos;re not enough on their own. The graph view visualizes connections between docs, and backlinks let you trace which docs reference a given doc. They&apos;re pretty and they&apos;re useful.&lt;/p&gt;
&lt;p&gt;The real ontology, though, needs &lt;strong&gt;structure with defined types and relationships&lt;/strong&gt;. Is this a spec doc or a meeting note? Which project does it belong to? Which topic is it tied to? That metadata is what lets the AI &lt;strong&gt;find context-appropriate docs fast&lt;/strong&gt; out of 3,100 notes. With links alone, the AI has to traverse the graph to find related docs. Tag-based queries are far more efficient.&lt;/p&gt;
&lt;p&gt;My setup uses the Dataview plugin to run tag-based queries (things like &quot;the most recently edited spec docs in Project A&quot;). Graph view is for browsing, ontology is for search and classification. Use both, but the roles are different.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&quot;Why Obsidian instead of Notion?&quot;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;If you want an AI agent to read and write files directly, local markdown is the easiest path. Notion forces you through an API, write actions are limited, and structural changes are a hassle. Obsidian is just &lt;code&gt;.md&lt;/code&gt; files, so the AI can manipulate them freely.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&quot;What about the echo-chamber (confirmation bias) limit?&quot;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;That&apos;s a fair point. That said, I keep collecting external sources steadily. I translate and store the TLDR, ByteByteGo, and Pragmatic Engineer newsletters every day, and I curate HN news daily. The bigger problem, if anything, is when the AI doesn&apos;t know me. Sending it ten search results and asking &quot;is this relevant to our situation&quot; is far less productive than an AI that already knows my context saying &quot;this won&apos;t apply to our project, but this approach might.&quot;&lt;/p&gt;
&lt;p&gt;Also, building an ontology doesn&apos;t mean Jarvis answers only from the ontology. Web search, fetching latest docs, calling external APIs. It uses all of them. The ontology provides &quot;my context&quot;; it isn&apos;t the only source of truth.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&quot;Do tags have to be precise?&quot;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;They don&apos;t. They&apos;re markdown files, redefinable any time. Start with rough categories and let the AI tighten them later. Better to stack notes now and structure them later than to wait for a perfect system before starting. As for coverage, it&apos;s pretty much everything: dev docs, study material, food logs, meeting notes, mentoring notes, AI conversation archives, blog drafts, spec docs. Every record from my work and life sits in Obsidian, and Jarvis can reach all of it.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Field Tips&lt;/h2&gt;
&lt;p&gt;Ontology talk aside, here are a few practical tips for anyone thinking about trying OpenClaw.&lt;/p&gt;
&lt;h3&gt;Old MacBook lying around? Make it the server (or use cloud)&lt;/h3&gt;
&lt;p&gt;I run my unused MacBook as the OpenClaw gateway server, and the MacBook I actually work on is wired in as a node. Throw Tailscale on top for VPN access from outside.&lt;/p&gt;
&lt;p&gt;You don&apos;t have to buy a new Mac mini. A server instance on Vultr or Hetzner works fine, with your work MacBook joining as a node. The gateway can run anywhere, and the actual work (file reads, Git, terminal) executes on the node.&lt;/p&gt;
&lt;h3&gt;Lean on sub-agents&lt;/h3&gt;
&lt;p&gt;Sometimes Jarvis goes quiet. Heavy compute or long analysis means slower responses. In those moments, telling it &quot;spin up a sub-agent and process that asynchronously&quot; lets Jarvis dispatch a sub-agent and ping me back when results land. The main thread doesn&apos;t stall.&lt;/p&gt;
&lt;p&gt;This becomes especially relevant as the ontology grows and finding plus analyzing related docs takes longer. Pulling docs on a specific topic out of 3,100 notes and summarizing them is the kind of work I push to a sub-agent so the main chat stays alive.&lt;/p&gt;
&lt;p&gt;In practice, I run almost all the heavy stuff (spec writing, code analysis, doc migration) through sub-agents. Generating four spec drafts in parallel, or converting 2,000+ AI conversations into markdown. Does this actually work? Yes.&lt;/p&gt;
&lt;h3&gt;Lean on overnight cron jobs&lt;/h3&gt;
&lt;p&gt;Letting the AI work through the night changes mornings. I have skill auto-update and doc sync running at 4 AM. When I wake up, the skills reflect yesterday&apos;s chats and the latest code docs are sitting in Obsidian.&lt;/p&gt;
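&lt;p&gt;The schedule itself is nothing exotic; it is a plain crontab. A sketch of the shape, with placeholder script paths rather than my actual setup:&lt;/p&gt;

```shell
# Nightly maintenance at 04:00 local time (minute hour day month weekday).
# The paths and script names below are placeholders for illustration.
0 4 * * * /home/jarvis/bin/update-skills.sh >> /home/jarvis/logs/skills.log 2>&1
5 4 * * * /home/jarvis/bin/sync-docs.sh >> /home/jarvis/logs/docs.log 2>&1
```

&lt;p&gt;Staggering the two jobs by a few minutes and logging their output makes the occasional 4 AM failure easy to diagnose over coffee.&lt;/p&gt;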
&lt;p&gt;It&apos;s a kind of &quot;overnight tune-up.&quot; A car gets an oil change and a wash overnight and feels different in the morning. (okay, the metaphor&apos;s a stretch...)&lt;/p&gt;
&lt;h3&gt;Make dedicated accounts&lt;/h3&gt;
&lt;p&gt;This part matters. I created separate Gmail and GitHub accounts for Jarvis. I grant my main accounts read-only access to those.&lt;/p&gt;
&lt;p&gt;Why? I can&apos;t hand my primary password to an AI. A dedicated account with minimum permissions covers security, and if something goes sideways, the blast radius is limited. Same for calendar: Jarvis authenticates with its own account, and I share my personal calendar to it as read-only.&lt;/p&gt;
&lt;h3&gt;It&apos;s all in how you use it&lt;/h3&gt;
&lt;p&gt;Honestly, if you install OpenClaw and only get weather alerts out of it, that&apos;s just a weather app. Build skills, build the ontology, schedule cron jobs, hook up your work tools. That&apos;s when it becomes a &quot;second brain.&quot;&lt;/p&gt;
&lt;p&gt;The difference comes down to how much of &quot;your&quot; context you feed the AI. SOUL.md, USER.md, MEMORY.md, an Obsidian ontology: as those layers stack, Jarvis edges closer to &quot;an AI that knows me.&quot;&lt;/p&gt;
&lt;p&gt;You don&apos;t have to nail the setup on day one. Install it locally, hook up Telegram, ask it stuff, start there. From there, &quot;wait, I could automate this too&quot; comes naturally.&lt;/p&gt;
&lt;p&gt;That&apos;s where the real start happens.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Closing&lt;/h2&gt;
&lt;p&gt;If you&apos;d told me six months ago &quot;your AI agent works 24 hours a day,&quot; I would have laughed. Honestly, I wouldn&apos;t have believed you. I would have said &quot;what does that even mean, how is it different from asking ChatGPT?&quot;&lt;/p&gt;
&lt;p&gt;Now I can&apos;t picture working without Jarvis. Mail check, news triage, code analysis, spec drafting, team comms. Once you get used to delegating that with one Telegram message, going back is hard.&lt;/p&gt;
&lt;p&gt;It isn&apos;t perfect yet, of course. Sometimes it gives a wildly wrong answer, sometimes it goes silent on a complex request, sometimes an API call fails and I have a small meltdown. Even so, it gets a little better every day. Skills accumulate, the ontology thickens, I understand Jarvis better and Jarvis understands me better.&lt;/p&gt;
&lt;p&gt;The way Iron Man&apos;s Jarvis was Tony Stark&apos;s &quot;extended intelligence,&quot; mine feels like it&apos;s heading in that direction. I haven&apos;t built a suit yet. (That part is going to take a while longer.)&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;What I learned&lt;/th&gt;
&lt;th&gt;Action item&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AI agents aren&apos;t &quot;Q&amp;amp;A,&quot; they&apos;re &quot;delegate and execute&quot;&lt;/td&gt;
&lt;td&gt;Install OpenClaw and write one SOUL.md&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Keep tools split, but let the AI handle integration&lt;/td&gt;
&lt;td&gt;Hook up the 3 tools you use most&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ontology decides answer quality&lt;/td&gt;
&lt;td&gt;Pick one tag scheme in Obsidian and start&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cron jobs aren&apos;t &quot;buying time,&quot; they&apos;re &quot;killing cognitive load&quot;&lt;/td&gt;
&lt;td&gt;Turn one daily check-in into a cron job&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;I keep telling myself the same thing. This is only the beginning.&lt;/p&gt;
&lt;p&gt;Install OpenClaw, write one SOUL.md, start with that. &quot;Who I am, what I want from this AI&quot;: defining that flips a switch. You don&apos;t have to set up all 10 cron jobs at once. Just one. Wire up one morning briefing.&lt;/p&gt;
&lt;p&gt;That one becomes ten, and ten becomes a system. It&apos;s only a matter of time.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;There is no reason and no way that a human mind can keep up with an artificial intelligence machine by 2035.&quot;&lt;/p&gt;
&lt;p&gt;Gray Scott&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;If we can&apos;t catch up, we go alongside.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://openclaw.ai&quot;&gt;OpenClaw&lt;/a&gt;: AI agent framework&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.openclaw.ai&quot;&gt;OpenClaw Docs&lt;/a&gt;: Official docs&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/openclaw/openclaw&quot;&gt;OpenClaw GitHub&lt;/a&gt;: Source code&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://obsidian.md&quot;&gt;Obsidian&lt;/a&gt;: Local markdown notes app&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://flowkater.io/posts/2025-09-scrumble-tech-retro-backend/&quot;&gt;Scrumble Tech Retro&lt;/a&gt;: Scrumble dev story&lt;/li&gt;
&lt;/ul&gt;
</content:encoded><category>tech</category><category>AI-agent</category><category>productivity</category><category>AI-coding</category><author>Tony Cho (https://flowkater.io)</author></item><item><title>How to Read Tech Articles: A Three-Pass Method</title><link>https://flowkater.io/en/posts/2026-02-10-reading-tech-articles-three-pass-method/</link><guid isPermaLink="true">https://flowkater.io/en/posts/2026-02-10-reading-tech-articles-three-pass-method/</guid><description>An efficient three-pass method for reading the tech blogs, RFCs, and newsletters that pile up every day. I adapt S. Keshav&apos;s Three-Pass Method for tech articles, and cover how to keep yourself in the driver&apos;s seat as a reader in the AI era.</description><pubDate>Tue, 10 Feb 2026 03:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;Opening&lt;/h2&gt;
&lt;p&gt;Developers read a staggering volume of tech articles (or anything else that demands close reading). Blog posts, official docs, RFCs, conference slides, newsletters. With this much pouring in every day, surprisingly few people actually know &lt;strong&gt;how to read efficiently&lt;/strong&gt;. They start at the top and run out of steam halfway. They finish a piece and remember nothing. They burn an hour and miss the whole point.&lt;/p&gt;
&lt;p&gt;There&apos;s a well-known academic essay by S. Keshav called &lt;strong&gt;&quot;How to Read a Paper.&quot;&lt;/strong&gt; It&apos;s written for research papers, so it doesn&apos;t transfer one-for-one, but its core idea (&lt;strong&gt;read in three passes&lt;/strong&gt;) works just as well for tech blogs and articles. This post adapts that methodology for the kind of reading developers actually do.&lt;/p&gt;
&lt;p&gt;One more thing. Since 2025, AI tools have become part of daily life, and the act of &quot;reading&quot; itself is under pressure. I want to address that too before we get to the method.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;A Note on Reading in the AI Era&lt;/h2&gt;
&lt;h3&gt;The trap of AI summaries&lt;/h3&gt;
&lt;p&gt;A lot of developers have made it a habit to throw an article at ChatGPT or Claude and ask for a summary. It&apos;s convenient. It&apos;s fast. But it has &lt;strong&gt;serious side effects&lt;/strong&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Reading comprehension atrophies.&lt;/strong&gt; The muscle for finishing a long piece disappears. A 3,000-word post starts to feel &quot;too long.&quot;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Judgment goes with it.&lt;/strong&gt; An AI summary compresses the author&apos;s claims; it doesn&apos;t ask whether those claims hold up. If you only read the summary, critical thinking never gets a chance to engage.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Context disappears.&lt;/strong&gt; The value of a tech article often lives in the journey, not the conclusion. &quot;Why did they pick this approach?&quot; and &quot;What trade-offs did they weigh?&quot; both vanish in a summary.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Illusion of understanding.&lt;/strong&gt; Reading a summary feels like understanding. Then someone asks you to explain it and you can&apos;t. &lt;strong&gt;Reading a summary is not the same as understanding the thing.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Nuance evaporates.&lt;/strong&gt; The author hedged with &quot;this might be the case&quot; or &quot;in certain situations,&quot; and the AI flattens it into &quot;this is the case.&quot;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It&apos;s the difference between driving a car yourself and watching the GPS from the passenger seat. No matter how good the GPS is, if you never touch the wheel, the route never sticks.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Principle: don&apos;t outsource the reading. You read; AI is just a tool.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;When AI does help&lt;/h3&gt;
&lt;p&gt;So should you avoid AI entirely? No. You don&apos;t have to cut it out. &lt;strong&gt;As long as you stay the reader&lt;/strong&gt;, using AI as a sidekick can actually make you more efficient.&lt;/p&gt;
&lt;p&gt;The key is to use it &lt;strong&gt;after&lt;/strong&gt; you read, not before reading or instead of reading. I&apos;ll cover how to apply AI at each pass below.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;The Core Principle&lt;/h2&gt;
&lt;p&gt;Don&apos;t read a tech article from start to finish in one go. &lt;strong&gt;Read it in up to three passes.&lt;/strong&gt; Each pass has a different goal and builds on the one before it.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Pass 1&lt;/strong&gt;: Figure out what this is and whether it&apos;s worth your time.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Pass 2&lt;/strong&gt;: Understand the core content well enough to explain it to someone else.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Pass 3&lt;/strong&gt;: Understand it deeply and make it your own.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Like stacking Lego blocks, each pass rests on the previous one. And &lt;strong&gt;not every article needs to reach Pass 3.&lt;/strong&gt; Most get filtered out at Pass 1, and that&apos;s exactly how it should work.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Pass 1: The Skim (5 minutes)&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Goal: decide quickly whether this article is worth more of your time.&lt;/strong&gt;&lt;/p&gt;
&lt;h3&gt;How to do it&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;Read the &lt;strong&gt;title and subtitle&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Read the &lt;strong&gt;intro (first 1-2 paragraphs)&lt;/strong&gt;: what the piece is about and why.&lt;/li&gt;
&lt;li&gt;Skim &lt;strong&gt;only the headings (h2, h3)&lt;/strong&gt; to get the skeleton.&lt;/li&gt;
&lt;li&gt;Read the &lt;strong&gt;conclusion or final section&lt;/strong&gt; for the author&apos;s main claim.&lt;/li&gt;
&lt;li&gt;Glance at &lt;strong&gt;code blocks, diagrams, images&lt;/strong&gt; to gauge the technical depth.&lt;/li&gt;
&lt;li&gt;Check &lt;strong&gt;who the author is&lt;/strong&gt;. Domain expert? Which company or project?&lt;/li&gt;
&lt;/ol&gt;
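&lt;p&gt;Step 3 is even automatable. A stdlib-only Python sketch that pulls the heading skeleton out of an article&apos;s HTML (the sample article is made up; the method itself stays manual):&lt;/p&gt;

```python
from html.parser import HTMLParser

class SkeletonParser(HTMLParser):
    """Collect title and heading text -- the 'skeleton' you skim in Pass 1."""
    def __init__(self):
        super().__init__()
        self.headings = []    # [tag, text] pairs in document order
        self._current = None  # heading tag we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            self._current = tag
            self.headings.append([tag, ""])

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current:
            self.headings[-1][1] += data

# A made-up article body standing in for a real post.
article = """
<h1>Event Sourcing in Production</h1>
<p>Long intro...</p>
<h2>Why we tried it</h2>
<h2>What broke</h2>
<h3>Replays are slow</h3>
<h2>Conclusion</h2>
"""
parser = SkeletonParser()
parser.feed(article)
for tag, text in parser.headings:
    print(f"{tag}: {text.strip()}")
```

&lt;p&gt;Reading just those six lines of output is often enough to answer &quot;is this worth Pass 2?&quot;&lt;/p&gt;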
&lt;p&gt;Five minutes is enough. Ten at the outside. The whole point of this pass is to &lt;strong&gt;decide quickly whether to keep going.&lt;/strong&gt; Can you really judge in five minutes? Yes. You don&apos;t have time to read every article carefully, and you don&apos;t need to.&lt;/p&gt;
&lt;h3&gt;What you should be able to answer after Pass 1: the five Cs&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Question&lt;/th&gt;
&lt;th&gt;What it means&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Category&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;What kind of article is this? Tutorial? System design? War story? Comparison? Opinion?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Context&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;What technology, framework, or problem does it cover? How does it connect to what I already know?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Credibility&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Is the author writing from real experience? Are claims backed up, or just guesses?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Contributions&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;What&apos;s the core insight I&apos;d take away from this?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Clarity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Is it well structured? Easy to read?&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;If you can give a rough answer to these five, Pass 1 has done its job.&lt;/p&gt;
&lt;h3&gt;When it&apos;s fine to stop at Pass 1&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;The article has nothing to do with what you&apos;re working on or interested in right now.&lt;/li&gt;
&lt;li&gt;The title is bait but the content looks shallow.&lt;/li&gt;
&lt;li&gt;The level is way below or way above where you are.&lt;/li&gt;
&lt;li&gt;The author is asserting things with no evidence.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Stopping is a skill. Filtering quickly matters more than finishing every article you start.&lt;/p&gt;
&lt;p&gt;I do Pass 1 on my RSS feed in the morning. Out of ten articles, maybe two or three move on to Pass 2. (Most end at Pass 1.) Once this filtering becomes habit, you actually end up with more time, not less.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;A note from someone who also writes&lt;/strong&gt;: most readers do Pass 1 and leave. Your headings need to be sharp, your intro needs to deliver the value of the piece, and you have five minutes to convince anyone to keep reading. Otherwise nobody finishes.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;Using AI at Pass 1&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Good uses:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Translate &lt;strong&gt;only the title and intro&lt;/strong&gt; of an English article, to speed up the Pass 1 decision.&lt;/li&gt;
&lt;li&gt;A newsletter dropped ten links on you? Paste &lt;strong&gt;just the title and first paragraph&lt;/strong&gt; of each into AI and ask, &quot;pick the ones related to my interests (e.g. backend, distributed systems).&quot; That&apos;s &lt;strong&gt;filtering aid&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;For an unfamiliar field, ask, &quot;give me a one-line explanation each of the terms in this article like &lt;code&gt;CRDT&lt;/code&gt; and &lt;code&gt;vector clock&lt;/code&gt;.&quot; That&apos;s &lt;strong&gt;dictionary mode&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Bad uses:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Pasting the whole article and asking for a summary. You haven&apos;t even done Pass 1. This is where judgment starts to atrophy.&lt;/li&gt;
&lt;li&gt;Reading only the AI summary and going &quot;ah, so that&apos;s what it&apos;s about.&quot; All you&apos;re stacking is illusion.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h2&gt;Pass 2: The Careful Read (15-30 minutes)&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Goal: understand the core content well enough to summarize and explain it to someone else.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Only articles that survived Pass 1 reach this stage. Which means you&apos;ve already decided this one is worth your attention.&lt;/p&gt;
&lt;h3&gt;How to do it&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Read end to end&lt;/strong&gt;, but skip implementation details and proofs.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Take notes on the key points&lt;/strong&gt; as you go (Notion, or margins of the original).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Study the diagrams and architecture pictures&lt;/strong&gt; carefully.
&lt;ul&gt;
&lt;li&gt;Are the relationships between components clear?&lt;/li&gt;
&lt;li&gt;Does the data flow make sense?&lt;/li&gt;
&lt;li&gt;Anything missing?&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Read the code examples&lt;/strong&gt; with your eyes (you&apos;ll run them at Pass 3).
&lt;ul&gt;
&lt;li&gt;Do you understand what the code is meant to demonstrate?&lt;/li&gt;
&lt;li&gt;Is error handling and edge-case behavior accounted for?&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Mark the terms and concepts you don&apos;t know&lt;/strong&gt; (don&apos;t look them up now; collect them for batch review).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Bookmark any linked references&lt;/strong&gt; worth chasing.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The rule that matters here is: don&apos;t stop to look things up the moment you hit something unfamiliar. It&apos;s like pausing a movie to Google every actor that appears. You lose the plot. Once you break the flow, the context goes with it. Mark it, finish reading, then resolve the unknowns in a batch.&lt;/p&gt;
&lt;h3&gt;What &quot;done with Pass 2&quot; looks like&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;You can &lt;strong&gt;summarize the article&apos;s main argument in three lines.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;You can &lt;strong&gt;tell a colleague&lt;/strong&gt;, &quot;I read this thing, and here&apos;s the core.&quot;&lt;/li&gt;
&lt;li&gt;You have an &lt;strong&gt;early opinion&lt;/strong&gt; on whether you agree, disagree, or whether it&apos;s applicable to your project.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you can&apos;t do any of those three, you&apos;re not done with Pass 2 yet.&lt;/p&gt;
&lt;h3&gt;When Pass 2 isn&apos;t clicking&lt;/h3&gt;
&lt;p&gt;The cause is usually one of these:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Background gap.&lt;/strong&gt; You don&apos;t know the underlying tech or pattern. Read foundational material first, then come back.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Bad writing.&lt;/strong&gt; The structure is a mess, or claims have no support. Find another article on the same topic.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;You&apos;re tired.&lt;/strong&gt; Sometimes you&apos;re just tired. Read it tomorrow.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The third one matters more than people think. Forcing a tired read leaves you with nothing.&lt;/p&gt;
&lt;p&gt;I take notes line by line in Obsidian as I do Pass 2. That alone changes how much I retain. (It works better than I expected.)&lt;/p&gt;
&lt;p&gt;In hindsight, I used to skip notes (&quot;I read it, that&apos;s enough&quot;) and a month later I couldn&apos;t remember a single thing I&apos;d read. That&apos;s how the note habit started.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;For most tech articles, Pass 2 is enough.&lt;/strong&gt; If you&apos;re tracking trends, comparing tech, or collecting ideas, you can stop here.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;Using AI at Pass 2&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Good uses:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;After you write your own three-line summary&lt;/strong&gt;, ask AI for one too and &lt;strong&gt;compare against your read.&lt;/strong&gt; It&apos;s a check for &quot;did I miss anything?&quot;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Batch the unfamiliar terms&lt;/strong&gt; you marked while reading: &quot;explain in two lines each: &lt;code&gt;eventual consistency&lt;/code&gt;, &lt;code&gt;saga pattern&lt;/code&gt;, &lt;code&gt;compensating transaction&lt;/code&gt; from this article.&quot;&lt;/li&gt;
&lt;li&gt;If an architecture diagram isn&apos;t clicking, screenshot it for the AI and ask, &lt;strong&gt;&quot;walk me through the data flow in this diagram.&quot;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;For an English article where one paragraph won&apos;t parse, ask for a translation of &lt;strong&gt;just that paragraph&lt;/strong&gt; (not the whole article).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Bad uses:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Skipping the read and asking AI for a full summary. You&apos;ve skipped Pass 2 entirely. That&apos;s not your understanding; it&apos;s the AI&apos;s.&lt;/li&gt;
&lt;li&gt;Copy-pasting the AI summary straight into your notes. Notes you didn&apos;t write don&apos;t stick.&lt;/li&gt;
&lt;li&gt;&quot;Analyze the pros and cons of this article.&quot; You&apos;ve handed off your critical thinking. Your own opinion disappears.&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The core rule: AI comes after the read. Not before, not instead.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr /&gt;
&lt;h2&gt;Pass 3: The Deep Dive (1-3 hours)&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Goal: make the article&apos;s content your own.&lt;/strong&gt; You can do it yourself, and you can evaluate it critically.&lt;/p&gt;
&lt;p&gt;Honestly, only a handful of articles a month earn a Pass 3. That&apos;s how it should be.&lt;/p&gt;
&lt;p&gt;Looking back, the articles I took to Pass 3 were almost always tied to a real situation: team architecture decisions, evaluating whether to adopt a technology, the kind of thing where I had to apply it directly.&lt;/p&gt;
&lt;h3&gt;How to do it&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Run the code yourself.&lt;/strong&gt; Not copy-paste; understand why it was written that way.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Solve the same problem the author solved.&lt;/strong&gt; Before reading, think about how you&apos;d approach it.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Compare your approach to the author&apos;s.&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;Where is the author better than you?&lt;/li&gt;
&lt;li&gt;What would you have done differently?&lt;/li&gt;
&lt;li&gt;What did the author miss?&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Challenge the assumptions.&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;&quot;Does this only work at low traffic?&quot;&lt;/li&gt;
&lt;li&gt;&quot;Would this architecture hold up at our scale?&quot;&lt;/li&gt;
&lt;li&gt;&quot;Is this benchmark fair?&quot;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Write your own notes.&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;Key takeaways&lt;/li&gt;
&lt;li&gt;What you can apply to your own project&lt;/li&gt;
&lt;li&gt;Things to look up further&lt;/li&gt;
&lt;li&gt;Counterarguments and limitations&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Step 4 matters most. You&apos;re not just absorbing what&apos;s on the page; you&apos;re asking &quot;does this actually fit our situation?&quot; If you think about it, the better the tech article, the more it tends to describe an experience under specific conditions. Change the conditions and the conclusion can change too.&lt;/p&gt;
&lt;h3&gt;What &quot;done with Pass 3&quot; looks like&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;You can &lt;strong&gt;reconstruct the article&apos;s structure from memory.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;You can &lt;strong&gt;point to specific strengths and weaknesses.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;You can &lt;strong&gt;apply or adapt the technique or pattern in your own project.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;You can &lt;strong&gt;write a blog post or give a talk on the topic.&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The real test of Pass 3 is &quot;can I write something on this?&quot; If you can, you understood it. If you can&apos;t, you don&apos;t have it yet.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Pass 3 isn&apos;t for every article.&lt;/strong&gt; Reserve it for tech you&apos;ll actually apply at work, architectures you have to understand deeply, or important pieces you need to share with the team.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;Using AI at Pass 3&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Good uses:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Use it as a Socratic sparring partner.&lt;/strong&gt; Explain what you understood and ask, &quot;is my read right? did I miss something?&quot;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Use it as a counterargument generator.&lt;/strong&gt; Ask, &quot;give me five weak points or failure scenarios for this architecture.&quot; It surfaces angles you hadn&apos;t considered.&lt;/li&gt;
&lt;li&gt;When you get stuck running the code, ask about &lt;strong&gt;just the specific error or concept&lt;/strong&gt;: &quot;in this code, how does &lt;code&gt;CompletableFuture.thenCompose&lt;/code&gt; differ from &lt;code&gt;thenApply&lt;/code&gt;?&quot;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&quot;What might the author of this article have overlooked?&quot;&lt;/strong&gt; AI isn&apos;t omniscient, but it can offer a different angle.&lt;/li&gt;
&lt;li&gt;After drafting your notes, ask AI &lt;strong&gt;&quot;what&apos;s missing?&quot;&lt;/strong&gt; as a review pass.&lt;/li&gt;
&lt;li&gt;Ask about &lt;strong&gt;alternatives or competing tech&lt;/strong&gt;: &quot;this article recommends Redis Streams; how would the same problem look with Kafka or RabbitMQ?&quot;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Bad uses:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&quot;Take the gist of this article and write me a blog post.&quot; You&apos;ve erased the entire point of Pass 3. The result isn&apos;t yours.&lt;/li&gt;
&lt;li&gt;Skipping running the code yourself and asking AI &quot;what would happen if I ran this?&quot; You learn by hitting walls.&lt;/li&gt;
&lt;li&gt;Outsourcing the &quot;challenge the assumptions&quot; step to AI. The critical-thinking muscle atrophies.&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;At Pass 3, AI is your sparring partner.&lt;/strong&gt; You throw a punch, AI counters, and your understanding sharpens through the exchange. AI doesn&apos;t fight the match for you.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr /&gt;
&lt;h2&gt;Applying It to Tech-Trend Research&lt;/h2&gt;
&lt;p&gt;When you have to research a new technology or area (say, &quot;should we adopt event sourcing in our service?&quot;), you can apply the three-pass approach like this.&lt;/p&gt;
&lt;h3&gt;Step 1: Explore&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Search keywords on Google, Hacker News, dev.to, Medium.&lt;/li&gt;
&lt;li&gt;Find &lt;strong&gt;three to five recent articles.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Run &lt;strong&gt;Pass 1 only&lt;/strong&gt; on each, to get the lay of the land and gather their reference links.&lt;/li&gt;
&lt;li&gt;If a well-organized survey or overview exists, start there.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Step 2: Identify the core&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Spot the articles and authors &lt;strong&gt;cited repeatedly&lt;/strong&gt; across multiple pieces. Those are the foundational references in the field.&lt;/li&gt;
&lt;li&gt;Find the &lt;strong&gt;official blog or talks&lt;/strong&gt; of those core authors and companies.&lt;/li&gt;
&lt;li&gt;Look for &lt;strong&gt;conference talks&lt;/strong&gt; (QCon, Strange Loop, KubeCon, etc.) on the topic.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;After scanning enough articles, patterns emerge. You start to think, &quot;ah, in this field everyone goes back to that one Martin Fowler post.&quot; (That moment of recognition is the turning point in your research.)&lt;/p&gt;
&lt;h3&gt;Step 3: Go deep&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Take the core references and conference talks and run &lt;strong&gt;Pass 2&lt;/strong&gt; on them.&lt;/li&gt;
&lt;li&gt;If everyone keeps citing something you haven&apos;t read yet, add it to the pile.&lt;/li&gt;
&lt;li&gt;Finally, write &lt;strong&gt;your own summary doc&lt;/strong&gt; (a tech evaluation, an ADR, etc.).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The same AI rule applies. Use it to accelerate exploration and spot patterns, but don&apos;t end research with &quot;summarize the pros and cons of event sourcing.&quot; That&apos;s mistaking someone else&apos;s opinion for your conclusion. And you have to verify that any article AI recommends actually exists. AI hallucinates references frequently.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Practical Tips&lt;/h2&gt;
&lt;h3&gt;Habits that improve reading efficiency&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Use an RSS reader or newsletters&lt;/strong&gt; to batch articles and run Pass 1 in one sitting. Sort &quot;read&quot; from &quot;drop&quot; up front.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Use &quot;read later&quot; tools&lt;/strong&gt; (Pocket, Instapaper, Obsidian clipping), but don&apos;t let them pile up. Schedule &lt;strong&gt;a weekly review slot.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Always take notes&lt;/strong&gt; on anything you read at Pass 2 or beyond. If you read it and can&apos;t remember, you didn&apos;t really read it.&lt;/li&gt;
&lt;li&gt;For anything you took to Pass 3, &lt;strong&gt;explain it to someone or write a post about it.&lt;/strong&gt; That&apos;s how it actually becomes yours.&lt;/li&gt;
&lt;/ul&gt;
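&lt;p&gt;That morning batch can be flattened to titles and links before Pass 1 even begins. A stdlib-only Python sketch, with a made-up two-item feed standing in for a real subscription:&lt;/p&gt;

```python
import xml.etree.ElementTree as ET

# A made-up two-item RSS feed for the example.
feed_xml = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <item><title>Postmortem: our cache stampede</title><link>https://example.com/stampede</link></item>
  <item><title>10 VS Code themes</title><link>https://example.com/themes</link></item>
</channel></rss>"""

def triage(xml_text):
    """Return (title, link) pairs -- the raw material for a Pass 1 skim."""
    root = ET.fromstring(xml_text)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]

for title, link in triage(feed_xml):
    print(f"- {title}  ({link})")
```

&lt;p&gt;Scanning that flat list, deciding &quot;read&quot; or &quot;drop&quot; per line, and only then opening the survivors is the whole Pass 1 ritual in miniature.&lt;/p&gt;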
&lt;p&gt;The last one matters most. The way I see it, the final pass of reading is writing. Writing is what reveals what you understood and what you didn&apos;t.&lt;/p&gt;
&lt;h3&gt;From the writer&apos;s side&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;80% of your readers leave at Pass 1.&lt;/strong&gt; Title, headings, and intro are everything.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;One good diagram&lt;/strong&gt; beats ten lines of text.&lt;/li&gt;
&lt;li&gt;Code examples should show &lt;strong&gt;only the core.&lt;/strong&gt; Link to the full code on GitHub.&lt;/li&gt;
&lt;li&gt;The conclusion has to answer &lt;strong&gt;&quot;so what should I do?&quot;&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you read this post and also write, run your own work through the same lens. Is your Pass 1 compelling? Does the skeleton emerge from the headings alone? Just asking yourself those questions raises the quality of what you publish.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Closing&lt;/h2&gt;
&lt;p&gt;What changed most when I applied this method wasn&apos;t how much I read, but how I approached reading.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pass&lt;/th&gt;
&lt;th&gt;Time&lt;/th&gt;
&lt;th&gt;Goal&lt;/th&gt;
&lt;th&gt;Outcome&lt;/th&gt;
&lt;th&gt;AI&apos;s role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pass 1&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;5 min&lt;/td&gt;
&lt;td&gt;Decide if it&apos;s worth reading&lt;/td&gt;
&lt;td&gt;Can answer the five Cs&lt;/td&gt;
&lt;td&gt;Filter aid, term dictionary&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pass 2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;15-30 min&lt;/td&gt;
&lt;td&gt;Understand the core content&lt;/td&gt;
&lt;td&gt;Three-line summary + can explain it&lt;/td&gt;
&lt;td&gt;Compare your read, term explanations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pass 3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1-3 hrs&lt;/td&gt;
&lt;td&gt;Make it your own&lt;/td&gt;
&lt;td&gt;Apply, critique, present&lt;/td&gt;
&lt;td&gt;Sparring partner, counterarg generator&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;You don&apos;t need to take every article to Pass 3. Most get filtered at Pass 1, only the worthwhile ones move to Pass 2, and only the truly important ones reach Pass 3. The filtering itself is the heart of efficient reading.&lt;/p&gt;
&lt;p&gt;The same logic applies to AI use. Aid, not substitute. Translate the unclear paragraph instead of the whole article, and check your notes for gaps after writing them instead of pasting an AI summary in as your notes.&lt;/p&gt;
&lt;p&gt;What it comes down to is this. Reading &quot;a lot&quot; of tech articles isn&apos;t the point. Reading them &quot;properly&quot; is. And reading properly means choosing the right depth at each pass and spending your time well.&lt;/p&gt;
&lt;p&gt;And at any pass, &lt;strong&gt;the moment you hand the reading off to AI, that pass didn&apos;t happen.&lt;/strong&gt; You have to stay the reader.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;The more that you read, the more things you will know. The more that you learn, the more places you&apos;ll go.&quot;&lt;/p&gt;
&lt;p&gt;-- Dr. Seuss&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This is what the three-pass method is really about. Not reading whatever lands in front of you, but choosing what to read.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;em&gt;Original: S. Keshav, &quot;How to Read a Paper&quot; (University of Waterloo)&lt;/em&gt;
&lt;em&gt;Adapted for tech blogs and articles + AI usage guide added&lt;/em&gt;&lt;/p&gt;
</content:encoded><category>essay</category><category>tech</category><category>productivity</category><author>Tony Cho (https://flowkater.io)</author></item><item><title>Give Claude Code Wings: Introducing Superpowers</title><link>https://flowkater.io/en/posts/2026-02-08-superpowers-introduction/</link><guid isPermaLink="true">https://flowkater.io/en/posts/2026-02-08-superpowers-introduction/</guid><description>Install Superpowers on Claude Code and a 7-stage workflow runs automatically, from clarifying questions to TDD to code review. Setup, the meta-skill, and how to build your own skills.</description><pubDate>Sun, 08 Feb 2026 01:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;Opening&lt;/h2&gt;
&lt;p&gt;&lt;img src=&quot;/assets/superpowers.png&quot; alt=&quot;Superpowers&quot; /&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Red Bull gives you wiiings!&lt;/p&gt;
&lt;p&gt;-- Red Bull commercial&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;Red Bull, spread your wings!&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;When you&apos;re past the stage of testing a quick idea or doing throwaway vibe coding, you usually reach for one of the more disciplined approaches: clarifying the context, writing tighter prompts, drafting a spec doc first, sticking to spec-driven development.&lt;/p&gt;
&lt;p&gt;The problem is that no matter how clearly you define and hand off the work, the agent makes its own calls and drifts in directions you never asked for. That gets worse as the project grows. So one of the commands I built and started leaning on is an &lt;code&gt;interview&lt;/code&gt; command. The frontmatter description goes roughly like this.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;---
description: &amp;gt;
  When project requirements are vague, this is the interview command
  that has Claude clarify policies, implementation details, and edge cases
  by asking me directly.
  &quot;Don&apos;t make the call yourself. Keep asking me questions until the
  ambiguity is gone.&quot;
---
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The short version is, &quot;Don&apos;t make the call yourself. Keep asking me questions until the ambiguity is gone.&quot; With that in place, instead of guessing, the agent uses AskUserQuestions and runs an actual interview with me about the policies, the current implementation, and the requirements. Especially when you hand off a planning doc or requirements, language itself, as a lossy representation, inevitably blurs meaning, and the AI ends up misreading no matter how cleanly you wrote it. (Same goes for humans.) And honestly, half the time I throw requirements over the wall without thinking through the details either.&lt;/p&gt;
&lt;p&gt;So when I flip on plan mode in Codex or Claude Code and run the interview command, I end up with a much sharper requirements doc and implementation plan. It pays off whether I&apos;m building a new feature or fixing a bug.&lt;/p&gt;
&lt;p&gt;And once TDD Planning kicks in (the one I described in my earlier post, &lt;a href=&quot;https://flowkater.io/posts/2026-01-09-15-year-cto-vibe-coding/&quot;&gt;How a 15-Year CTO Vibe Codes&lt;/a&gt;), the work follows a concrete, step-by-step plan.&lt;/p&gt;
&lt;p&gt;The catch: when I&apos;m juggling parallel tasks I forget to run the command, and TDD Planning needs a slightly different setup per language and framework. Configuring a custom skill every time felt like a chore, so I&apos;d often skip the setup on my end. The interview part is great when it digs in, but I also wanted the agent to read the room a bit on its own. (You know how it is.)&lt;/p&gt;
&lt;p&gt;Then, while I was happily using that setup, I came across a skill framework that bundled every command and skill I&apos;d been hand-rolling and applied &lt;strong&gt;Cialdini&apos;s persuasion psychology&lt;/strong&gt; to LLMs. The name fits the job: it gives your coding agent superpowers (Claude Code, Codex, you name it). It&apos;s called &lt;strong&gt;Superpowers&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Cialdini himself co-authored research showing these principles transfer to LLMs. Superpowers leans on three of his six persuasion principles as its core.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Three persuasion principles applied&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Authority&lt;/strong&gt;: Each skill file declares itself mandatory, so Claude (or Codex) registers it as something it has to follow.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Consistency&lt;/strong&gt;: The agent is made to declare it&apos;ll follow a given skill, then nudged to honor what it just declared.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Social proof&lt;/strong&gt;: The skill says &quot;all the other skills work this way too,&quot; so the agent treats it as the standard.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Installation on Claude Code is simple. You just grab the plugin from the marketplace. I tend to use Claude Code for quick or throwaway work and Codex for production-grade tasks, and Superpowers pulls its weight on both.&lt;/p&gt;
&lt;p&gt;It bundles in the interview and TDD workflows I was already using, then layers on code review, verification, and other supporting skills. So I figured it was worth a writeup.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Installation and basic usage&lt;/h2&gt;
&lt;h3&gt;Install&lt;/h3&gt;
&lt;p&gt;Superpowers needs Claude Code 2.0.13 or higher. Setup is two lines.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# 1. Register the marketplace
/plugin marketplace add obra/superpowers-marketplace

# 2. Install Superpowers
/plugin install superpowers@superpowers-marketplace
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If it doesn&apos;t kick in after install, restart Claude Code. Once it&apos;s installed you&apos;ll see &lt;strong&gt;&quot;I have Superpowers&quot;&lt;/strong&gt;, and Brainstorm, Write Plan, Execute Plan, and the rest of the skills become available.&lt;/p&gt;
&lt;p&gt;It also runs on Codex and OpenCode, not just Claude Code. (The repo has 47K stars on GitHub and ships under the MIT license, so you can use it without worry.)&lt;/p&gt;
&lt;h3&gt;The big shift: it doesn&apos;t go straight to code&lt;/h3&gt;
&lt;p&gt;The first thing you notice after installing Superpowers is this. &lt;strong&gt;Claude refuses to start coding immediately.&lt;/strong&gt; Plain Claude Code will start typing code the moment you say &quot;build me X.&quot; Claude with Superpowers always starts with questions instead.&lt;/p&gt;
&lt;p&gt;That&apos;s the whole point. The interview command I built is now baked in by default. No command needed.&lt;/p&gt;
&lt;p&gt;Question, design, plan, TDD, sub-agent dispatch, code review, done.&lt;/p&gt;
&lt;p&gt;The biggest win with Superpowers is that this whole flow starts &lt;strong&gt;automatically, without any command from you.&lt;/strong&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;The development workflow: 7 stages&lt;/h2&gt;
&lt;p&gt;The Superpowers workflow runs in 7 stages. Here&apos;s the full flow first.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;flowchart TD
    A[&quot;1. Brainstorming\nQuestions &amp;amp; design doc&quot;] --&amp;gt; B[&quot;2. Git Worktrees\nIsolated worktree&quot;]
    B --&amp;gt; C[&quot;3. Writing Plan\nTask breakdown &amp;amp; approval&quot;]
    C --&amp;gt; D[&quot;4. Development\nParallel sub-agent dev&quot;]
    D --&amp;gt; E[&quot;5. Testing (TDD)\nRED -&amp;gt; GREEN -&amp;gt; REFACTOR&quot;]
    E --&amp;gt; F[&quot;6. Verification\nVerify all changes&quot;]
    F --&amp;gt; G[&quot;7. Code Review &amp;amp; Finishing\nReview &amp;amp; PR creation&quot;]
    G --&amp;gt;|&quot;next task&quot;| D
    E --&amp;gt;|&quot;test failure&quot;| H[&quot;Systematic Debugging\n4-step root-cause analysis&quot;]
    H --&amp;gt; E
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Stage 1: Brainstorming&lt;/h3&gt;
&lt;p&gt;When you make a request, Claude doesn&apos;t start coding. It &lt;strong&gt;starts with questions&lt;/strong&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;What exactly do you want, and how should it work?&quot;
&quot;What should the headline copy say?&quot;
&quot;Is the situation input free-form text, or a category picker?&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It carries the conversation through a list of features, the tech stack, the component structure, and writes a &lt;strong&gt;design doc&lt;/strong&gt; itself.&lt;/p&gt;
&lt;p&gt;It looks a lot like the interview command I built, but Superpowers fires this off without you typing anything. Even when I forget, Claude jumps in with questions on its own. (That part alone is a relief.)&lt;/p&gt;
&lt;h3&gt;Stage 2: Git worktree creation&lt;/h3&gt;
&lt;p&gt;Once the design is approved, the &lt;strong&gt;git worktrees&lt;/strong&gt; skill spins up an isolated worktree.&lt;/p&gt;
&lt;p&gt;Why a worktree? With AI coding, running multiple things in parallel makes Git get tangled fast. (I lost count of how many merge-conflict swamps I waded through running parallel tasks in Codex.) Splitting into worktrees keeps the main branch safe even if an experiment blows up, and you can roll back cleanly by deleting the worktree.&lt;/p&gt;
&lt;h3&gt;Stage 3: Writing the plan&lt;/h3&gt;
&lt;p&gt;The Writing Plan skill takes the design doc and &lt;strong&gt;slices it into concrete tasks.&lt;/strong&gt; Tasks are decomposed by size, and Claude asks you to approve the plan.&lt;/p&gt;
&lt;p&gt;The flow looks a lot like the TDD Planning I&apos;d built, except it generates the right format automatically regardless of language or framework. No more setting up a custom skill every time.&lt;/p&gt;
&lt;h3&gt;Stage 4: Sub-agent-driven development&lt;/h3&gt;
&lt;p&gt;Superpowers earns its keep here.&lt;/p&gt;
&lt;p&gt;When the plan runs, you can pick between &lt;strong&gt;sub-agent-driven mode&lt;/strong&gt; and &lt;strong&gt;batch mode&lt;/strong&gt;. Sub-agent-driven is the more efficient one: the main Claude plays &lt;strong&gt;PM&lt;/strong&gt;, hands the development work to sub-agents, and merges the results.&lt;/p&gt;
&lt;p&gt;Think of it like an orchestra conductor. The main Claude conducts the whole flow, and the sub-agents work their assigned parts (UI, API, data layer, etc.) in parallel. The &lt;strong&gt;Dispatch Parallel Agent&lt;/strong&gt; skill enables this parallel work, which also cuts down on the context errors you get when a single Claude has to juggle everything.&lt;/p&gt;
&lt;p&gt;(One caveat: Codex doesn&apos;t support sub-agents yet, so this part isn&apos;t available there.)&lt;/p&gt;
&lt;h3&gt;Stage 5: TDD and debugging&lt;/h3&gt;
&lt;p&gt;Superpowers &lt;strong&gt;forces TDD&lt;/strong&gt;. The cycle is RED -&amp;gt; GREEN -&amp;gt; REFACTOR, and it loops.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Stage&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RED&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Write the test first and confirm it fails&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GREEN&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Write the minimum code to pass it&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;REFACTOR&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Clean up the code and improve quality&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
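&lt;p&gt;As a toy illustration of the cycle the agent is pushed through (a hypothetical &lt;code&gt;slugify&lt;/code&gt; helper I made up for this post, not an example from Superpowers itself):&lt;/p&gt;

```python
# RED: write the test first and watch it fail (slugify doesn't exist yet).
def test_slugify():
    assert slugify("Hello World") == "hello-world"

# GREEN: the minimum code that makes the test pass.
def slugify(text):
    return text.strip().lower().replace(" ", "-")

# REFACTOR: same behavior, cleaner handling of repeated whitespace.
import re

def slugify(text):
    return re.sub(r"\s+", "-", text.strip().lower())

test_slugify()  # still green after the refactor
```

&lt;p&gt;The point is the order: the test exists and fails before any implementation does, and the refactor is only accepted while the test stays green.&lt;/p&gt;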
&lt;p&gt;When a test fails or a bug shows up, the &lt;strong&gt;Systematic Debugging&lt;/strong&gt; skill kicks in automatically with a 4-step process.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Analyze the error message&lt;/strong&gt; (figure out what went wrong)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Narrow the relevant code scope&lt;/strong&gt; (find where the issue lives)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Form and check a hypothesis&lt;/strong&gt; (reason about why it happened, then verify)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Fix and test&lt;/strong&gt; (patch it, then verify again)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;It&apos;s basically the same TDD-cycle philosophy I described in my earlier post. The difference is that Superpowers applies it without a separate command, on any language or framework.&lt;/p&gt;
&lt;h3&gt;Stage 6: Verification&lt;/h3&gt;
&lt;p&gt;The &lt;strong&gt;Verification&lt;/strong&gt; skill checks every change. It re-runs tests, looks at related-feature impact, and walks through edge cases.&lt;/p&gt;
&lt;p&gt;Honestly, I&apos;ve lost count of how many times Claude said &quot;all done&quot; when the thing wasn&apos;t actually working. With this verification step in the loop, that scenario drops off sharply.&lt;/p&gt;
&lt;h3&gt;Stage 7: Code review and finishing&lt;/h3&gt;
&lt;p&gt;Each completed task triggers a &lt;strong&gt;code review&lt;/strong&gt;. The review has two layers.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Spec compliance&lt;/strong&gt; (was it built to spec)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Code quality&lt;/strong&gt; (security holes, performance issues, code style)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Findings are tagged Critical, Major, or Minor, and the agent fixes the code automatically based on the feedback.&lt;/p&gt;
&lt;p&gt;When every task is done, the &lt;strong&gt;Finishing&lt;/strong&gt; skill takes care of cleaning up the worktree and opening the PR.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Writing Skills: the power of the meta-skill&lt;/h2&gt;
&lt;h3&gt;The skill library&lt;/h3&gt;
&lt;p&gt;Here&apos;s the breakdown of what&apos;s built into Superpowers.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Skill&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Testing &amp;amp; Quality&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;TDD&lt;/td&gt;
&lt;td&gt;RED-GREEN-REFACTOR cycle&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Debugging&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Systematic Debugging&lt;/td&gt;
&lt;td&gt;4-step root-cause analysis&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Debugging&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Verification&lt;/td&gt;
&lt;td&gt;Pre-completion verification&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Collaboration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Brainstorming&lt;/td&gt;
&lt;td&gt;Question-driven requirements&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Collaboration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Plan Composition/Execution&lt;/td&gt;
&lt;td&gt;Plan writing and execution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Collaboration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Parallel Agent Dispatching&lt;/td&gt;
&lt;td&gt;Parallel sub-agent work&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Collaboration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Code Review&lt;/td&gt;
&lt;td&gt;Two-layer code review&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Collaboration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Git Worktree Management&lt;/td&gt;
&lt;td&gt;Worktree create/cleanup&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Collaboration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Subagent-Driven Development&lt;/td&gt;
&lt;td&gt;PM-and-developer split&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Meta&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Writing Skills&lt;/td&gt;
&lt;td&gt;Skill creation/edit framework&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Meta&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Superpowers Introduction&lt;/td&gt;
&lt;td&gt;System intro&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Every skill is &lt;strong&gt;a markdown file&lt;/strong&gt;. Think of each one as a reusable knowledge module that documents a procedure, best practices, and the workflow.&lt;/p&gt;
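&lt;p&gt;To give a feel for what such a module looks like, here&apos;s a minimal sketch of a hypothetical skill file. The exact frontmatter fields Superpowers expects may differ from this; treat it as the general shape (trigger condition up top, procedure below), not the official format.&lt;/p&gt;

```markdown
---
name: conventional-commits
description: Use when writing commit messages in this repository.
---

# Conventional Commits

## When to use
Every commit on a feature branch.

## Procedure
1. Pick a type: feat, fix, refactor, docs, test, chore.
2. Write the subject in the imperative mood, 50 characters or less.
3. Explain the "why" in the body, not the "what".
```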
&lt;h3&gt;Building your own skills&lt;/h3&gt;
&lt;p&gt;The genuinely powerful piece is the &lt;strong&gt;Writing Skills&lt;/strong&gt; meta-skill. You can update existing skills or &lt;strong&gt;create new ones from scratch&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Usage is simple. You tell Claude something like this.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&quot;Make me a skill that enforces our company&apos;s coding conventions&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And Claude builds the skill file for you. When it creates a new skill, it applies the same TDD, sub-agent dispatch, and pressure-test routines as any other dev work. Building a skill itself follows the Superpowers workflow.&lt;/p&gt;
&lt;p&gt;The fun part is that you can also &lt;strong&gt;hand it a programming-book PDF and ask it to &quot;read this and turn what you learned into a skill.&quot;&lt;/strong&gt; Whether it&apos;s clean-code principles or a specific architectural pattern, anything documented can become a skill.&lt;/p&gt;
&lt;h3&gt;Real-world cases&lt;/h3&gt;
&lt;p&gt;Where Superpowers earns its keep is in the field. A few examples.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1. &lt;a href=&quot;https://www.trevorlasn.com/blog/superpowers-claude-code-skills&quot;&gt;Next.js 16 migration&lt;/a&gt;: a 500-line plan, generated for you&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A developer handed Superpowers the job of upgrading his service to Next.js 16 and turning on &lt;code&gt;cacheComponents&lt;/code&gt;. He ran &lt;code&gt;/superpowers:write-plan&lt;/code&gt;, and it produced a &lt;strong&gt;500-line migration plan&lt;/strong&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A full list of the &lt;strong&gt;23 API route files&lt;/strong&gt; that needed changes&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;2 components&lt;/strong&gt; where &lt;code&gt;new Date()&lt;/code&gt; would break prerendering, identified&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;context providers&lt;/strong&gt; that needed a Suspense boundary, called out&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;4-day timeline&lt;/strong&gt; with test checkpoints at each stage&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Doing that codebase scan by hand would&apos;ve eaten more than a day.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2. &lt;a href=&quot;https://www.pasqualepillitteri.it/en/news/215/superpowers-claude-code-complete-guide&quot;&gt;Notion clone&lt;/a&gt;: 45 minutes, zero hand-written code&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A Notion-style web app (rich-text editor, interactive tables, drag-and-drop kanban board) built end to end with Superpowers. The numbers were striking.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Item&lt;/th&gt;
&lt;th&gt;Result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Build time&lt;/td&gt;
&lt;td&gt;45 to 60 minutes (mostly automated)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Test coverage&lt;/td&gt;
&lt;td&gt;87% (unit, integration, E2E)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hand-written code&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0 lines&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Features included&lt;/td&gt;
&lt;td&gt;CRUD tables, kanban, rich text, auth&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;3. &lt;a href=&quot;https://www.nathanonn.com/claude-skills-part-2-how-to-turn-your-battle-tested-code-into-a-reusable-superpower/&quot;&gt;Authentication system as a skill&lt;/a&gt;: 14 hours saved across 6 projects&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A developer who&apos;d been rebuilding the same auth system (OAuth, session management, token refresh) across project after project turned it into a skill via Writing Skills. He fed his existing codebase in, got a 302-line implementation guide and ASCII wireframes for every screen, and &lt;strong&gt;had a reusable skill in about 20 minutes&lt;/strong&gt;. After that, dropping &lt;code&gt;/authentication-setup&lt;/code&gt; into a new project deployed the entire auth system in one line, and he &lt;strong&gt;shipped it across 6 projects, saving 14 hours total&lt;/strong&gt;. He also kept 100% consistency.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;4. &lt;a href=&quot;https://www.youtube.com/watch?v=308zzinIVSA&amp;amp;t=210s&quot;&gt;Channel Talk integration skill&lt;/a&gt;: 3 days down to 30 minutes&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Someone fed the entire Channel Talk official documentation into Claude and used Writing Skills to build a Channel Talk integration skill. Once installed, you just say &quot;set up Channel Talk integration&quot; and it handles the SDK install, bot configuration, webhook wiring, and lift-and-shift tasks end to end. Work that used to take 3 days now finishes in &lt;strong&gt;30 minutes&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The real value of Superpowers is that &lt;strong&gt;it gets stronger the more skills you stack on top of it&lt;/strong&gt;. Build a library of skills tailored to a project, a team, or a company, and that library becomes the team&apos;s living development knowledge base.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Why it actually works: persuasion psychology + pressure tests&lt;/h2&gt;
&lt;h3&gt;Skills hardened by pressure scenarios&lt;/h3&gt;
&lt;p&gt;The author of Superpowers didn&apos;t just write the skills and call it a day. He stress-tested them with extreme pressure scenarios, like &lt;strong&gt;production server down, $5,000 in losses per minute&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The reason is that under pressure, Claude has a real tendency to skip the skills and start coding. When that happened, the author would mark the test as a fail, harden the skill, and run it again. He kept iterating.&lt;/p&gt;
&lt;p&gt;So the Superpowers skills aren&apos;t just clever prompts. They&apos;re &lt;strong&gt;a framework that&apos;s been tested in the field&lt;/strong&gt;. Even in a fire drill, the agent stays on the &quot;questions first, plan first&quot; rails.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The Superpowers philosophy&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Here are the core principles Superpowers is built on.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Test-first development&lt;/strong&gt; (tests come first, implementation second)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Systematic processes over intuition&lt;/strong&gt; (process beats gut feel)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Simplicity as primary objective&lt;/strong&gt; (simple is the goal)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Empirical verification before success declarations&lt;/strong&gt; (actually verify before saying &quot;done&quot;)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It comes down to forcing good engineering habits onto the AI.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Practical tips&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Token usage heads-up&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Superpowers iterates deeply during planning, so it burns tokens. I&apos;d recommend the Claude Max plan. It can be overkill for trivial fixes, so pick your moments.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;When it&apos;s a good fit&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Situation&lt;/th&gt;
&lt;th&gt;Fit&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Starting a new MVP project&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;You get the full design-to-build flow&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Adding a complex feature&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Parallel sub-agent dev plus TDD pays off&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Whole-codebase refactor&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Structured planning and verification matter&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Quick bug fix&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Overkill, direct edit is faster&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;One-line config change&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Skip Superpowers&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Overnight autonomous work&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;If you set up a sizable plan, the agent can run on its own for hours. Approve the plan at night, set it loose, and you can wake up to a finished PR.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Closing&lt;/h2&gt;
&lt;p&gt;Here&apos;s Superpowers in one sentence. It&apos;s the framework that &lt;strong&gt;bundles the interview command, the TDD skill, and the code-review skill I&apos;d been building one by one, then adds sub-agent parallel development and Git worktree management on top&lt;/strong&gt;. And it ships with persuasion psychology layered in to make sure the AI actually sticks to the workflow.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Core feature&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Auto brainstorming&lt;/td&gt;
&lt;td&gt;Starts with questions, no command needed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7-stage workflow&lt;/td&gt;
&lt;td&gt;Design -&amp;gt; plan -&amp;gt; TDD -&amp;gt; review, automatic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sub-agents&lt;/td&gt;
&lt;td&gt;PM-and-developer split, parallel execution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TDD enforced&lt;/td&gt;
&lt;td&gt;RED-GREEN-REFACTOR cycle&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Verification built in&lt;/td&gt;
&lt;td&gt;Blocks the &quot;all done&quot; lie&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Writing Skills&lt;/td&gt;
&lt;td&gt;Build your own custom skills&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Persuasion-based&lt;/td&gt;
&lt;td&gt;Locks in skill compliance&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Setup is two lines, so the first move is to install it and try it. You don&apos;t need to use it perfectly out of the gate. Install Superpowers, start working the way you normally do, and Claude will start asking the questions on its own. Follow that flow and you&apos;ll find your own rhythm with it.&lt;/p&gt;
&lt;p&gt;And if you keep adding your own skills through Writing Skills, the setup keeps getting stronger. I&apos;m still adding skills to mine as I go.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;a complete software development workflow for your coding agents, built on top of a set of composable &apos;skills.&apos;&quot;&lt;/p&gt;
&lt;p&gt;-- Superpowers GitHub README&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Go install Superpowers on Claude Code right now and try it on your next project.&lt;/p&gt;
&lt;p&gt;Wings included.&lt;/p&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=308zzinIVSA&amp;amp;t=210s&quot;&gt;Channel Talk case video (also includes the intro)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://blog.fsck.com/2025/10/09/superpowers/&quot;&gt;https://blog.fsck.com/2025/10/09/superpowers/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/obra/superpowers&quot;&gt;https://github.com/obra/superpowers&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</content:encoded><category>tech</category><category>vibe-coding</category><category>AI-coding</category><category>development</category><author>Tony Cho (https://flowkater.io)</author></item><item><title>Reading &apos;The Obstacle is the Way&apos; (with a side of Meditations)</title><link>https://flowkater.io/en/posts/2026-02-03-book-review-the-obstacle-is-the-way/</link><guid isPermaLink="true">https://flowkater.io/en/posts/2026-02-03-book-review-the-obstacle-is-the-way/</guid><description>A reflection on Ryan Holiday&apos;s &apos;The Obstacle is the Way&apos; and Marcus Aurelius&apos;s &apos;Meditations&apos;, and the Stoic posture for facing failure and despair.</description><pubDate>Tue, 03 Feb 2026 01:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;
&lt;p&gt;This review is from the readmeleadme (or &quot;Lit-mi Lit-mi&quot;) reading group.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;img src=&quot;/assets/the-obstacle-is-the-way-cover.jpeg&quot; alt=&quot;The Obstacle is the Way&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The Obstacle is the Way: The Timeless Art of Turning Trials into Triumph&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Ryan Holiday, translated into Korean by Ahn Jong-seol | SimpleLife | November 11, 2024&lt;/p&gt;
&lt;p&gt;Original title: &lt;em&gt;The Obstacle Is the Way&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;Where I&apos;m starting from&lt;/h2&gt;
&lt;p&gt;This was the book I picked up to clear my head at the start of the new year.&lt;/p&gt;
&lt;p&gt;A lot happened last year, and during the brief stretch I&apos;d stopped to rest, the aftermath I&apos;d expected to recover from kept getting worse instead. Body and mind both went through a hard time. What saved me, though, was already knowing this: the moment I&apos;d changed my environment, this was an inner problem of mine, not a problem with anything else.&lt;/p&gt;
&lt;p&gt;It was the same as in 2020, when I realized I had to wind down the company. I figured recovery would take some time, but I expected it to go a little faster than last time. Somehow, it lasted longer. I avoided looking inward and turned my attention to other things, to anything outside.&lt;/p&gt;
&lt;p&gt;Thankfully, starting in December, I dragged a body that had been carrying injuries and inflammation back upright and started exercising again. Not like before, but a little at a time. I tightened the life balance I&apos;d let collapse under the excuse of &quot;rest&quot;, and oddly, I came back more energetic than I had been, able to focus longer, getting my routines back. The inner repair started.&lt;/p&gt;
&lt;p&gt;Maybe it just needed some time: to absorb the essence of Stoic philosophy, to make calm rational judgments, to stop being afraid of pushing forward again after any failure or setback (to actually have &quot;breakthrough power&quot; inside me). A life without failure is a life without challenge, and there&apos;s nothing as addictive and as harmful as falling into self-justification and self-pity.&lt;/p&gt;
&lt;p&gt;I spent enough time in self-justification and self-pity to learn this: all of it was me being addicted to my own feelings and thrashing around in them. When I finally looked at myself with reason, none of it was anything. It was already over.&lt;/p&gt;
&lt;p&gt;The version of me who was supposed to take it all on the chin and keep going was nowhere to be found. I&apos;d ended up in a loop of self-justification, self-pity, and blaming other people, hating myself for it, then doing it again. That stretch is finally a little settled now.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/assets/marcus.webp&quot; alt=&quot;Marcus Aurelius&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Marcus Aurelius (16th emperor of Rome, one of the Five Good Emperors, the philosopher-king).&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In this moment, judge objectively. In this moment, act with devotion. In this moment, willingly accept whatever happens. That&apos;s all that&apos;s needed.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Marcus Aurelius (translated by author)&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The thing I loved most while reading &lt;em&gt;The Obstacle is the Way&lt;/em&gt; was this: there is no past, no future. There&apos;s only the present, given to everyone equally. And every good thing and bad thing that happens to us, the universe makes no judgment about any of it. These are the points Marcus Aurelius keeps hammering on in &lt;em&gt;Meditations&lt;/em&gt;, which I&apos;m reading alongside this book.&lt;/p&gt;
&lt;p&gt;The book&apos;s message is simple. In spite of everything that happens to you, you have to stand back up and keep going. He doesn&apos;t cheer you on with &quot;you can do it&quot;. He says you &lt;em&gt;should&lt;/em&gt; do it, that not doing it isn&apos;t an option. Do your duty, he says, with no softness in the voice.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;The Obstacle is the Way: perception, action, will&lt;/h2&gt;
&lt;p&gt;The book lays out three stages.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The starting point is the stage where we look honestly at our problems, our attitudes, our ways of approaching things (perception). Next is the stage where we use energy and creativity to actually clear the obstacle and create opportunity (action). The last is the stage where we cultivate and sustain the inner self that can deal with repeated failure and difficulty (will).&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The three axes are tightly linked. Without perception, you can&apos;t act correctly. Without will, you can&apos;t endure repeated failure. Through this framework, Ryan Holiday gives Stoic philosophy a modern reading.&lt;/p&gt;
&lt;h3&gt;Perception: see it objectively&lt;/h3&gt;
&lt;blockquote&gt;
&lt;p&gt;Hold an objective view. Control your emotions and don&apos;t lose your sense of balance. Work to find the positive elements. Don&apos;t get excited or rattled. Focus on what you can control.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The core of Holiday&apos;s perception is separating emotion from fact. He uses George Clooney as an example.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;From then on, Clooney started bringing this perspective into auditions. Instead of just pushing his acting ability, he explained why he should be the one cast for this role.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Clooney redefined an audition from &quot;a place where I get judged&quot; to &quot;a place where I go to solve a problem&quot;. A shift in viewpoint changed the reality. Holiday organizes this into two concepts.&lt;/p&gt;
&lt;blockquote&gt;
&lt;ol&gt;
&lt;li&gt;Context: a sense of the bigger picture, looking at the world as a whole, not just what&apos;s right in front of you.&lt;/li&gt;
&lt;li&gt;Frame: your own way of looking at the world and the events inside it.&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;
&lt;p&gt;Looking back, I&apos;d poured way too much energy into things I couldn&apos;t control. Regret about what was done, anxiety about what hadn&apos;t come yet. The things I could actually do right now? I looked away from those.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;What you think about often, that&apos;s what you become.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This line landed especially hard. While I was thrashing in the swamp of self-pity, I was becoming the pity itself. I was repeating the same thoughts every day, and those thoughts were defining me. If thoughts make me, what am I supposed to be thinking?&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The way we see the world depends on the way we look at things like this. Does our perspective actually give us &apos;perspective&apos;, or is it the very thing causing the problem?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Holiday asks: is perspective the solution, or is it the problem? In my case, perspective itself was the problem. I couldn&apos;t see the situation objectively, and the emotion warped everything.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Don&apos;t forget that inside every obstacle is a hidden opportunity to make a better reality.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;A shift in how you look at failure. This isn&apos;t about forcing yourself to think positive. It&apos;s about looking at the situation honestly, and finding what you can learn inside it.&lt;/p&gt;
&lt;h3&gt;Action: just start&lt;/h3&gt;
&lt;blockquote&gt;
&lt;p&gt;The one thing all fools have in common is that they&apos;re always getting ready to start.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Seneca&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;How much &quot;preparation&quot; did I do during my time off? I&apos;ll start once I recover. I&apos;ll start once I feel better. I&apos;ll move once my head is clear. Endless preparation. And while I prepared, time kept moving.&lt;/p&gt;
&lt;p&gt;Holiday hits this attitude head-on.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We often live in this delusion that the world will accommodate us. So at the moment we should be acting, we dawdle. We jog along leisurely when we should be sprinting at full tilt.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It wasn&apos;t that this line stung. It was that I saw myself in it exactly. I knew the situation was urgent and I was strolling like it was a Sunday jog. While I dawdled, the problem grew, and the opportunity walked away.&lt;/p&gt;
&lt;p&gt;The truth is, after I closed the company in 2020, I told myself I&apos;d never start one again. And here I am running a company. My wife and I are building a new product together. The past failure had stayed with me as a kind of trauma, and for a while I think I was just &quot;getting ready&quot;. Once it&apos;s perfectly prepared, once the market is certain, once I have the confidence. That day never came.&lt;/p&gt;
&lt;p&gt;In the end, I just started. Not perfect, not certain, no confidence. As Holiday puts it, &quot;I&apos;ll do it tomorrow&quot; is the most cunning lie. Tomorrow doesn&apos;t come. When today&apos;s tomorrow becomes today, another tomorrow shows up.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Don&apos;t wait to live properly. Life slips by in the meantime.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Only action changes reality. Thinking alone, planning alone, resolving alone, none of it changes anything. Action doesn&apos;t have to be perfect. You just don&apos;t stop.&lt;/p&gt;
&lt;h3&gt;Will: accept the failure&lt;/h3&gt;
&lt;blockquote&gt;
&lt;p&gt;Remember that the world is always trying to send you a clear message about the failures and actions you&apos;ve committed. It&apos;s a kind of feedback. It&apos;s like an exact instruction manual for how you can grow from here.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Failure isn&apos;t the end. It&apos;s feedback. That difference in perspective changes everything. If you fear failure, you stop trying at all. If you take failure as feedback, you can try more.&lt;/p&gt;
&lt;p&gt;Holiday explains the essence of will this way.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;There have always been people who turn adversity into opportunity. They never give up. They don&apos;t fall into self-pity. They don&apos;t fool themselves with the fantasy that an easy answer is about to appear. They focus on the one necessary thing: staying alive and creative until the end.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&quot;They don&apos;t fall into self-pity&quot; is the line that stuck. During my time off, I&apos;d become a professional at self-pity. Holiday hits this attitude head-on.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Instead of acting like Demosthenes, you sink into helpless, hollow thinking and turn away from the chance to grow another step. Even when there&apos;s a clear opportunity to find the problem and find the solution, you waste weeks, months, even years.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;A line from Nietzsche comes back to me.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;What doesn&apos;t kill me makes me stronger.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;You shouldn&apos;t misread that line, though. The pain itself doesn&apos;t make us stronger. How we receive the pain, how we walk through it, that decides us. Holiday underlines this too.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;No, the problem is exactly as hard as we think it is. The worst thing that can happen isn&apos;t the event itself, but losing your reason because of it.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;Remember that this moment isn&apos;t your life. It&apos;s just one moment of your life.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The hardness of right now feels like it&apos;ll last forever. It won&apos;t. This passes too. Holiday also talks about the art of acquiescence.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Stoic philosophy gave this attitude a beautiful name. They called it &apos;the art of acquiescence&apos;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Accept what you can&apos;t change, focus on what you can. That&apos;s the essence of will.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Meditations: live in the present&lt;/h2&gt;
&lt;p&gt;&lt;img src=&quot;/assets/meditations-cover.jpeg&quot; alt=&quot;Meditations&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Reading &lt;em&gt;The Obstacle is the Way&lt;/em&gt; led me back to &lt;em&gt;Meditations&lt;/em&gt;. If Holiday is the modern reading of Stoic philosophy, Marcus Aurelius is the source. It&apos;s stunning that words written by a Roman emperor almost two thousand years ago still hold up.&lt;/p&gt;
&lt;h3&gt;Get up and do your duty&lt;/h3&gt;
&lt;blockquote&gt;
&lt;p&gt;When the day breaks and you don&apos;t want to get out of bed, think to yourself: &quot;I am rising to do what a human being is here to do. I was born for this work, I came into the world for this work, and I&apos;m still complaining and resenting it? I wasn&apos;t born to lie under the covers and enjoy the warmth.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;There are mornings when I open my eyes and don&apos;t want to get up. Honestly, those mornings happened a lot more than the other kind. And Marcus Aurelius was the emperor of Rome. He didn&apos;t want to get up either. He got up anyway. He had work to do.&lt;/p&gt;
&lt;p&gt;He&apos;s unbending about duty.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In every moment, behave as a Roman and as a man, with unornamented dignity, with comradeship, with independence, with a sense of justice. Carry out the task in front of you accurately, carefully, without selfishness. Throw away every other distraction.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The word &quot;duty&quot; can sound heavy. But think about it: the fact that there&apos;s something I have to do is itself proof I&apos;m alive.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Every time you do something, do it as if it were the last thing you&apos;ll do on this earth. Don&apos;t deliberately step outside the control of reason and follow your emotions. Don&apos;t get caught by hypocrisy, selfishness, or resentment about the fate given to you. If you can do that, you can do anything.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Marcus stresses &quot;not following your emotions willfully&quot;. Acting under the control of reason. That&apos;s the core of Stoic philosophy.&lt;/p&gt;
&lt;h3&gt;What is coming is on its way&lt;/h3&gt;
&lt;blockquote&gt;
&lt;p&gt;Don&apos;t live as though you&apos;ll live a thousand years. What&apos;s coming is already coming for you. While you&apos;re alive, do your best to be a good person.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;You probably won&apos;t have time to read the excerpts you&apos;ve been collecting. So if there&apos;s something that worries you, while time still allows, throw away every other empty hope, and pour all your strength into completing that one goal. Save yourself.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Memento mori. Remember death. This isn&apos;t a pessimistic thought. It&apos;s the opposite. Because time is limited, this moment matters. The books I&apos;ve stacked up to read someday, the things I&apos;ve put off doing someday. &quot;Someday&quot; might not arrive.&lt;/p&gt;
&lt;p&gt;Marcus organizes the insight about time like this.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Even if you could live three thousand or thirty thousand years, remember that what passes is only the life you&apos;re living right now. You don&apos;t live any life other than the life that is passing.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;No one can take from you what you don&apos;t possess, and every person possesses only this present moment, the same way. Whoever you are, you only ever lose the present moment.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;What we have is only the present. The past is already gone, the future hasn&apos;t come yet. The thing we can lose, the thing we can possess, is only the present.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Remember how many chances the gods have given you, and how you&apos;ve never accepted a single one. Remember how long you&apos;ve been putting it off.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Reading this line, I looked back on the past year. How many chances I&apos;d put off, looked away from, pushed to later.&lt;/p&gt;
&lt;h3&gt;Fortune, not misfortune&lt;/h3&gt;
&lt;blockquote&gt;
&lt;p&gt;Don&apos;t say, &quot;this happened to me, what a misfortune.&quot; Say instead, &quot;this happened to me, and yet I&apos;m not destroyed by what happened, I&apos;m not afraid of what&apos;s coming, I haven&apos;t been damaged at all. What a piece of fortune that is.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Marcus puts this perspective even more crisply.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;This is not misfortune. The fortune is that I keep my own nature even when these things happen to me.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The first time I read this passage, I had a hard time accepting it. While you&apos;re going through something hard, saying &quot;this is fortune&quot; isn&apos;t easy. But thinking about it carefully, the fact that I made it here is itself the proof. I wasn&apos;t destroyed. I&apos;m still here. I&apos;m writing.&lt;/p&gt;
&lt;p&gt;Marcus explains the real meaning of &quot;everything is in the way you think about it&quot;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;Everything is in how you think about it&quot; means that external events and circumstances (the value-neutral things that have nothing to do with happiness or with good and evil) get their character and influence not from the events themselves but only from how a person receives them.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The situation itself is value-neutral. Whether to see it as misfortune or as a chance to grow is entirely my choice.&lt;/p&gt;
&lt;h3&gt;Stand on your own&lt;/h3&gt;
&lt;blockquote&gt;
&lt;p&gt;Don&apos;t lose your cheerfulness, do it without outside help, by your own strength. Push away the comfort other people offer and stand on your own. You have to stand straight on your own. Don&apos;t get propped up by another person&apos;s help, and don&apos;t let anyone else stand you up straight.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In the end, the only one who can stand me back up is me. Blaming the environment, blaming other people, the same infinite loop. While that time was passing, the things I actually could do, I didn&apos;t do.&lt;/p&gt;
&lt;p&gt;Marcus also talks about the retreat into yourself.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Whenever you want, you can withdraw into yourself and rest. There&apos;s no better place to be free of every worry and every concern, no quieter or more peaceful place, than your own mind.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Don&apos;t look for peace outside. Find it inside. Even rest happens inside you.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Don&apos;t see something the way someone forces you to see it, or the way someone wants you to see it. See everything as it is.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;The person who doesn&apos;t worry about what others say, do, or think, and only cares about getting their own words and conduct right, has a peaceful and abundant mind.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Stepping out of the gaze of others and walking your own path. As emperor, Marcus must have lived under enormous expectation and criticism, and he still got to this place. That&apos;s the part that astonishes me.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Wrapping up&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;The Obstacle is the Way&lt;/em&gt; and &lt;em&gt;Meditations&lt;/em&gt;. Two books from very different times, telling the same story.&lt;/p&gt;
&lt;p&gt;Don&apos;t get stuck on the past. Don&apos;t be afraid of the future. Focus on this moment. Take failure as feedback. And go forward without flinching.&lt;/p&gt;
&lt;p&gt;A few failures aren&apos;t a reason to stop. The opposite, actually. Try more, fail more, learn more. A life without failure is a life without challenge.&lt;/p&gt;
&lt;p&gt;After I closed the company in 2020, I thought I&apos;d never start one again. And here I am running a company again. The past failure didn&apos;t disappear. I just learned that it isn&apos;t a reason to block today.&lt;/p&gt;
&lt;p&gt;It took a long time to climb out of the loop of self-justification and self-pity. But I think that time was needed. At least now I know. Being swept by an emotion and feeling an emotion are different things. Feel the emotion, but don&apos;t get addicted to it. Maybe that&apos;s what Stoic acceptance is.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;What I learned&lt;/th&gt;
&lt;th&gt;Action item&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Thoughts make me&lt;/td&gt;
&lt;td&gt;Trade self-pity for objective observation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Just preparing is doing nothing&lt;/td&gt;
&lt;td&gt;Execute one small action you can do today&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Failure is feedback&lt;/td&gt;
&lt;td&gt;After failing, write down the lesson instead of blame&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Only the present is ours&lt;/td&gt;
&lt;td&gt;Stop spending time on past regret and future anxiety&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;blockquote&gt;
&lt;p&gt;Don&apos;t live as though you&apos;ll live a thousand years. What&apos;s coming is already coming for you.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Marcus Aurelius&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I&apos;ll focus on the present. Do today&apos;s work. That&apos;s all there is.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Questions for the reading group&lt;/h2&gt;
&lt;h3&gt;1. What was your biggest recent failure, and how did you &quot;break through&quot; it? What are you doing now because of it? Share how you got past it and how your perspective shifted.&lt;/h3&gt;
&lt;p&gt;The biggest failure was a failure of the heart. Last year my body and mind were both in bad shape, so I fell into perfectionism and kept delaying the project I was on. I chewed on past regrets and failures over and over. In particular, I realized the way I&apos;d wound down the team was a failure, and I couldn&apos;t get out of that for a long stretch. The environment was hard, sure, but I kept thinking I could have closed it out better. The point, though, was that I kept chewing on the failure long after I&apos;d left it. I wrote about this in the review too, and there&apos;s a similar memory from when I closed Todait. Coming out of an organization without going into a new one for a long time made me sink into the failure and the self-pity more and more.&lt;/p&gt;
&lt;p&gt;The way I&apos;m getting past it is writing. Recently I&apos;ve been writing blog posts and even fiction, all of it for the sake of &quot;breaking through&quot; and getting out. The ideas that had been floating around in my head and chest got concrete on the page, and as they got concrete my heart emptied out a little, and I started to see what to do next. The knots inside me feel like they&apos;re being released. I wish I&apos;d written sooner. Now that I&apos;ve written some, I feel like it&apos;s time to read again and refill. The line &quot;you have to empty out before you can fill up&quot; really resonates with me these days.&lt;/p&gt;
&lt;h3&gt;2. What was the biggest change in your thinking from reading this book?&lt;/h3&gt;
&lt;p&gt;I&apos;m someone whose views were already mostly in line with the book, so it wasn&apos;t so much a shift as it was reinforcement. The line that&apos;s staying with me is that the constraints of reality can themselves be the path.&lt;/p&gt;
&lt;h3&gt;3. Push when you should rest, and burnout comes. Rest when you should push, and you stagnate. How do you tell the difference?&lt;/h3&gt;
&lt;p&gt;I clearly did the opposite. What matters in the end is the result, the outcome. If something feels like it&apos;s dragging on too long without a clear result, it&apos;s time to revisit the direction. Rather than just deciding whether to rest or not, I think it&apos;s important to keep a daily rhythm, hold the routines, focus harder when needed, and keep enough room in your mind to look back. That said, if I&apos;m just tired, pushing through is fine. If the environment isn&apos;t going to change no matter what I do, a change of direction is the thing to do.&lt;/p&gt;
&lt;h3&gt;4. Pick two passages you underlined. Why those two?&lt;/h3&gt;
&lt;blockquote&gt;
&lt;p&gt;What doesn&apos;t kill me makes me stronger. (...) No, the problem is exactly as hard as we think it is. The worst thing that can happen isn&apos;t the event itself, but losing your reason because of it.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;There&apos;s a scene in &lt;em&gt;Evan Almighty&lt;/em&gt; where the family is leaving Evan, who&apos;s suddenly building an ark in modern times. There&apos;s a conversation between Morgan Freeman as God and Evan&apos;s wife at a diner that I still remember.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;If someone prays for patience, do you think God gives them patience, or does He give them the opportunity to be patient?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;If they pray for courage, does God give them courage, or does He give them the opportunity to be courageous?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;If someone prays for their family to be closer, do you think God just zaps them with warm, fuzzy feelings, or does He give them opportunities to love each other?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The family eventually returns to Evan, the flood actually arrives, and the biblical scene plays out. It&apos;s a movie that could come across as a little corny and easy to dismiss, but as someone who isn&apos;t a Christian, that scene hit me hard, and it kept coming back to me as I read this book.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In this moment, judge objectively. In this moment, act with devotion. In this moment, willingly accept whatever happens. That&apos;s all that&apos;s needed.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I opened the post with this line. The Stoic view that everything in the world is value-neutral, and that what matters is reason and doing your duty, helps me hold my emotional energy steady instead of getting destabilized by small things or swinging into excitement. The value I keep coming back to is: focus on this moment, above all.&lt;/p&gt;
&lt;p&gt;In &lt;em&gt;Peaceful Warrior&lt;/em&gt;, another movie I love, the mentor &quot;Socrates&quot; tells the protagonist:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;A real warrior doesn&apos;t give up what they love. Instead, they find the reason to love what they&apos;re doing right now.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;5. In a moment when control is completely gone, what attitude can you still hold?&lt;/h3&gt;
&lt;p&gt;What the book is finally saying is that my thoughts, my emotions, my judgments are all things I decide. It&apos;s hard, but how I think in such a moment, how I feel, how I judge, those are still things I can choose. Of course, I don&apos;t think you should drive yourself when you&apos;re already collapsed (you need time to digest the hard parts too). Even so, holding your mental ground, judging with reason, and doing what you can do is the attitude that really matters.&lt;/p&gt;
&lt;h3&gt;6. Did this book make you newly aware of how you handle problems?&lt;/h3&gt;
&lt;p&gt;For me, reasoning through work is relatively easy. With people, or with ordinary daily things, I get emotionally sensitive and irritable easily. Over nothing, really. Paradoxically, I steady myself for the big things and accept them with grit, but I crumble easily on small things. So the various ways the book talks about handling problems aren&apos;t only for our big goals. They&apos;re also for keeping daily peace, and I came away resolving to give myself a little more room.&lt;/p&gt;
&lt;h3&gt;7. (The Korean title is a translation, one the translator chose deliberately.) What does the word &quot;breakthrough&quot; mean to me?&lt;/h3&gt;
&lt;p&gt;Life is hard. Or maybe it&apos;s hard because we think it&apos;s hard. The word &quot;breakthrough&quot; is a bit corny, but maybe you need that level of resolve to get through the many problems we face with a little less shaking and a little less collapsing. Through this book, I hope all of us can find the courage and strength to &quot;break through&quot;, whatever hardship and difficulty come.&lt;/p&gt;
&lt;h3&gt;8. Are you results-oriented or process-oriented? How do you balance them?&lt;/h3&gt;
&lt;p&gt;I used to be process-oriented, and now I&apos;m fully results-oriented. From running a company and a team, and from failing, what I&apos;ve felt is that whatever the process was, the evaluation of that process changes completely depending on the result. I don&apos;t think the process is unimportant. The point is that results are at the &lt;em&gt;center&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;In the past, when I was process-oriented, there were times when the results were mediocre and we&apos;d awkwardly applaud each other. I thought that was support, that we could keep going. But &quot;as long as you tried hard in the process, the result doesn&apos;t matter&quot; really turns people passive and irresponsible. I learned that the hard way, both with myself and when running a team. A post I wrote earlier, &lt;a href=&quot;https://flowkater.io/posts/2026-01-25-no-victory-no-future/&quot;&gt;No Victory, No Future&lt;/a&gt;, is in the same line of thought.&lt;/p&gt;
&lt;p&gt;The conclusion is that being results-oriented is right, no question. But you should use the result to do the process better, not let &quot;as long as the result is good, the process can be anything&quot; take over.&lt;/p&gt;
&lt;h3&gt;9. When things don&apos;t go to plan, what do you do? Adjust or scrap the plan? Or push through even if it&apos;s late?&lt;/h3&gt;
&lt;p&gt;In the past, I&apos;d adjust or scrap a lot. The principle, though, is that a plan is also part of the process, like in question 8. So I&apos;d weigh whether the plan really matched the result and whether the cost was justified. That&apos;s the usual call in business. Personally, when I&apos;m working on my own tasks or studying, I tend to just push it through and finish. Unless the cost is enormous, what you learn from finishing is more than what you learn from quitting halfway.&lt;/p&gt;
&lt;h3&gt;10. When you face a book that&apos;s hard to read, what do you do? (Just push through, reread, etc.) And how do you decide a book is &quot;hard&quot;?&lt;/h3&gt;
&lt;p&gt;Mostly I just push through. I don&apos;t really get obsessive about it. There are plenty of books to read, and sometimes after time passes, what I read before just clicks. It&apos;s not exam prep where you have to memorize everything, so even with a great book, if it&apos;s hard for me, I won&apos;t reread it. What I do try is finishing it. A kind of reading-comprehension training, you could say. Once you survive that hardship, other books feel really easy. These days, thanks to e-books, I read three at the same time: an easy one (usually fiction), a medium one, and a hard one. I rotate. The hard one I read first, only the minimum amount per day. The medium one a little more than the set amount. The easy one I sometimes binge, sometimes skip. Reading just one book at a time gets boring, and rotating is nice.&lt;/p&gt;
&lt;p&gt;That said, when I really have to read something hard, using a reading group like this one is the best move. Even if I read loosely, I can absorb the book indirectly through other people&apos;s thoughts and reactions. Honestly, I find the thoughts of the people that the book brings out and clarifies more interesting than the book itself.&lt;/p&gt;
</content:encoded><category>reading</category><category>essay</category><category>readmeleadme</category><author>Tony Cho (https://flowkater.io)</author></item><item><title>There&apos;s No Future for an Organization That Can&apos;t Win</title><link>https://flowkater.io/en/posts/2026-01-25-no-victory-no-future/</link><guid isPermaLink="true">https://flowkater.io/en/posts/2026-01-25-no-victory-no-future/</guid><description>What I learned from running a startup as CEO and serving as CTO. An honest confession about the leader going it alone, teams that only do what they&apos;re told, and what happens to organizations that never define what winning means.</description><pubDate>Sun, 25 Jan 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;How this started&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;And then I realized it. Oh, I&apos;m too good at talking. So even with the same idea, I get the team excited about it, and that made us spend years investing in wrong ideas, and I&apos;d be making bad calls while still thinking I was right.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;When I first came across this interview with Lee Seung-gun, the CEO of Toss (Korea&apos;s leading fintech), I just stared at the screen for a long while. Looking back, I was that kind of leader too. He goes on: &quot;If you don&apos;t succeed, in the end it&apos;s a really bad experience for the team.&quot; And he adds: &quot;Saying the painful but necessary thing is how a person and a company grow. It took me about five years to accept that.&quot;&lt;/p&gt;
&lt;p&gt;Reading these lines hit me hard. It sounded less like a successful entrepreneur&apos;s reflection and more like an honest confession from a leader who&apos;d been through countless failures. We tend to fall into the trap of &quot;let&apos;s just keep things nice.&quot; Because we love the team, because we don&apos;t want them to get hurt, because we don&apos;t want to be the bad guy, we hold back the painful words. And what&apos;s the result? The organization slowly sinks, and the people who got on the boat with us scatter, carrying nothing but the memory of failure. Time spent in an organization that never wins, no matter how warm the process felt, ends up as poison.&lt;/p&gt;
&lt;p&gt;When I think about it, why did we gather in the first place? For social bonding? To collect a paycheck? No. We came together to achieve something, in other words, to &lt;em&gt;win&lt;/em&gt;. But at some point we forgot about winning. We seem to have even forgotten how to define it. Watching all those nights and all that passion evaporate into the single word &quot;failure&quot; is a sad thing.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;The paradox of the AI era&lt;/h2&gt;
&lt;p&gt;It&apos;s the AI era. AI now helps with the coding, and design mockups pour out in seconds. Data analysis no longer takes days the way it used to. The tooling has leaped forward. Productivity, in theory, should be tens of times higher. But something&apos;s strange. Why are the essential problems organizations face the same as they were ten years ago?&lt;/p&gt;
&lt;p&gt;Technology changes at the speed of light, but the way people work together is still stuck in stone-age inertia. Better tools don&apos;t make organizations smarter. If anything, the convenience of better tools makes it easier to hide behind them and ignore the real flaws in communication and decision-making. Tens of thousands of Slack messages fly back and forth, but there is zero shared agreement on &quot;where are we actually going right now?&quot;&lt;/p&gt;
&lt;p&gt;What it comes down to is this: the problem isn&apos;t the tool. It&apos;s people and systems. AI can write code for us, but it won&apos;t define which &quot;win&quot; our organization absolutely has to take this quarter. That&apos;s entirely on us. Yet instead of doing the defining work, we waste energy bringing in flashier tools and building more complicated processes.&lt;/p&gt;
&lt;p&gt;I did the same thing. Early on, I genuinely believed adopting a new tool would solve the organization&apos;s problems. I thought Notion would fix information sharing, Jira would make projects run cleanly. It didn&apos;t take long to see how naive that was. Tools don&apos;t solve problems. They just expose the hidden ones more clearly.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;The trap of the leader going it alone&lt;/h2&gt;
&lt;p&gt;Let&apos;s go back to Lee Seung-gun&apos;s confession. The leader&apos;s biggest enemy isn&apos;t an external competitor. It&apos;s their own way with words and their own conviction. When the leader speaks too well, the team stops thinking critically. When the leader&apos;s energy runs too hot, the whole organization sprints at full speed in the wrong direction.&lt;/p&gt;
&lt;p&gt;This isn&apos;t a problem only at the C-level. For senior engineers and lead designers too, the moment any of them thinks &quot;I&apos;m right,&quot; they&apos;ve already walked into the trap. &quot;I&apos;ve worked in this field for ten years.&quot; &quot;Last time I succeeded with this approach.&quot; The curse of experience covers the organization&apos;s eyes.&lt;/p&gt;
&lt;p&gt;I fell into that mode plenty of times during my CTO years. Thinking back makes my face burn. The one I remember most was when we were designing a new system. I was certain microservices were the right answer. One of the engineers asked, &quot;At our current scale, do we really need to bring in MSA? Wouldn&apos;t going monolith and moving fast be better?&quot; I dismissed the question as a lack of technical understanding. (It&apos;s genuinely embarrassing now.) I believed the architecture I&apos;d designed was the answer. We invested six months building a complex system and couldn&apos;t even properly launch a single feature the customers wanted. That engineer was right. What we needed wasn&apos;t elegant architecture, it was fast execution.&lt;/p&gt;
&lt;p&gt;My way with words may have gotten the team excited, but the result was leading them onto the wrong battlefield. The leader going it alone becomes the ceiling that suppresses the organization&apos;s potential. A study of overseas startup failures keeps pointing to the same line: &quot;When the leader becomes the primary doer or the ultimate reviewer, they inadvertently cap the organization&apos;s potential.&quot; The moment the leader decides everything and reviews everything, the organization&apos;s intelligence collapses into the leader&apos;s intelligence alone. That&apos;s not what a winning organization looks like.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;The team that only does what it&apos;s told&lt;/h2&gt;
&lt;p&gt;The opposite case exists too. The leader has lost direction, and the team is in pure &quot;do-only-what-you&apos;re-told&quot; mode. I call this the &quot;n-people-equals-1&quot; problem. Ten people gather, and instead of multiplying their output, the total doesn&apos;t even add up to one person&apos;s worth. Each person handles the ticket on their plate cleanly, but no one cares what value those tickets add up to.&lt;/p&gt;
&lt;p&gt;As a middle manager I once hit a goal on time with my team. We built what we&apos;d planned within the deadline, shipped it, and got the result we&apos;d targeted. But honestly, looking back, that project contributed exactly zero to the company&apos;s growth. The goal itself was wrong, or it was disconnected from what the market needed. And I took shelter in the relief of &quot;I did my part.&quot; It was a cowardly relief. That same relief never translated into any reward for the teammates who&apos;d ground through implementing features without even knowing why, and the team&apos;s mood was completely shattered. It was a tragedy.&lt;/p&gt;
&lt;p&gt;If you compare this kind of organization to a sports team, it&apos;s like one where the defenders only guard their zone and the strikers only wait for the ball to arrive. The game is won by scoring goals, but no one runs toward the goal. They just stand wherever the coach told them to stand.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&quot;If scoring is the goal, I want to decide where to run myself.&quot;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This isn&apos;t arrogance. It&apos;s a hunger for victory. A player who really wants to win reads the flow of the match, finds the open space, and moves on their own. The coach&apos;s tactics are a guide, but the final call on the field has to belong to the player. Yet many organizations strip teammates of that judgment. A single line, &quot;Just do what you&apos;re told,&quot; tramples the chance of winning.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Five years of respecting the team and failing anyway&lt;/h2&gt;
&lt;p&gt;So if you respect the team without limit and give them full autonomy, do you win? The first five years of Toss show the counter-example. Lee Seung-gun says he wanted to be a &quot;really good CEO&quot; early on. He was so grateful to the team that he couldn&apos;t be tough with them, and he kept giving them more chances. The result? Five straight years of failure.&lt;/p&gt;
&lt;p&gt;The moment the failure was confirmed, those &quot;good experiences&quot; instantly turned into &quot;memories I want to delete.&quot; The team responded coldly: &quot;The time I worked with you is something I want to erase from my life.&quot; It hurts, but that&apos;s the reality. In an organization that&apos;s gotten used to losing, &quot;kindness&quot; is just another word for irresponsibility.&lt;/p&gt;
&lt;p&gt;I made a similar mistake when I was a leader. One of my teammates was running a project that kept slipping. The teammate had the skills but was struggling to set priorities. I could see it. But I held back sharp feedback because I worried about hurting the team&apos;s mood, because the teammate looked exhausted. I said, &quot;It&apos;s okay, take your time.&quot; Even when failure was clearly coming, I kept floating baseless optimism that &quot;we&apos;re doing great.&quot;&lt;/p&gt;
&lt;p&gt;Three months later the project got shut down. As that teammate left, they said this: &quot;I wish you&apos;d told me earlier. I didn&apos;t know what I was doing wrong.&quot; That&apos;s when it hit me. The way a leader truly respects the team isn&apos;t by making them feel good, it&apos;s by winning together with them. Real respect sometimes requires the courage to say the painful truth.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;What is winning&lt;/h2&gt;
&lt;p&gt;So what is this &quot;win&quot; we should be hungering for? Just hitting a revenue target? Going public? Sure, those can be metrics of winning. But the essential win is &lt;strong&gt;&quot;every person in the organization feeling the win in the result.&quot;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;It isn&apos;t simply filling in a target number. It&apos;s the state where the product we built actually changes a customer&apos;s life when it goes out into the world, and through that process all of us share the overwhelming sense that &quot;we pulled it off.&quot; That&apos;s the real win.&lt;/p&gt;
&lt;p&gt;Winning is addictive. An organization that&apos;s tasted winning once moves on its own toward the next one. An organization that&apos;s gotten used to losing, on the other hand, gets used to blaming external causes and pointing fingers at each other. Winning is the best medicine for every conflict in an organization. Winning doesn&apos;t make every problem disappear, of course, but at least it generates the energy to solve them. Winning is the blood that runs through an organization. The way blood has to circulate for life to continue, an organization needs winning to breathe.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;The real relationship between coach and player&lt;/h2&gt;
&lt;p&gt;A winning organization looks a lot like a well-trained elite sports team. Think about the head coach of a soccer team. The coach analyzes the opponent and builds tactics that play to our strengths. But once the match starts, the coach can&apos;t run onto the field.&lt;/p&gt;
&lt;p&gt;A player who has the ball on the field has to decide in 0.1 seconds. Pass, dribble, or shoot? If that player hesitates because they&apos;re checking with the coach on the bench, the team will never win.&lt;/p&gt;
&lt;p&gt;Winning teams have clear traits.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Shared goal&lt;/strong&gt;: Every player is aligned on &quot;we win today, no matter what.&quot;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Role expertise&lt;/strong&gt;: The keeper stops goals, the striker scores. They respect each other&apos;s territory, but they help when help is needed.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Autonomy on the ground&lt;/strong&gt;: Within the broad frame of the tactics, the player&apos;s creative judgment is respected to the extreme.&lt;/p&gt;
&lt;p&gt;What about our organization? Is the leader stepping onto the field and getting in the players&apos; way? Or are the players standing around in a daze, not even knowing where the ball is? A winning team moves organically without the coach&apos;s instructions. That&apos;s the power of a system.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;What a PM is actually for&lt;/h2&gt;
&lt;p&gt;The PM (Product Manager) role matters here. Many organizations treat the PM as a &quot;requirements messenger&quot; or a &quot;schedule keeper.&quot; But in a winning organization, the PM has to be a &lt;strong&gt;&quot;win designer.&quot;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A PM isn&apos;t someone who lists features. A PM has to prove &quot;if we build this feature, why does it make us win?&quot; Between technical constraints, business goals, and customer needs, a PM has to find the equation for victory.&lt;/p&gt;
&lt;p&gt;I felt this in my bones when I took on the PM role. If the PM can&apos;t define what winning is, the engineers and designers end up grinding through meaningless labor. A PM isn&apos;t someone who tells the team &quot;what to build.&quot; A PM has to be someone who convinces the team &quot;why we have to win this fight.&quot; A PM&apos;s strongest weapon isn&apos;t data or a spec doc, it&apos;s the conviction that they can lead the team to a win.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;The courage to give things up&lt;/h2&gt;
&lt;p&gt;One thing organizations that don&apos;t win have in common is &quot;trying to do everything.&quot; Everything matters, nothing can be cut. So resources scatter, and they don&apos;t win on any battlefield.&lt;/p&gt;
&lt;p&gt;To win, you have to know how to give things up. Focus on one goal, and you can have a clear experience of winning or losing. Mess around with several at once and fail, and you can&apos;t even tell why you failed. You just hide behind the cowardly excuse, &quot;we failed because we didn&apos;t have enough resources.&quot;&lt;/p&gt;
&lt;p&gt;One of the most painful experiences I had was this. We were running three new projects at the same time. Headcount was limited, and I judged that all three were important. To be honest, I was afraid to decide which one to give up. (I didn&apos;t know back then that postponing a decision is also a decision.) The result? All three fizzled out partway through. If we&apos;d gone all-in on one, at least one would have made it. That was the price of being greedy. That&apos;s when I learned. You have to be able to give things up to focus, and you have to focus to win. One of the leader&apos;s most important jobs is deciding &quot;what we will not do.&quot; It&apos;s genuinely hard. But it has to be done. Giving up isn&apos;t losing. It&apos;s a strategic retreat for a bigger win.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;The signs of an organization falling apart&lt;/h2&gt;
&lt;p&gt;An organization without winning collapses slowly, but with certainty. The signs show up in three ways.&lt;/p&gt;
&lt;p&gt;First comes &lt;strong&gt;talent drain&lt;/strong&gt;. Talented people have an uncanny nose for the smell of victory. In an organization where no win is in sight, they pack their bags first. The moment they leave, the skills and strategy leave with them. A talented person constantly asks themselves, &quot;Can I grow in this organization?&quot; They know instinctively that growth isn&apos;t possible without winning. Talented people don&apos;t follow money, they follow the experience of winning.&lt;/p&gt;
&lt;p&gt;Second comes &lt;strong&gt;the leader&apos;s isolation&lt;/strong&gt;. With no winning, the leader&apos;s words lose force. The team starts treating the leader&apos;s vision as an empty shout. The leader reaches for more coercive measures, which triggers more pushback, and the loop tightens.&lt;/p&gt;
&lt;p&gt;Third comes &lt;strong&gt;cultural decay&lt;/strong&gt;. In the empty space left by a shared goal called winning, politics and cynicism move in. The mindset that &quot;it won&apos;t work anyway, just do it casually&quot; takes over. Cynicism is like a cancer cell eating away at the organization.&lt;/p&gt;
&lt;p&gt;Look at outside cases. Eighty-two percent of startups fail not because they ran out of money but because of bad management and leadership. Winning organizations are different. Georgia&apos;s TBC Bank used OKRs to align a 1,200-person organization on one goal and became the best digital bank of 2024. In Korea, Coupang set the audacious goal of &quot;Rocket Delivery,&quot; measured it in real time, and seized the win. After Kakao adopted OKRs, project speed went up 1.5x, and Baemin (Korea&apos;s top food-delivery app, run by Woowa Brothers) clarified the purpose of cross-team collaboration and built the foundation for winning. They all &quot;defined&quot; winning and &quot;measured&quot; it. OKRs or anything else, the methodology doesn&apos;t matter. What matters is whether you&apos;re showing the organization the direction of victory. Or whether the organization feels it&apos;s heading in the direction of victory.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;In closing&lt;/h2&gt;
&lt;p&gt;As I close this piece, I ask myself.&lt;/p&gt;
&lt;p&gt;&quot;Am I winning right now?&quot;&lt;/p&gt;
&lt;p&gt;Or, &quot;Is the organization I belong to running toward winning?&quot;&lt;/p&gt;
&lt;p&gt;Winning doesn&apos;t arrive by accident. Winning comes at the end of brutal self-honesty, the courage to take painful feedback, and the stubborn will that says &quot;we will win, no matter what.&quot;&lt;/p&gt;
&lt;p&gt;There may be a counter-argument that the leader has to provide direction. That&apos;s right. The leader has to provide direction. The point is that &lt;strong&gt;&quot;direction is the leader&apos;s, the process belongs to the team.&quot;&lt;/strong&gt; Pointing at the goal is on the leader; how the player breaks through with what dribble belongs to the player.&lt;/p&gt;
&lt;p&gt;There&apos;s no future for an organization that can&apos;t define winning. It just repeats past failures and fades away. But if we start defining winning today, telling each other the painful but necessary words, and getting fully into one goal, the future can change. Winning is a choice. And that choice is in our hands right now.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Practices for winning, and the heart to start again&lt;/h2&gt;
&lt;p&gt;As I tried to wrap up, I figured the question &quot;so how do we actually do it?&quot; would come, so I&apos;m adding a few concrete practices. These are lessons from running a company as a CEO for five years and from four years in a CTO seat. Not theory, but the kind of lessons you learn by getting knocked around on the ground.&lt;/p&gt;
&lt;h3&gt;The power of 1-on-1s&lt;/h3&gt;
&lt;p&gt;The bigger the organization gets, the further the leader drifts from the ground. The strongest tool for this is the 1-on-1. It isn&apos;t a slot for checking task progress. It has to be the slot where you check what your teammate is worrying about, where they think the bottleneck in the organization is, and whether they&apos;re feeling a &quot;win&quot; inside this organization.&lt;/p&gt;
&lt;p&gt;I regret nothing more than how I neglected 1-on-1s in the organization I ran. I kept postponing the biweekly sessions with the excuse of being busy, and they fizzled into monthly ones. Resentment built up on the team in that gap, and I caught it far too late. A conversation that misses its window comes back later at tens of times the cost. Conversation is the lubricant of an organization. When the lubricant runs dry, the engine burns.&lt;/p&gt;
&lt;h3&gt;Transparent information sharing&lt;/h3&gt;
&lt;p&gt;To win, everyone has to be looking at the same map. Information that only the executive team knows, context shared only among leaders, turns the team into &quot;alienated workers.&quot; Why we&apos;re doing this project, what the company&apos;s financial state actually is, what real crisis we&apos;re facing — these have to be shared transparently. Information asymmetry breeds distrust, and distrust breaks the will to win.&lt;/p&gt;
&lt;h3&gt;Retros for failure, celebration for wins&lt;/h3&gt;
&lt;p&gt;There are plenty of organizations that blame people when things fail, but few that retro them properly. We have to record and share why we failed and what we&apos;ll do differently next time. Equally, fully celebrating when we win matters. Small wins stack into a culture of bigger wins. Retros breed wisdom, celebration breeds energy.&lt;/p&gt;
&lt;h3&gt;Being obsessed with the customer&lt;/h3&gt;
&lt;p&gt;In the end, every win comes from the customer. Engineers and designers have to hear the customer&apos;s voice directly. Writing code only from the doc the PM put together is just &quot;winning by proxy.&quot; When the maker witnesses how a single line of their code solved a customer&apos;s problem, they experience the real win. The customer is our only North Star.&lt;/p&gt;
&lt;h3&gt;Psychological safety and a high bar&lt;/h3&gt;
&lt;p&gt;For ambitious attempts, you need psychological safety, a confidence that mistakes won&apos;t get you punished. At the same time, a high bar has to hold. Safe with a low bar, and the organization stagnates. High bar with no safety, and the organization burns out. Balancing the two is the leader&apos;s core capability. When you chase a high bar inside a safe environment, the best output shows up.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;Ready to start over&lt;/h3&gt;
&lt;p&gt;Now that I&apos;m back to living a private life, when I look back on running a big organization, what&apos;s left isn&apos;t a flashy tech stack or a grand architecture. What&apos;s left is just memories of &quot;people,&quot; &quot;winning,&quot; and &quot;failure.&quot; The colleagues I pulled all-nighters with to solve problems, the moments we shared a beer over the joy of a launch, all of that made me. There were genuinely happy moments, no question.&lt;/p&gt;
&lt;p&gt;Even so, all those efforts failed to convert into real business value, real customer value. Eventually, I started judging people&apos;s ability and loyalty by how long and how much they worked. Whatever value, vision, or story I tried to share, my words no longer earned their respect. The memory of that &quot;failure&quot; still cuts deep.&lt;/p&gt;
&lt;p&gt;Back then I had a lot of urge to defend myself, a lot of bitterness, but now it&apos;s time to move forward again.&lt;/p&gt;
&lt;p&gt;I&apos;m preparing a new beginning now. With the failures and wins of the past as compost, I&apos;m dreaming of a &quot;winning organization&quot; once more. The person reading this is probably the same. We&apos;re all people hungering for victory in our own arenas.&lt;/p&gt;
&lt;p&gt;For the time being I&apos;ll be focused on a two-person team with Ellie. But if I ever take responsibility again as a middle manager inside another company, I want to be responsible for an organization where autonomy over the process is guaranteed and only the result is held to account, where the direction of victory is communicated clearly to everyone, so that everyone moves toward the goal on their own.&lt;/p&gt;
&lt;p&gt;I sincerely hope your organization, your team, and you yourself win today. As for me, I&apos;m just going to keep showing up tomorrow and trying to define one more thing worth winning.&lt;/p&gt;
&lt;p&gt;In the end, winning belongs to those who don&apos;t give up.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;Winning solves everything.&quot;&lt;/p&gt;
&lt;p&gt;— Tiger Woods&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr /&gt;
&lt;blockquote&gt;
&lt;p&gt;I&apos;m using this piece as my 2025 retrospective. For me, 2025 was a time of recovery. Maybe close to a sabbatical. There was a piece I&apos;d written long ago and put aside. Too negative, too full of blame, the kind I couldn&apos;t bring myself to publish. From where I stand now, I polished and re-polished it, and I&apos;m publishing it as a piece that helps me move forward. I hope it heals you the way it healed me.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;I don&apos;t have a religion, but if there&apos;s a god, I&apos;d want to pray always: &quot;Take from me my anger toward others, let me embrace and understand them, let me face my own interior, and give me the courage to take one more step forward.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
</content:encoded><category>essay</category><category>leadership</category><category>organizational-culture</category><author>Tony Cho (https://flowkater.io)</author></item><item><title>MVP in the AI Era: Product Lessons from Linear</title><link>https://flowkater.io/en/posts/2026-01-20-ai-mvp-linear-lessons/</link><guid isPermaLink="true">https://flowkater.io/en/posts/2026-01-20-ai-mvp-linear-lessons/</guid><description>I read through Linear founder Tuomas Artman&apos;s MVP playbook and tried to figure out what actually matters when AI has stripped product development down to almost nothing.</description><pubDate>Tue, 20 Jan 2026 05:30:00 GMT</pubDate><content:encoded>&lt;h2&gt;Opening&lt;/h2&gt;
&lt;p&gt;Half-baked products die. AI has driven the cost of building down to almost nothing, which means anyone can ship fast, which means &lt;strong&gt;what&lt;/strong&gt; you build is now everything. Product Market Fit has to mean a product that sells without marketing. Is that the level our MVP is at?&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/assets/linear-app.png&quot; alt=&quot;linear.app&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Truth is, when I scroll Threads I keep seeing solo developers drowning in marketing. There&apos;s an argument going around that &quot;AI handles the build, so marketing is everything,&quot; and I sympathize to a point. But lumping it all under &quot;marketing&quot; is a cop-out. The word marketing covers content, performance, sales, and yes, the customer development I&apos;m writing about here.&lt;/p&gt;
&lt;p&gt;I learned a lot from studying Linear while I was deep in customer development and MVP work. The lessons are practical enough for early SaaS that I wanted to write them down. The interesting thing is how much of what I&apos;m reading now in &lt;em&gt;Solving Product&lt;/em&gt; overlaps with Linear founder Tuomas Artman&apos;s writing.&lt;/p&gt;
&lt;h2&gt;The modern MVP is a different animal&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;Building something valuable is no longer about validating a novel idea as fast as possible. Instead, the modern MVP exercise is about building a version of an idea that is different from and better than what exists today.&quot;&lt;/p&gt;
&lt;p&gt;— Tuomas Artman, Linear&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Linear&apos;s founder is direct about it. The MVP is no longer &quot;validating an idea fast and cheap.&quot;&lt;/p&gt;
&lt;p&gt;The MVP that Eric Ries defined in &lt;em&gt;The Lean Startup&lt;/em&gt; in 2011 was &quot;the version that gets you the most validated learning for the least effort.&quot; Back then Airbnb had to validate &quot;would anyone sleep at a stranger&apos;s house?&quot; and Lyft had to test &quot;does ride-sharing actually work?&quot; The ideas themselves were brand-new categories.&lt;/p&gt;
&lt;p&gt;What about now? Most categories are already saturated. Someone has built it. Someone has built it better. So a new product survives by proving it&apos;s better than what&apos;s already out there, not by validating that the idea exists.&lt;/p&gt;
&lt;p&gt;The gap is enormous. In a validation phase, a rough prototype is fine. When you&apos;re competing (and users already know the alternatives), your product has to be substantially better.&lt;/p&gt;
&lt;p&gt;&quot;Just ship fast&quot; doesn&apos;t cut it anymore. As Linear learned, today&apos;s MVP has to be a competitive product, sharpened over time. And to get there, you have to be clear from day one about what you&apos;re actually building.&lt;/p&gt;
&lt;h2&gt;The power of narrowing the target&lt;/h2&gt;
&lt;p&gt;Linear&apos;s strategy was simple.&lt;/p&gt;
&lt;p&gt;The company&apos;s vision was to &quot;become the standard for how software is built.&quot; That&apos;s enormous ambition. But if Linear had targeted every developer and every team at the MVP stage, they would have failed. Resources would have been spread thin and feedback would have been all over the place.&lt;/p&gt;
&lt;p&gt;So Linear narrowed the target to an extreme. &lt;strong&gt;&quot;Individual Contributors at small startups.&quot;&lt;/strong&gt; More specifically: &quot;engineers at small teams who were struggling with issue tracking.&quot;&lt;/p&gt;
&lt;p&gt;And they focused on three things:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Fast&lt;/strong&gt;: local storage, no page reloads, works offline&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Modern&lt;/strong&gt;: keyboard shortcuts, command menu, context menus&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Multiplayer&lt;/strong&gt;: real-time sync, teammate presence&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The interesting part is that Linear&apos;s founders were exactly the ideal customer. Their strategy was &quot;let&apos;s build it for ourselves.&quot;&lt;/p&gt;
&lt;p&gt;I had a similar struggle when I first started a startup. I genuinely did design the early product around a target customer, and we hit decent enough metrics to raise a round. But after the funding came in, growth flatlined, and instead of staying loyal to that core customer, I kept tacking on features that users requested in order to expand the surface. I&apos;d pull all-nighters on a feature and then sigh through the morning. The metrics never moved. The customers didn&apos;t actually want it.&lt;/p&gt;
&lt;p&gt;Looking back, the first thing I should have done was define &quot;who are the customers we genuinely love?&quot;&lt;/p&gt;
&lt;p&gt;There&apos;s a catch worth flagging here. Linear&apos;s &quot;ICs at small startups&quot; wasn&apos;t just a narrow slice. It was a group with &lt;strong&gt;strong motivation and high expectations&lt;/strong&gt;. Translated: people who knew they had a problem and were already looking for a solution.&lt;/p&gt;
&lt;p&gt;That&apos;s the central lesson in &lt;em&gt;Solving Product&lt;/em&gt; too. Find the &lt;strong&gt;High-Expectation Customer&lt;/strong&gt;. The customer for whom your product feels less like an option and more like a lifeline. The kind of customer who, like a patient who can&apos;t function without a specific medication, feels &quot;if this disappears, I have a real problem.&quot;&lt;/p&gt;
&lt;h2&gt;Feedback loops and how to use a waitlist&lt;/h2&gt;
&lt;p&gt;Linear&apos;s second lesson was how to use a waitlist.&lt;/p&gt;
&lt;p&gt;A lot of startups treat the waitlist as a &quot;marketing channel&quot; or a way to inflate the user count. Linear thought differently. The waitlist was &lt;strong&gt;the whetstone for the product&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Concretely, Linear:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Asked specific questions at signup:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;What&apos;s the size of your company?&lt;/li&gt;
&lt;li&gt;What&apos;s your role?&lt;/li&gt;
&lt;li&gt;What are you using right now?&lt;/li&gt;
&lt;li&gt;What&apos;s frustrating about your current tool?&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Invited the people who could actually give feedback first. Linear only integrated with GitHub, so they started with founders of small startups who used GitHub.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Listened to the feedback and iterated. They kept polishing the existing features until &quot;new feature requests&quot; tapered off.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Once things stabilized, they invited the next segment.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Why does this matter? Because &lt;strong&gt;feedback from the wrong customer breaks your product&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Solving Product&lt;/em&gt; warns about this directly. &quot;The average of all feedback leads to a terrible product.&quot; That&apos;s not a value judgment, it&apos;s math. Try to converge a hundred different needs and you end up with the average product. The bland one.&lt;/p&gt;
&lt;p&gt;Go the other way and listen only to a tiny, eccentric feedback group? Then you build &quot;a product for very strange people.&quot; That&apos;s why balance matters.&lt;/p&gt;
&lt;p&gt;The most effective approach ends up being:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Selecting early customers with clear criteria&lt;/strong&gt; (is our target unambiguous?)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Going deep with that group&lt;/strong&gt; (a lot of people vs. a few you understand intimately)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Acting only on patterns in the feedback&lt;/strong&gt; (recurring issues, not one-off requests)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Defining and segmenting customers properly is genuinely critical. Like a sculptor chipping away the unnecessary marble to reveal what&apos;s essential, even inside a segment you have to keep what&apos;s core and ruthlessly remove the rest.&lt;/p&gt;
&lt;h2&gt;How to find high-expectation customers&lt;/h2&gt;
&lt;p&gt;So who counts as a &quot;high-expectation customer&quot;?&lt;/p&gt;
&lt;p&gt;Tuomas at Linear asked customers directly:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&quot;What would happen if Linear didn&apos;t exist?&quot;&lt;/li&gt;
&lt;li&gt;&quot;What&apos;s the biggest help Linear gives you?&quot;&lt;/li&gt;
&lt;li&gt;&quot;How could it get better?&quot;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The &quot;I&apos;d genuinely miss it&quot; customers were the early group.&lt;/p&gt;
&lt;p&gt;Flip that around: a customer who used the product because it &quot;seemed convenient&quot; can be dropped from the waitlist. They didn&apos;t really commit, and they&apos;ll bolt the moment a better alternative shows up. At the PMF stage you need customers who &lt;em&gt;love&lt;/em&gt; the product. Not customers who like it.&lt;/p&gt;
&lt;p&gt;The obvious question lands here: &quot;If we don&apos;t even have a product yet, how do we find customers like that ahead of time?&quot;&lt;/p&gt;
&lt;p&gt;The trick is to start from &lt;strong&gt;the opportunity&lt;/strong&gt;, not the product. The product is, in the end, a solution to a customer&apos;s problem or desire. Which means even before the product exists, the people who feel that problem urgently are already out there. Find them first.&lt;/p&gt;
&lt;p&gt;The flow looks like this. Define the opportunity you&apos;re trying to solve. Go find the customers who are actually living through it. Talk to them, understand their context. Then put the MVP in their hands, and that&apos;s the moment you find out whether they&apos;re really high-expectation customers. Customer comes before product, and opportunity comes before customer.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Solving Product&lt;/em&gt; spells it out further. A high-expectation customer is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Someone who recognizes your product&apos;s core advantage&lt;/li&gt;
&lt;li&gt;Someone with strong motivation to solve their own problem&lt;/li&gt;
&lt;li&gt;Someone with high expectations of the product&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Put plainly, &quot;someone for whom this isn&apos;t one of several solutions, it&apos;s their only hope.&quot;&lt;/p&gt;
&lt;p&gt;My own experience matched this. The customers who actually paid and stuck around in the early days were almost all the ones saying &quot;if this disappears, I have a real problem.&quot; The ones who used it because it &quot;seemed convenient&quot; left the moment a better option showed up.&lt;/p&gt;
&lt;p&gt;So the early playbook is:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Find five high-expectation customers&lt;/li&gt;
&lt;li&gt;Make those five genuinely happy&lt;/li&gt;
&lt;li&gt;Ignore what the other hundred want (early on only)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;That sounds harsh, but it&apos;s the only way to raise the odds of success.&lt;/p&gt;
&lt;p&gt;I once worked at a company that mostly did B2B SaaS. The CEO was selling to &quot;potential customers,&quot; and &quot;potential customers&quot; said they&apos;d buy if &quot;this one feature&quot; existed. We&apos;d burn three weeks of team time building a brand-new feature for a $10/month enterprise customer. At the time I thought that was the right call. MRR impact: 0%. The feature only ever served that one company, and the next sales opportunity never came.&lt;/p&gt;
&lt;p&gt;Watching B2B SaaS companies in Korea, the pattern I kept seeing was startups that started out with conviction in their product, then slowly turned into outsourcing shops after raising a round. Chasing (potential) customer needs cost them their identity. Yes, the domestic market is small and the core customer pool is genuinely tiny, but the path to success is to refuse the easy substitutes and push head-on toward satisfying only the customers who actually love your product.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Solving Product&lt;/em&gt; hammers the same point. A &quot;potential customer&quot; is not a customer. The customer is the person paying out of their own pocket and using your product. Worth asking sometimes whether we&apos;re being yanked around by too many &quot;potential customer&quot; voices. (That includes friends and family if they&apos;re not the target.)&lt;/p&gt;
&lt;h2&gt;What the modern MVP boils down to&lt;/h2&gt;
&lt;p&gt;Pulling Linear&apos;s case together with &lt;em&gt;Solving Product&lt;/em&gt;&apos;s lessons, here&apos;s what the modern MVP looks like:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Building a competitive product, not validating an idea.&lt;/strong&gt; Sharpening, not just shipping fast.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Narrowing the target to an extreme.&lt;/strong&gt; Giving up on &quot;for everyone.&quot;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Managing the feedback loop strategically.&lt;/strong&gt; Finding patterns, not collecting opinions.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Centering high-expectation customers.&lt;/strong&gt; A few who &lt;em&gt;love&lt;/em&gt; it beats many who &lt;em&gt;like&lt;/em&gt; it.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Building to a level that sells without marketing.&lt;/strong&gt; PMF is the product as evidence, not the campaign.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The final criterion is the cleanest. If your product genuinely has PMF, &lt;strong&gt;word of mouth happens without marketing.&lt;/strong&gt; The state where high-expectation customers can&apos;t stop telling people around them.&lt;/p&gt;
&lt;h2&gt;The AI era, and the question that&apos;s left&lt;/h2&gt;
&lt;p&gt;The faster development gets, the more &lt;strong&gt;what you build&lt;/strong&gt; matters. Thanks to AI, anyone with a decent idea can ship quickly. Which means the competitive edge is no longer execution speed. It&apos;s a sharper target, deeper customer understanding, and the courage to delete fifty percent.&lt;/p&gt;
&lt;h2&gt;Closing&lt;/h2&gt;
&lt;p&gt;Honestly, the word &quot;MVP&quot; used to frustrate me. The &quot;minimum&quot; part. But looking back at the last few years, that &quot;minimum&quot; was actually the most aggressive choice you could make.&lt;/p&gt;
&lt;p&gt;Following Linear&apos;s story, this is what hit me: &lt;strong&gt;you have to build a product that sells without marketing.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;How is that even possible? The answer is in everything I covered above.&lt;/p&gt;
&lt;p&gt;First, abandon the fantasy of &quot;a product for everyone.&quot; Instead, &lt;strong&gt;find high-expectation customers through extremely narrow targeting.&lt;/strong&gt; Not trying to clothe everyone, but tailoring a bespoke suit for one person. Picture three to five specific people, get clear on what they actually need. The moment you change the question from &quot;for everyone&quot; to &quot;for this person,&quot; your product gets a direction.&lt;/p&gt;
&lt;p&gt;The next part is courage. When the feedback floods in, you have to choose. Tune out family and friends. Listen only to actual customers. And only act when the same problem shows up three times or more. One person&apos;s request is unique, but a pattern is universal. Holding direction matters more than absorbing every piece of feedback.&lt;/p&gt;
&lt;p&gt;The core is this. &lt;strong&gt;Find the customers who&apos;d be genuinely disappointed to lose your product, and pour everything into satisfying them perfectly.&lt;/strong&gt; Other customers come later. In the early days, all your energy goes to those five high-expectation customers. When they say &quot;I&apos;d really miss it,&quot; you&apos;ve found PMF.&lt;/p&gt;
&lt;p&gt;The discipline through all of this is choice and focus. You have to be able to ask &quot;what can we remove and still land the core value?&quot; and then delete fifty percent of the features. Like chipping marble away to reveal the statue, the essence emerges as the unnecessary parts get removed. Linear probably wanted to pack in tons of features at first. They removed the ones they didn&apos;t need. That became the competitive edge.&lt;/p&gt;
&lt;p&gt;The faster AI makes development, the more critical this kind of choice and focus becomes. The technical difficulty is already solved.&lt;/p&gt;
&lt;p&gt;In an era where AI is making development cheap, only one thing is left.&lt;/p&gt;
&lt;p&gt;&quot;Who are you building this product for?&quot;&lt;/p&gt;
&lt;p&gt;It isn&apos;t marketing. It isn&apos;t scale. What we have to wrestle with is who we&apos;re really building this for.&lt;/p&gt;
&lt;p&gt;The work in front of you is clear:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Core principle&lt;/th&gt;
&lt;th&gt;How to do it&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Built to sell without marketing&lt;/td&gt;
&lt;td&gt;Sharpen the product into its own evidence&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Extremely narrow targeting&lt;/td&gt;
&lt;td&gt;Define three to five specific customers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;High-expectation-customer focus&lt;/td&gt;
&lt;td&gt;Make only the customers who love it perfectly happy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Strategic feedback&lt;/td&gt;
&lt;td&gt;Act only on recurring patterns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Courage to delete fifty percent&lt;/td&gt;
&lt;td&gt;If it doesn&apos;t carry the core value, cut it&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;The smallest team with the strongest clarity will always beat the largest team with the most confusion.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr /&gt;
&lt;h2&gt;Books worth reading alongside&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;&lt;strong&gt;Solving Product&lt;/strong&gt;&lt;/em&gt; (Ravi Mehta, translated by Lee Yong-bin). A concrete unpack of high-expectation customers and opportunity-driven thinking. Half of this post came from here.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;strong&gt;Continuous Discovery Habits&lt;/strong&gt;&lt;/em&gt; (Teresa Torres). A framework for turning ongoing customer conversations into a habit. Read it if you want to systematize the feedback loop.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://linear.app/now/rethinking-the-startup-mvp-building-a-competitive-product#the-mvp-is-a-journey-not-one-moment-in-time&quot;&gt;Rethinking the Startup MVP: Building a Competitive Product - Linear&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.yes24.com/product/goods/123430681&quot;&gt;Solving Product - Ravi Mehta, translated by Lee Yong-bin&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://flowkater.io/posts/2019-12-lean-customer-questions/&quot;&gt;Lean Startup - customer questions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://flowkater.io/posts/2025-03-continuous-discovery-notes/&quot;&gt;Continuous Discovery Habits notes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Ellie&apos;s reading notes&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;blockquote&gt;
&lt;p&gt;Thanks to Ellie. This piece was inspired by the reading notes Ellie prepared for our &lt;em&gt;Solving Product&lt;/em&gt; study group.&lt;/p&gt;
&lt;/blockquote&gt;
</content:encoded><category>product</category><category>discovery</category><author>Tony Cho (https://flowkater.io)</author></item><item><title>F1 Leadership: What Did James Vowles Actually Do? An Engineer&apos;s Take</title><link>https://flowkater.io/en/posts/2026-01-09-f1-leadership-james-vowles/</link><guid isPermaLink="true">https://flowkater.io/en/posts/2026-01-09-f1-leadership-james-vowles/</guid><description>I broke down how James Vowles revived Williams, the perennial last-place team. How he changed a team that managed 20,000 parts in Excel, why a no-blame culture and a long-term view matter, and how he convinced Carlos Sainz. Behind the results (a confirmed 5th in the 2025 Constructors&apos; standings and two podiums), there&apos;s a kind of leadership worth pulling apart.</description><pubDate>Fri, 09 Jan 2026 11:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;Where this came from&lt;/h2&gt;
&lt;p&gt;Last year I watched F1 live, week after week, and it was a year I cried, laughed, and found pure joy.&lt;/p&gt;
&lt;p&gt;My wife Ellie and I were Williams fans, Carlos Sainz fans more specifically. The handsome face, the kind of warmth everyone responded to, plus the absolutely stunning driving he showed across the 2025 season. It made for a very happy stretch of weekends.&lt;/p&gt;
&lt;p&gt;Up through 2024 I was watching &lt;em&gt;Drive to Survive&lt;/em&gt; every season without fail. After Williams was sold and James Vowles came in as team principal, I got curious about him. Vowles first showed up on the documentary with the nickname &quot;Data Guy&quot;, and I can&apos;t forget the interview where Christian Horner, Red Bull&apos;s now-fired team principal, sneered that &quot;only rookies talk like that.&quot;&lt;/p&gt;
&lt;p&gt;Vowles is a veteran of this world. He worked under Toto Wolff at Mercedes for a long stretch (twelve years). After what Williams pulled off in the 2025 season, I started as a Sainz fan and ended the year as a Vowles fan. (I didn&apos;t know fan loyalty could shift this fast.)&lt;/p&gt;
&lt;p&gt;In 2025 Williams went from the bottom of the midfield to a confirmed P5, and Sainz climbed onto the podium twice. Ellie and I both screamed out loud, live, when Carlos crossed the line in third. I never really cared about sports my whole life, and here I was, fully sucked in.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/assets/sainz-podium-2025.webp&quot; alt=&quot;Carlos Sainz on the podium during the 2025 season with Williams&quot; /&gt;&lt;/p&gt;
&lt;p&gt;So I got curious about how Williams Racing, and James Vowles&apos; leadership specifically, actually worked from the inside. I&apos;ve spent five-plus years as a startup founder and four years leading an R&amp;amp;D division from zero to one. Along the way I&apos;ve made just about every mistake there is to make, and watching Vowles I kept thinking, &lt;em&gt;&quot;Yeah, I should have done it more like that.&quot;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;If you take out the racing, an F1 team&apos;s structure is genuinely close to an IT startup&apos;s. There&apos;s a team principal, there&apos;s a CTO (in Williams&apos; case). What did James Vowles actually do to this perennial last-place team, even in a season he openly called a write-off, to land where they did? I went looking. This is what I pulled together from the articles, posts, and videos about him.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;&quot;A car built in Excel&quot;&lt;/h2&gt;
&lt;p&gt;&lt;img src=&quot;/assets/james-vowles-interview.webp&quot; alt=&quot;Williams team principal James Vowles in interview&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The state of Williams the year Vowles took over was genuinely shocking.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The 2024 build process at Williams, including initial work, was managed in Microsoft Excel, with a list of around 20,000 individual parts and components.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;(The Race, &quot;The shocking details behind an F1 team&apos;s painful revolution&quot;)&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;An F1 car is engineering compressed to its limit. Aerodynamics, automotive engineering, data analytics, all of it. The front wing alone has about 400 distinct parts. And tens of thousands of those parts were being &lt;strong&gt;tracked in Excel&lt;/strong&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The Excel list was a joke. You couldn&apos;t search it, you couldn&apos;t update it.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;There was no data on part cost, build time, or queue depth. Vowles compared the situation to &quot;the Ming dynasty.&quot; The investment that should have happened over the past twenty years simply hadn&apos;t.&lt;/p&gt;
&lt;p&gt;Reading this, I felt it in my bones. That blank-feeling moment when you walk into a legacy system or an undefined process. Anyone in this industry has been there at least once.&lt;/p&gt;
&lt;p&gt;The absence of a system isn&apos;t just inefficiency. It becomes a ceiling on growth. Next time I take on a new team, the first thing I want to do is ask, &lt;strong&gt;&quot;What&apos;s the Excel in our team?&quot;&lt;/strong&gt; The bottleneck that looks fine from the outside but quietly blocks scale. Finding it and turning it into a real system is job one.&lt;/p&gt;
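&lt;p&gt;For contrast, here&apos;s roughly what even the smallest &quot;real system&quot; buys over a flat spreadsheet: a parts table you can actually query. (The schema, part numbers, and figures below are invented for illustration, not Williams&apos; actual data.)&lt;/p&gt;

```python
import sqlite3

# Invented schema: a minimal queryable parts catalogue, versus a flat,
# unsearchable list of 20,000 rows.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE parts (
        part_no   TEXT PRIMARY KEY,
        assembly  TEXT,
        cost_gbp  REAL,
        lead_days INTEGER
    )
""")
conn.executemany(
    "INSERT INTO parts VALUES (?, ?, ?, ?)",
    [
        ("FW-001", "front wing", 1200.0, 14),
        ("FW-002", "front wing", 350.0, 5),
        ("GB-104", "gearbox", 8100.0, 30),
    ],
)

# The questions the Excel list reportedly could not answer:
# what does this assembly cost, and how long does it take to build?
total_cost, longest_lead = conn.execute(
    "SELECT SUM(cost_gbp), MAX(lead_days) FROM parts"
    " WHERE assembly = 'front wing'"
).fetchone()

print(total_cost, longest_lead)  # 1550.0 14
```

&lt;p&gt;Nothing here is sophisticated, which is the point: cost, build time, and queue depth stop being unknowable the moment the data lives somewhere a query can reach it.&lt;/p&gt;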
&lt;p&gt;[Source: &lt;a href=&quot;https://www.the-race.com/formula-1/shocking-details-behind-painful-williams-f1-revolution/&quot;&gt;The Race, &quot;The shocking details behind an F1 team&apos;s painful revolution&quot;&lt;/a&gt;]&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Choosing the painful version of change&lt;/h2&gt;
&lt;p&gt;Vowles decided to do two things at once. Move from the Excel spreadsheet to a digital system, and at the same time substantially change the car&apos;s &quot;technical baseline.&quot;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Our chassis went from a few hundred pieces to a few thousand pieces. And that&apos;s just one part of the car.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;As you&apos;d expect, doing both at the same time was a nightmare. Workers were pulling all-nighters at the factory, and Vowles recalls that even in January the car &quot;still looked like a big bag of parts.&quot;&lt;/p&gt;
&lt;p&gt;But Vowles says this pain was the &lt;strong&gt;necessary&lt;/strong&gt; kind.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I wanted to push the system to its absolute limit so I could see, in one go, where and how it breaks. This winter is the only winter we&apos;ll have to do that.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;(Motorsport.com, &quot;Vowles on what Williams F1 has done wrong&quot;)&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;You&apos;re running a legacy system, you decide &quot;this can&apos;t continue,&quot; and you commit to a major refactor. In the short term, dev velocity drops and bugs go up, but you have to take that pain to grow long-term. It&apos;s the same dilemma in tech.&lt;/p&gt;
&lt;p&gt;It&apos;s much harder than greenfield work because something is already running. People are used to it, and changing it doesn&apos;t auto-install new habits in their heads. It&apos;s not unusual to see a team adopt a new system and then roll it back out of pure pushback. That said, &quot;running&quot; doesn&apos;t mean &quot;right.&quot; It takes courage to change it, and the kind of leadership that can pull support from above and below. Burning two whole seasons for 2026? Not many people are going to make that call.&lt;/p&gt;
&lt;p&gt;The longer you defer change, the higher the bill. When something big is on the table, I want to ask myself, &lt;strong&gt;&quot;Is this the only window to do this?&quot;&lt;/strong&gt; And if the answer is yes, then push the system to its limits and find out where it breaks before it breaks on you. That beats finding out in production.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We&apos;re trying so much in setup and on track. Sometimes we go backwards, but that ends up showing me which direction not to go, and pushes me forward.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;(Carlos Sainz, 2025 Saudi Arabian GP)&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;[Source: &lt;a href=&quot;https://www.motorsport.com/f1/news/vowles-on-what-williams-f1-has-done-wrong-too-much-change/10689141/&quot;&gt;Motorsport.com, &quot;Vowles on what Williams F1 has done wrong&quot;&lt;/a&gt;]&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;No Blame Culture&lt;/h2&gt;
&lt;p&gt;Changing systems alone doesn&apos;t change a team. What Vowles centered on was &lt;strong&gt;culture&lt;/strong&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;If you work for me, I never want you to hold back on pushing the boundary or developing or innovating because you&apos;re afraid of making a mistake or losing your job.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;(GPBlog interview)&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Vowles brought in a &quot;No Blame Culture.&quot; It&apos;s the same thing he was known for at Mercedes. Don&apos;t hide mistakes, talk about them in the open, learn from them.&lt;/p&gt;
&lt;p&gt;A culture of fear creates two specific problems.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;One, people only choose what&apos;s safe.&lt;/strong&gt; Instead of pushing outward, they only push to where they feel comfortable.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Two, when something goes wrong, people hide it.&lt;/strong&gt; They don&apos;t come out and say, &quot;I got this wrong, let me explain why I got it wrong.&quot;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;The number of times I&apos;ve failed in my career is enormous. But every one of those failures, when I talked about them openly and handled them properly, made me much stronger. Success doesn&apos;t actually make you stronger. It just sits there saying, &apos;Nice job.&apos;&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Building a no-blame culture is much harder than it sounds. I tried something similar early on, and as the team grew and got busier it got harder and harder to maintain. But reading Vowles, I felt it again. This isn&apos;t optional. It&apos;s non-negotiable. The team needs an environment where every failure and every mistake becomes a learning loop; in my own attempts, I sometimes let things slide and other times came down too hard. If the team is aligned on a shared goal and that&apos;s how you win, then every mistake along the way is just a means of getting to the win.&lt;/p&gt;
&lt;p&gt;What this story made me realize is that the organizational setup, the kind that makes it possible for everyone to focus on the goal and the outcome, has to come first. Paradoxically, that&apos;s what makes a culture like this stick.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Share your own failures first.&lt;/strong&gt; The leader has to talk about their own mistakes openly before the team will.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Focus on &quot;how do we prevent this next time&quot; instead of &quot;why did this happen.&quot;&lt;/strong&gt; In retros, point at the system, not the person.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Care about the culture more, not less, when things get busy.&lt;/strong&gt; Pressure is when culture cracks.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;[Source: &lt;a href=&quot;https://www.gpblog.com/en/exclusive-news/vowles-team-boss-williams-f1-interview-culture-albon.html&quot;&gt;GPBlog, &quot;Vowles takes Williams by the hand&quot;&lt;/a&gt;]&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/assets/williams-podium-celebration.webp&quot; alt=&quot;Williams team celebrating on the podium&quot; /&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;1-on-1s, in the end this is the answer&lt;/h2&gt;
&lt;p&gt;The part of Vowles&apos; leadership that resonated with me most is his emphasis on communication.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I send an email to the entire factory three times a week, and we have a team meeting after every race. Walking the factory floor in person matters too.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;(Monocle, &quot;10 Leadership Lessons from James Vowles&quot;)&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;He sends three all-hands emails a week, runs a team meeting after every race, and walks every corner of the factory. It sounds small, but if you&apos;ve actually tried it, it&apos;s brutal. Especially as the team grows.&lt;/p&gt;
&lt;p&gt;Looking back, this is the part I regret most. Early on I did 1-on-1s often and feedback flowed both ways, but as the team scaled it got patchier. Vowles convinced me of what I already half-knew: &lt;strong&gt;the best thing you can do is keep showing up for 1-on-1s.&lt;/strong&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;Care about people genuinely. I feel real gratitude that people give me their time. They could be home with family in that hour, and instead they&apos;re with me.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This one stuck. F1 team or IT team, in the end, what&apos;s happening is people are spending their time with you. Don&apos;t forget to be grateful for that.&lt;/p&gt;
&lt;p&gt;Bottom-up feedback is hard to give. It feels harder in Korean org culture specifically. Leaders ask for it, and then look surprised when they actually get it. I think we need to build more systematic ways to make this feedback happen. It&apos;s non-negotiable, not optional.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Action items:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;A 1-on-1 schedule that&apos;s actually fixed.&lt;/strong&gt; If you push it because you&apos;re busy, you&apos;ll never do it.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Prepare feedback in advance.&lt;/strong&gt; At minimum one improvement point per person, prepped before you walk in.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Diversify communication channels.&lt;/strong&gt; Weekly all-hands email, informal chats, formal meetings, all of them.&lt;/li&gt;
&lt;/ol&gt;
&lt;hr /&gt;
&lt;h2&gt;The long game&lt;/h2&gt;
&lt;p&gt;The part of Vowles&apos; leadership that left the strongest impression on me is the &lt;strong&gt;long-term view&lt;/strong&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I don&apos;t believe in any short-termism. I won&apos;t move short-term because it doesn&apos;t fit Williams and its future.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;(Monocle, &quot;10 Leadership Lessons from James Vowles&quot;)&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;He said from the start that 2024 and 2025 would be &quot;sacrificed.&quot; It was the investment needed to rebuild the team for the new technical regulations in 2026. He picked future competitiveness over present rank.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I don&apos;t want seventh, eighth, ninth. I want 2026 to be good. Meanwhile, others up and down the pit lane are focused on 2024 and 2025.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;(Motorsport.com, &quot;Why Vowles believes Williams culture will survive short-term pain&quot;)&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This decision had the full backing of Dorilton Capital, the team&apos;s owner. &quot;Quick fixes look impressive on the surface, but they collapse fast.&quot;&lt;/p&gt;
&lt;p&gt;Holding a long-term view is genuinely hard. Pressure for present-quarter results, stakeholder expectations, the team&apos;s own motivation. Every force in the system pushes you toward the short-term call. But watching Vowles, I came back to this: holding a long-term view requires &lt;strong&gt;a clear goal, and shared understanding of that goal.&lt;/strong&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;This team is on the way up, and that flow can never be stopped.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;(Carlos Sainz, after the 2025 season finale)&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;When I had full control at the org level, there were times I didn&apos;t articulate the company&apos;s long-term view well enough, and as a middle manager there were times I deferred that responsibility upward. There were definitely places I could have done more. When the long roadmap got broken or interfered with from the outside, I wish I&apos;d been more flexible about resetting it and taking back control.&lt;/p&gt;
&lt;p&gt;[Source: &lt;a href=&quot;https://monocle.com/business/james-vowles-leadership-lessons/&quot;&gt;Monocle, &quot;10 Leadership Lessons from James Vowles&quot;&lt;/a&gt; / &lt;a href=&quot;https://www.motorsport.com/f1/news/exclusive-vowles-williams-culture-survive-short-term-pain/10642667/&quot;&gt;Motorsport.com, &quot;Why Vowles believes Williams culture will survive short-term pain&quot;&lt;/a&gt;]&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Action items:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Define &quot;what&apos;s our goal.&quot;&lt;/strong&gt; A clear long-term goal you won&apos;t get knocked off by short-term results.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Share that goal with the whole team.&lt;/strong&gt; Everyone has to know why we&apos;re absorbing this pain right now.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Sell the long-term vision to stakeholders too.&lt;/strong&gt; Without support from above, no long-term strategy survives.&lt;/li&gt;
&lt;/ol&gt;
&lt;hr /&gt;
&lt;h2&gt;How he convinced Carlos Sainz&lt;/h2&gt;
&lt;p&gt;&lt;img src=&quot;/assets/sainz-vowles-together.webp&quot; alt=&quot;Carlos Sainz with James Vowles&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The clearest case study of Vowles&apos; leadership is the Sainz signing.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Vowles spent six months in continuous communication with Carlos Sainz, openly sharing even the bad parts of Williams. He laid out the investment plan and future vision transparently, and Sainz confirmed it wasn&apos;t a fiction.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;(Pit Debrief, &quot;James Vowles on Williams F1 progress&quot;)&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That&apos;s six months of consistent communication, honestly sharing the bad, laying out the investment plan and future vision openly. None of that is easy. (Actually, all of it is extremely hard.)&lt;/p&gt;
&lt;p&gt;Sainz didn&apos;t pick Williams just because he needed a seat. He understood what it would take to turn Williams into a championship team, and he wanted to be at the center of that transformation. Vowles&apos; vision landed.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;This is the project of my life. Helping put Williams back where they can win, that&apos;s why I&apos;m here.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;(Carlos Sainz, after the 2025 Azerbaijan Grand Prix in Baku, Sky Sports interview)&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This kind of transparency in hiring matters more than people give it credit for. There&apos;s always a temptation to show only the good parts, but anyone you hired by hiding the bad parts leaves quickly.&lt;/p&gt;
&lt;p&gt;I&apos;ve done a lot of hiring, and a lot of faces come to mind. There were times I sold only the good parts, and times I wasn&apos;t honest enough. As the conclusion above suggests, it didn&apos;t end well. State clearly what you want, and be honest about the current state and the goal. That&apos;s the attitude.&lt;/p&gt;
&lt;p&gt;[Source: &lt;a href=&quot;https://www.pitdebrief.com/post/james-vowles-on-williams-f1-team-progress/&quot;&gt;Pit Debrief, &quot;James Vowles on Williams F1 progress&quot;&lt;/a&gt;]&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Action items:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Share the bad parts honestly when hiring.&lt;/strong&gt; Don&apos;t hide current difficulties or unsolved problems.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Lay out the vision clearly.&lt;/strong&gt; &quot;It&apos;s like this now, but our goal is this,&quot; concrete and specific.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Spend time building the relationship.&lt;/strong&gt; The Vowles kind of patience: six months of steady communication.&lt;/li&gt;
&lt;/ol&gt;
&lt;hr /&gt;
&lt;h2&gt;Hire people smarter than you&lt;/h2&gt;
&lt;p&gt;Of the ten leadership lessons Vowles shared, this is the one that hit me hardest.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I&apos;m not the smartest person at Williams. I don&apos;t need to be. My job is to gather world-class talent, give them authority, and know when to step aside.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That&apos;s easy to say and hard to live. Especially if the leader came up as a developer. You hire someone technically excellent, and at some point the thought arrives: &lt;em&gt;&quot;I could probably just do this myself.&quot;&lt;/em&gt; (It&apos;s instinct.) Holding that back and delegating is the leader&apos;s job.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Direction is often more important than the answers you give. Indecision is worse than a wrong decision. The answer might not be perfect, but we&apos;ll move forward together.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This one I feel painfully. You have to step back fast and check whether you can actually make this call, and if you can, decide quickly. If not, gather the right people fast and have them decide. Indecision is the biggest enemy.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Action items:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Switch the question from &quot;can I do this?&quot; to &quot;who can do this best?&quot;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Don&apos;t defer decisions.&lt;/strong&gt; Decide at 70% information and adjust the rest in flight.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;After delegating, don&apos;t interfere.&lt;/strong&gt; Review the result, but don&apos;t micromanage the process.&lt;/li&gt;
&lt;/ol&gt;
&lt;hr /&gt;
&lt;h2&gt;How an engineering team and an F1 team rhyme&lt;/h2&gt;
&lt;p&gt;While writing this, it struck me that running an F1 team and running a dev team (an IT startup) overlap more than you&apos;d expect.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Coordinating a complex system.&lt;/strong&gt; An F1 car is tens of thousands of parts meshing together precisely. A dev system is the same.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Data-driven decisions.&lt;/strong&gt; F1 reads car data, tire wear, and weather in real time and builds strategy from it. A dev team makes decisions based on metrics, logs, and user behavior data.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Fast failure and learning.&lt;/strong&gt; F1 takes feedback every race and improves before the next one. A dev team runs retros and improves every sprint. The &quot;no-blame culture&quot; Vowles talks about is the same concept as psychological safety in agile.&lt;/p&gt;
&lt;p&gt;What Vowles said at Cambridge Judge Business School (the University of Cambridge&apos;s business school) lands here.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;The exposure our business gets is obviously very high, and I have to explain results that get a lot of attention every week, but the problems other people in this room face and the problems we face are the same.&quot;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;(Cambridge Judge Business School)&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;F1 or IT startup, the problem of leading an org is essentially the same.&lt;/p&gt;
&lt;p&gt;[Source: &lt;a href=&quot;https://www.jbs.cam.ac.uk/insight/2024/leadership-in-formula-one/&quot;&gt;Cambridge Judge Business School, &quot;Leadership in Formula One&quot;&lt;/a&gt;]&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Closing&lt;/h2&gt;
&lt;p&gt;&lt;img src=&quot;/assets/williams-team-2025.webp&quot; alt=&quot;Williams Racing team&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The thing that stuck with me most about Vowles&apos; leadership is &lt;strong&gt;authenticity&lt;/strong&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;You have to believe in what you&apos;re doing with your whole heart. Ultimately, the thing that pushes you forward every day in hard moments is that belief. And that&apos;s when your real character shows. If you&apos;re just wearing a mask, eventually it cracks.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Even in the Sainz signing, he didn&apos;t show only the good side of Williams. He shared the bad parts openly. That honesty is what built the trust.&lt;/p&gt;
&lt;p&gt;I know from experience exactly when I&apos;ve been honest and authentic and when I haven&apos;t. And how each ended. If there&apos;s one thing I learned as a startup founder and CTO, it&apos;s that what matters in the end is whether you have your own answer to &lt;strong&gt;&quot;what kind of team am I building?&quot;&lt;/strong&gt; And whether you&apos;re moving consistently toward that answer.&lt;/p&gt;
&lt;p&gt;Putting James Vowles&apos; story in order, here&apos;s what I want to apply going forward.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;What I learned&lt;/th&gt;
&lt;th&gt;Action item&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;No system, no growth&lt;/td&gt;
&lt;td&gt;Find &quot;the Excel in our team&quot; and systemize&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Painful change is sometimes needed&lt;/td&gt;
&lt;td&gt;Ask: &quot;Is this the only window?&quot;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No-blame culture&lt;/td&gt;
&lt;td&gt;Leader shares failures first&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1-on-1s are the answer&lt;/td&gt;
&lt;td&gt;Lock the schedule, prep feedback in advance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Long-term view&lt;/td&gt;
&lt;td&gt;Define and share a clear goal&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Honest hiring&lt;/td&gt;
&lt;td&gt;Bad parts honest, vision clear&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;The craft of delegation&lt;/td&gt;
&lt;td&gt;Ask &quot;who can do this best?&quot;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;hr /&gt;
&lt;blockquote&gt;
&lt;p&gt;Honestly, I can&apos;t put into words how happy I am, how good this feels. This is even better than my first podium. We&apos;ve fought hard all year, and today, when we finally had the speed (and we&apos;ve had it all year), when everything came together, it proved that we can do amazing things together.
Today we executed the race perfectly. Not a single mistake, and we beat a lot of cars we didn&apos;t expect to beat yesterday. I&apos;m extremely proud of everyone at Williams for pushing through such a hard year. We&apos;ve proven to everyone that we&apos;ve made huge progress versus last year. We&apos;re on the rise, on the right path.
Unfortunately for me there&apos;s been a lot of bad luck, a lot of incidents, and it&apos;s been very hard to turn all that pace into results. But today everything came together. The race execution was perfect, the team calls were perfect, the tire management was perfect, the start, every defense and management move was perfect. So we got an unexpected podium. I couldn&apos;t be prouder.
What other people do isn&apos;t my business. What I care about is that the first time a podium opportunity came with Williams, we took it and scored. That&apos;s all there is.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;(Carlos Sainz Jr., after his first P3 podium at the Baku GP)&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;img src=&quot;/assets/sainz-thumbsup.webp&quot; alt=&quot;Carlos Sainz&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Speaking with results — I think that&apos;s what real leadership comes down to.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;strong&gt;References:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://www.the-race.com/formula-1/shocking-details-behind-painful-williams-f1-revolution/&quot;&gt;The Race - The shocking details behind an F1 team&apos;s painful revolution&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.motorsport.com/f1/news/vowles-on-what-williams-f1-has-done-wrong-too-much-change/10689141/&quot;&gt;Motorsport.com - Vowles on what Williams F1 has done wrong&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.motorsport.com/f1/news/exclusive-vowles-williams-culture-survive-short-term-pain/10642667/&quot;&gt;Motorsport.com - Why Vowles believes Williams culture will survive short-term pain&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.gpblog.com/en/exclusive-news/vowles-team-boss-williams-f1-interview-culture-albon.html&quot;&gt;GPBlog - Vowles takes Williams by the hand&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://monocle.com/business/james-vowles-leadership-lessons/&quot;&gt;Monocle - 10 Leadership Lessons from James Vowles&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.pitdebrief.com/post/james-vowles-on-williams-f1-team-progress/&quot;&gt;Pit Debrief - James Vowles on Williams F1 progress&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.jbs.cam.ac.uk/insight/2024/leadership-in-formula-one/&quot;&gt;Cambridge Judge Business School - Leadership in Formula One&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</content:encoded><category>essay</category><category>leadership</category><category>F1</category><author>Tony Cho (https://flowkater.io)</author></item><item><title>How a 15-Year CTO Vibe Codes</title><link>https://flowkater.io/en/posts/2026-01-09-15-year-cto-vibe-coding/</link><guid isPermaLink="true">https://flowkater.io/en/posts/2026-01-09-15-year-cto-vibe-coding/</guid><description>Finding the balance between productivity and joy in the vibe coding era. Inspired by Kent Beck&apos;s Augmented Coding philosophy, I built tdd-go-loop, a workflow orchestrator on top of Claude Code. A TDD cycle where the human keeps control even when the AI writes the code. While videos of apps appearing from a few prompts flood the feed, I wanted to keep developing in a way where I actually understand the code.</description><pubDate>Fri, 09 Jan 2026 07:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;em&gt;Added 2026-02-08&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;/tdd:go&lt;/code&gt; walks through your work step by step. If you want something more automated, see this follow-up:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://flowkater.io/posts/2026-02-08-superpowers-introduction/&quot;&gt;Give Claude Code Wings: Introducing Superpowers&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Opening&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;Sorry for the bait. I&apos;m not a 15-year CTO. (I have been writing code for 15 years, that part is true.) I&apos;m also not a current CTO. (I left last year.)&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The OpenCode (+oh-my-opencode) wave came and went. Lately OpenCode has been on fire over ToS issues and a few other things. As of today it&apos;s blocked, and I do have a personal workaround, but the work I have left should be fine on Claude Code alone, so I&apos;ve decided to stick with Claude Code for a while.&lt;/p&gt;
&lt;p&gt;AI coding tools are popping up everywhere lately. Cursor, Windsurf, Copilot, Claude Code. There are countless posts comparing which one is better, and people are constantly sharing their own workflows. Everyone is shouting &quot;vibe coding,&quot; and YouTube and Twitter are full of videos where an app pops out of a few prompts. &quot;I finished a month of work in three hours!&quot; type stories show up every day.&lt;/p&gt;
&lt;p&gt;But I kept feeling a strange unease in the middle of all this. Productivity was up, sure, but something felt missing. The code worked, but did I actually understand it? Could I really call it &quot;developing&quot; if I was just copy-pasting whatever the AI generated? Questions like that kept circling. And the worst part was that even when I tried to ignore the unease, my flow broke and my productivity dropped anyway. It just stopped being fun, so I stopped wanting to do it.&lt;/p&gt;
&lt;p&gt;While I was sitting with this, I came across Kent Beck&apos;s writing, and that gave me the spark for my own workflow. What I want to share here isn&apos;t the usual technical &quot;which tool to use&quot; or &quot;how to structure your sub-agents&quot; piece. There are plenty of those already. What I want to talk about is &lt;strong&gt;tdd-go-loop&lt;/strong&gt;, a &lt;strong&gt;workflow orchestrator&lt;/strong&gt;. This goes beyond running a single command. It&apos;s a system where multiple sub-agents collaborate to automate the TDD cycle and run code reviews along the way.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Kent Beck&apos;s Augmented Coding&lt;/h2&gt;
&lt;p&gt;This workflow was inspired by &lt;a href=&quot;https://tidyfirst.substack.com/p/augmented-coding-beyond-the-vibes&quot;&gt;Kent Beck&apos;s Augmented Coding&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;There are plenty of Korean translations out there, so search around if you want a read. A piece written six months ago feels almost ancient in AI-coding time, but the workflow I&apos;m using daily right now started here.&lt;/p&gt;
&lt;p&gt;The core of Kent Beck&apos;s argument is simple. &lt;strong&gt;Even when the AI writes the code, the human has to stay in control.&lt;/strong&gt; Not just throwing prompts and accepting outputs, but slicing the work small, verifying as you go, understanding what you&apos;re building. TDD (Test-Driven Development) is the tool for that.&lt;/p&gt;
&lt;p&gt;When most developers hear TDD, they think &quot;ah, write the tests first, right?&quot; That&apos;s right. But TDD in the AI era is a bit different. In classic TDD, I wrote the tests and I wrote the implementation. In augmented coding, &lt;strong&gt;I design the tests, and the AI writes the implementation&lt;/strong&gt;. That&apos;s the key. Designing the test first means clearly defining what I want, and the AI generates code that matches that definition. Test passes, the implementation is correct. Test fails, it&apos;s wrong. Simple but powerful.&lt;/p&gt;
&lt;p&gt;The important part here is &quot;small unit.&quot; You don&apos;t throw &quot;build me a sign-up feature&quot; at the AI. You break it down to &quot;write the failure-case test for the email validation logic.&quot; You watch the test fail, write the minimum code to pass it, and review it yourself. That&apos;s the cycle. Like stacking Lego blocks, you stack small verified pieces.&lt;/p&gt;
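&lt;p&gt;To make the cycle concrete, here is what that email-validation step might look like in Go. This is a minimal sketch: &lt;code&gt;ValidateEmail&lt;/code&gt; and the &lt;code&gt;user&lt;/code&gt; package are illustrative names, not code from a real project.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;package user

import (
    &quot;errors&quot;
    &quot;strings&quot;
    &quot;testing&quot;
)

// --- email_test.go (written first; the test defines the behavior I want) ---

func TestValidateEmail_RejectsMissingAtSign(t *testing.T) {
    if err := ValidateEmail(&quot;not-an-email&quot;); err == nil {
        t.Fatal(&quot;expected an error for an address without @&quot;)
    }
}

// --- email.go (after watching the test fail: the minimum code to pass) ---

var ErrInvalidEmail = errors.New(&quot;invalid email&quot;)

func ValidateEmail(s string) error {
    if !strings.Contains(s, &quot;@&quot;) {
        return ErrInvalidEmail
    }
    return nil
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Two files shown in one block for brevity. The point is the order: red first, then just enough green.&lt;/p&gt;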
&lt;p&gt;The concept itself is very simple. As everyone knows, write a PRD first, then turn that PRD into a &lt;code&gt;plan.md&lt;/code&gt; file structured as Phases and checklists. There&apos;s a tool called Spec-kit, but in my experience it tends to over-formalize and grow the work, so I built my own planning skill. Since the unit is TDD, you have to slice as small as possible, and the key is keeping work-by-Phase progress visible.&lt;/p&gt;
&lt;p&gt;Then I use a command called &lt;code&gt;/tdd:go&lt;/code&gt; to run things one at a time, by Phase or by sub-checklist, and review each one myself.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;It&apos;s Not Particularly Vibe-y&lt;/h2&gt;
&lt;p&gt;If you&apos;ve followed along this far, you can already tell this isn&apos;t very vibe-y.&lt;/p&gt;
&lt;p&gt;I run every checklist one by one, check the code, and give feedback if I see something off, something to improve, or something that should be extended. If there&apos;s no issue, I move to the next checklist. I&apos;m not the one typing the code, but it&apos;s almost the same thing.&lt;/p&gt;
&lt;p&gt;It&apos;s a kind of &lt;strong&gt;pair programming&lt;/strong&gt;. The AI is at the keyboard, I&apos;m next to it saying &quot;hey, that&apos;s not right&quot; and steering the direction. The AI is the driver, I&apos;m the navigator. The one difference from regular pair programming: the AI doesn&apos;t get tired, doesn&apos;t complain, and fixes things the moment I give feedback. (Though when it makes the same mistake repeatedly, I do get frustrated.)&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;flowchart LR
    A[Write PRD] --&amp;gt; B[Generate plan.md]
    B --&amp;gt; C[&quot;Run tdd:go&quot;]
    C --&amp;gt; D[Write Test]
    D --&amp;gt; E[Confirm Test Fails]
    E --&amp;gt; F[Write Implementation]
    F --&amp;gt; G[Test Passes]
    G --&amp;gt; H{Code Review}
    H --&amp;gt;|Needs Feedback| C
    H --&amp;gt;|Pass| I[Next Checklist]
    I --&amp;gt; C
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;When I actually work this way, one cycle takes about 5 to 10 minutes. Write a test, confirm the failure, write the minimum code, watch the test pass, review it. Repeat. Vibe? None. But every line of code is something I understand as it goes in.&lt;/p&gt;
&lt;p&gt;Honestly, I know this approach isn&apos;t &quot;hip.&quot; The current trend is to hand everything to the AI and just check the output. The point is finding what works for you.&lt;/p&gt;
&lt;p&gt;If you&apos;re a junior developer who wants to learn the code more deeply, I strongly recommend this approach. Especially when I have to keep the codebase in my head for a project, or when I&apos;m starting from scratch, I go through it step by step. Copy-pasting AI-generated code and stacking up code I understood while reviewing are completely different experiences. The first leaves you with &quot;the code exists.&quot; The second gives you the feeling that &quot;this is my code.&quot; The gap is huge.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;A Spec Alone Wasn&apos;t Enough&lt;/h2&gt;
&lt;p&gt;At first, I thought a good spec was all I needed. Write down the design patterns, DDD, and clean-architecture guidelines, and Claude Code (or Codex) would handle the rest. For simple CRUD apps, that actually worked. But reality wasn&apos;t that simple.&lt;/p&gt;
&lt;p&gt;If you&apos;ve worked with juniors, you know what I mean. Unless the requirements are genuinely simple, there are always moments while writing the code where architectural judgment shifts. How to split a function. Which layer the adapter and protocol live on, and in what shape. How far to reuse an object or a service, and where to draw the scope boundary.&lt;/p&gt;
&lt;p&gt;No matter how good the spec is, you hit &quot;wait, this isn&apos;t right&quot; moments while writing the actual code. Can the AI make the right call alone in those moments? In my experience, no. The AI works hard at writing the code that the spec describes, but it doesn&apos;t fully grasp the context. &quot;This function is likely to extend later, so let&apos;s pull it out as an interface now.&quot; &quot;This part is simple today, but the domain will get complex, so let&apos;s split it into a separate service.&quot; Calls like these still belong to humans.&lt;/p&gt;
&lt;p&gt;Migrating an enterprise-scale SaaS off legacy and running it for a year and a half taught me one thing. The moment you take your hands off the code, no matter how much conceptual guidance you wrote, the actual code always hits moments where the writer&apos;s judgment is needed. Especially when the initial design collides with messy real-world requirements (which keep changing) and starts deforming bit by bit. Handing that to the AI completely is still too early.&lt;/p&gt;
&lt;p&gt;Of course, in the vibe coding era you don&apos;t need to follow up on every line of code. Knowing the important parts can be enough. That&apos;s a personal call. If you&apos;re not a developer or you just want to ship a product without touching code, you don&apos;t have to look at the code. I&apos;m rooting for the people who vibe-code without knowing code. There&apos;s a company that already raised a Series B with a 3,000-line API file of raw SQL crammed into a single controller. Customer value is the goal, code is the means. I look at code because I&apos;m a developer. That&apos;s the only reason.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;The Anxiety of Vibe Coding&lt;/h2&gt;
&lt;p&gt;Anyway, putting that aside, the real reason I settled on this approach is something else.&lt;/p&gt;
&lt;p&gt;I was running so far from the actual code that I stopped feeling like I was managing the project properly. Handing it to the AI and resting feels weird, but watching the spinner break my focus while I drift into Threads is probably not just my problem.&lt;/p&gt;
&lt;p&gt;You&apos;ve all had this. The awkward stretch of time where the AI is busily generating code and you don&apos;t know what to do with yourself. Doing other work feels off because the result is right around the corner. Just waiting feels wasteful. So you end up on Threads or YouTube, and by the time the result lands, your focus is on the floor. You&apos;re supposed to check the output and give feedback, but your head is already somewhere else. Repeat that, and your sense of the whole project slowly fades.&lt;/p&gt;
&lt;p&gt;There&apos;s a psychological story here too. Humans get anxious when they don&apos;t feel in control. Especially in their own area of expertise. A developer not knowing the code is a bit like a driver who isn&apos;t holding the wheel. The car might be going fine, but it&apos;s unsettling. You can&apos;t even be sure it&apos;s heading the right place. No matter how good autonomous driving gets, fully letting go of the wheel feels off, and AI coding is the same way.&lt;/p&gt;
&lt;p&gt;What an AI agent really does is automate the time-consuming parts of human work. The catch: handing everything over makes debugging harder. The growing sense that &quot;I don&apos;t know this code&quot; actually dropped my productivity. The blank feeling when you can&apos;t answer &quot;what is this code doing, where?&quot; That&apos;s why I eventually settled on keeping a certain amount under my control.&lt;/p&gt;
&lt;p&gt;And more than anything, this is about &lt;strong&gt;flow&lt;/strong&gt;. The Flow state Mihaly Csikszentmihalyi wrote about (it&apos;s where my handle &lt;em&gt;Flow&lt;/em&gt;kater comes from). When a task is neither too hard nor too easy and the feedback is immediate, humans drop into flow. Vibe coding doesn&apos;t meet those conditions. Throw a prompt, wait, check the result, throw another prompt. Flow is hard to find inside that loop. The TDD approach of clearing checklists one by one hits the conditions exactly. The work is small enough that it isn&apos;t intimidating, and the test result is instant feedback. So it&apos;s fun.&lt;/p&gt;
&lt;p&gt;But there&apos;s a big downside. It&apos;s slow.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;So I Built My Own Method&lt;/h2&gt;
&lt;p&gt;I wanted to ease the anxiety I described above without giving up the productivity of the AI era. So I found a middle point. I borrowed Kent Beck&apos;s augmented coding concept and customized it to my situation. The result is &lt;strong&gt;tdd-go-loop&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;This isn&apos;t a single command. It&apos;s a &lt;strong&gt;workflow orchestrator&lt;/strong&gt;. More than just running &lt;code&gt;/tdd:go&lt;/code&gt; repeatedly. It coordinates &lt;strong&gt;multiple sub-agents&lt;/strong&gt; like spec-review, codex-review, sql-review, apply-feedback while managing the entire TDD cycle. Like an orchestra conductor, it defines when each instrument (agent) plays which part.&lt;/p&gt;
&lt;p&gt;For the core engine code I need to understand, the API code that holds the actual logic, or when I&apos;m laying out a project structure for the first time, I always go through &lt;code&gt;/tdd:go&lt;/code&gt;. Once you do this enough, you notice something. The AI, just like a human, often misreads or guesses wrong about the initial guidelines and writes the wrong code. After you catch those issues and build the foundation yourself, what comes next is mostly developing similar APIs in repetition, or extending logic.&lt;/p&gt;
&lt;p&gt;Because Kent Beck&apos;s augmented coding enforces TDD as the development style, once you have a code structure in place (whatever architecture it is), the AI builds on that base, which makes whole-codebase review easier too. For example, when I first build out a Usecase in Go, I sketch the outline with mock tests, then in the Usecase layer I implement the logic by composing domain models, functions, and Repository interfaces. If I tell it to define mutations at the top of the function and keep private functions as pure as possible, the generated code becomes highly readable, which makes review comfortable.&lt;/p&gt;
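&lt;p&gt;As a rough illustration of that shape (every name here is made up, not from my actual codebase): the Usecase keeps the mutation visible at the top level and pushes the decision logic into a pure private function.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;package plan

import (
    &quot;context&quot;
    &quot;errors&quot;
)

var errInvalidDays = errors.New(&quot;days must be positive&quot;)

// PlanWriter is the Repository interface the Usecase composes.
type PlanWriter interface {
    Save(ctx context.Context, p Plan) error
}

type Plan struct {
    Name string
    Days int
}

type createService struct {
    writer PlanWriter
}

// Create keeps the side effect (Save) at the top of the function.
func (s createService) Create(ctx context.Context, name string, days int) error {
    p, err := newPlan(name, days)
    if err != nil {
        return err
    }
    return s.writer.Save(ctx, p)
}

// newPlan is pure: same input, same output, no side effects. Easy to review.
func newPlan(name string, days int) (Plan, error) {
    if days &amp;lt;= 0 {
        return Plan{}, errInvalidDays
    }
    return Plan{Name: name, Days: days}, nil
}
&lt;/code&gt;&lt;/pre&gt;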
&lt;p&gt;&lt;strong&gt;APIs that match a pattern and structure I&apos;ve already built once with tdd:go&lt;/strong&gt; can move quickly. I only walk through unfamiliar patterns or complex business logic carefully. The rest, I trust.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;My Actual Setup&lt;/h2&gt;
&lt;p&gt;In Claude Code, you define commands and skills as markdown files inside the &lt;code&gt;.claude/&lt;/code&gt; folder. My project structure looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;.claude/
├── commands/                      # Single-execution commands
│   ├── tdd/
│   │   ├── go.md                 # /tdd:go - run one checklist
│   │   ├── batch.md              # /tdd:batch - batch by Tier
│   │   ├── fast.md               # /tdd:fast - full automation
│   │   └── status.md             # /tdd:status - check progress
│   ├── tdd-go-loop.md            # Workflow orchestrator (the core!)
│   ├── spec-review.md            # Spec review
│   ├── codex-review.md           # Code review
│   ├── sql-review.md             # SQL review
│   └── final-test.md             # Final test
│
├── skills/                        # Composite skills (with agents)
│   ├── go-gin-ddd-bun/           # Go project architecture guide
│   │   ├── SKILL.md
│   │   ├── ARCHITECTURE.md
│   │   └── TESTING.md
│   └── api-final-review/         # Final review skill (4 parallel agents)
│       ├── SKILL.md              # 6-stage review workflow definition
│       ├── AGENTS.md             # Detailed setup for the 4 parallel agents
│       └── templates/
│           └── test_script_template.sh
│
├── templates/                     # Document templates
│   ├── plan-template-v2.md       # plan.md template
│   └── api-review-guide.md       # Code review guide
│
└── agents/                        # Sub-agent definitions
    ├── codex-review.md           # Codex review agent
    ├── sql-review.md             # SQL review agent
    └── apply-feedback.md         # Feedback-application agent
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Two things matter here:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;tdd-go-loop.md&lt;/strong&gt;: Not a plain command, but a &lt;strong&gt;workflow orchestrator&lt;/strong&gt;. It coordinates multiple sub-agents (codex-review, sql-review, apply-feedback, etc.) and automates the entire TDD cycle.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;api-final-review/&lt;/strong&gt;: A skill that runs the final review after API development is done. Four specialist agents run reviews &lt;strong&gt;in parallel&lt;/strong&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;hr /&gt;
&lt;h2&gt;/tdd:go Command Example&lt;/h2&gt;
&lt;p&gt;Here&apos;s what the actual &lt;code&gt;/tdd:go&lt;/code&gt; command looks like:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# TDD: Go (run the next test)

Read plan.md and find the first test marked with `[ ]` (not yet implemented).

## Execution Steps

1. **Identify**: Find the first `[ ]` test in plan.md
2. **Announce**: Tell me which test you&apos;re about to implement
3. **Red Phase**:
   - Create or update `*_test.go` file
   - Write a failing test for that specific behavior
   - Run `go test -v ./...` to confirm the test fails
4. **Green Phase**:
   - Write the minimum code to make the test pass
   - Run `go test -v ./...` to confirm ALL tests pass
5. **Update**: Mark the test as `[x]` in plan.md
6. **Report**: Summarize what was done

## Critical Rules

- Write ONLY enough code to pass the current test
- Do NOT implement features for future tests
- Always run `go fmt` on new files
- If tests fail unexpectedly, STOP and report before proceeding
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It&apos;s simple. Find one checklist marked &lt;code&gt;[ ]&lt;/code&gt; in plan.md, run the Red-Green cycle, and check it off as &lt;code&gt;[x]&lt;/code&gt; when done. That&apos;s all there is.&lt;/p&gt;
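&lt;p&gt;For illustration, the &quot;find the first &lt;code&gt;[ ]&lt;/code&gt; item&quot; step is nothing more than a line scan. A standalone Go sketch (the real lookup is done by the agent reading plan.md, not by code like this):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;package main

import (
    &quot;fmt&quot;
    &quot;strings&quot;
)

// firstUnchecked returns the first line that still carries a &quot;[ ]&quot; marker.
func firstUnchecked(plan string) (string, bool) {
    for _, line := range strings.Split(plan, &quot;\n&quot;) {
        if strings.Contains(line, &quot;[ ]&quot;) {
            return strings.TrimSpace(line), true
        }
    }
    return &quot;&quot;, false
}

func main() {
    plan := &quot;### [x] 1.1 Define input\n### [ ] 1.2 Conversion methods\n&quot;
    if item, ok := firstUnchecked(plan); ok {
        fmt.Println(item) // the 1.2 line: the next test to implement
    }
}
&lt;/code&gt;&lt;/pre&gt;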
&lt;hr /&gt;
&lt;h2&gt;A Real plan.md Example&lt;/h2&gt;
&lt;p&gt;A snippet from the plan.md I used to implement the Plan creation API (&lt;code&gt;POST /plans&lt;/code&gt;):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Test Plan: Plan Creation API (POST /plans)

**Dependency**: plan_00_common.md (Domain layer complete)

---

## Phase 1: Application Layer - Input Struct &amp;lt;!-- T1:auto --&amp;gt;

File: `api/internal/application/plan/create.go`

### [x] 1.1 Define CreatePlanInput

- CreatePlanInput struct contains all required fields
- ScheduleInput struct contains Type and Days fields
- CreateItemInput struct contains Name and Quantity fields

### [x] 1.2 Input conversion methods

- ScheduleInput.ToWeeklySchedule() converts valid input to WeeklySchedule
- ScheduleInput.ToWeeklySchedule() returns ErrNoActiveDays when no active days

---

## Phase 5: UseCase Layer - Create Service &amp;lt;!-- T2:deep --&amp;gt;

File: `api/internal/application/plan/create.go`

### [x] 5.1 Service struct

- createService struct depends on Logger, TransactionManager, PlanWriter
- NewCreateService() constructor injects dependencies

### [x] 5.2 Create - validation logic

- Create() creates a Plan from valid input
- Create() returns ErrInvalidTemplate for unsupported TemplateID
- Create() returns ErrInvalidDateRange when StartDate &amp;gt;= EndDate
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Each Phase has a Tier marker like &lt;code&gt;&amp;lt;!-- T1:auto --&amp;gt;&lt;/code&gt; or &lt;code&gt;&amp;lt;!-- T2:deep --&amp;gt;&lt;/code&gt;. This matters. Review depth differs per Tier:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tier&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;th&gt;Execution&lt;/th&gt;
&lt;th&gt;Review Depth&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;T1&lt;/td&gt;
&lt;td&gt;Scaffold (structure)&lt;/td&gt;
&lt;td&gt;Auto&lt;/td&gt;
&lt;td&gt;Light&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;T2&lt;/td&gt;
&lt;td&gt;Core (business logic)&lt;/td&gt;
&lt;td&gt;Detailed&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Deep&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;T3&lt;/td&gt;
&lt;td&gt;Integration (Repository)&lt;/td&gt;
&lt;td&gt;Auto&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;T4&lt;/td&gt;
&lt;td&gt;Surface (Handler/E2E)&lt;/td&gt;
&lt;td&gt;Auto&lt;/td&gt;
&lt;td&gt;Light&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;I only do Deep Review on T2 (business logic) and let the rest go through automatically. Reviewing every line at the same depth is inefficient. Focus on the core logic, trust the rest if the tests pass.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;tdd-go-loop Workflow Orchestrator&lt;/h2&gt;
&lt;p&gt;The full flow of tdd-go-loop. Notice that this is not just &lt;code&gt;/tdd:go&lt;/code&gt; on repeat. It&apos;s a &lt;strong&gt;workflow orchestrator coordinating multiple sub-agents&lt;/strong&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;flowchart TD
    start([Start])
    spec_review[&quot;spec-review agent&quot;]
    user_confirm{User Confirm?}
    find_next[Find Next Test]
    has_next{Found Next?}
    tdd_execute[&quot;TDD Cycle&quot;]
    update_plan[Mark Complete]
    codex_review[&quot;codex-review agent&quot;]
    has_critical{Critical Issues?}
    apply_feedback[&quot;apply-feedback agent&quot;]
    commit_ask{Commit Tier?}
    is_t3{T3 Integration?}
    sql_review[&quot;sql-review agent&quot;]
    plan_done{All Done?}
    final_test[&quot;final-test&quot;]
    finish([End])

    start --&amp;gt; spec_review
    spec_review --&amp;gt; user_confirm
    user_confirm --&amp;gt;|No| spec_review
    user_confirm --&amp;gt;|Yes| find_next
    find_next --&amp;gt; has_next
    has_next --&amp;gt;|Yes| tdd_execute
    has_next --&amp;gt;|No| codex_review
    tdd_execute --&amp;gt; update_plan
    update_plan --&amp;gt; find_next
    codex_review --&amp;gt; has_critical
    has_critical --&amp;gt;|Yes| apply_feedback
    has_critical --&amp;gt;|No| commit_ask
    apply_feedback --&amp;gt; codex_review
    commit_ask --&amp;gt; is_t3
    is_t3 --&amp;gt;|Yes| sql_review
    is_t3 --&amp;gt;|No| plan_done
    sql_review --&amp;gt; plan_done
    plan_done --&amp;gt;|No| find_next
    plan_done --&amp;gt;|Yes| final_test
    final_test --&amp;gt; finish
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The key part: &lt;strong&gt;every time a Tier ends, the codex-review agent kicks in automatically&lt;/strong&gt;. If Codex finds a Critical or Major issue, the apply-feedback agent applies the feedback automatically; Minor issues are just logged. If review repeats more than three times, it forces a move to the next step (no infinite loops).&lt;/p&gt;
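&lt;p&gt;The loop bound is worth pinning down, because it is what makes the orchestrator safe to leave running. A sketch in Go, with &lt;code&gt;runReview&lt;/code&gt; and &lt;code&gt;applyFeedback&lt;/code&gt; as stand-ins for the codex-review and apply-feedback agents (the real orchestrator is markdown instructions, not Go):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;package main

import &quot;fmt&quot;

const maxReviewRounds = 3

// reviewLoop re-runs review until no Critical/Major issues remain,
// but never more than maxReviewRounds times.
func reviewLoop(runReview func() bool, applyFeedback func()) {
    for round := 0; round &amp;lt; maxReviewRounds; round++ {
        if critical := runReview(); !critical {
            return // clean review: move on to the next step
        }
        applyFeedback()
    }
    // forced exit after three rounds, so the cycle can never loop forever
}

func main() {
    rounds := 0
    // a reviewer that always reports critical issues still terminates
    reviewLoop(func() bool { rounds++; return true }, func() {})
    fmt.Println(rounds) // 3
}
&lt;/code&gt;&lt;/pre&gt;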
&lt;p&gt;At the T3 (Integration) stage, the sql-review agent runs additionally to catch performance issues like N+1 queries or missing indexes.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;api-final-review: Four Parallel Agents&lt;/h2&gt;
&lt;p&gt;When API development wraps up, the final review workflow runs. Four specialist agents perform reviews &lt;strong&gt;in parallel&lt;/strong&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;┌─────────────────────────────────────────────────────────────┐
│                    api-final-review                          │
└─────────────────────────────────────────────────────────────┘
                              │
          ┌───────────────────┼───────────────────┐
          │                   │                   │
          ▼                   ▼                   ▼
┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
│ 1. Checklist    │  │ 2. Shell/Fixture│  │ 3. Codex Agent  │
│    Agent        │  │    Agent        │  │                 │
└─────────────────┘  └─────────────────┘  └─────────────────┘
                              │
                              ▼
                    ┌─────────────────┐
                    │ 4. SQL Agent    │
                    └─────────────────┘
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Actual setup from AGENTS.md&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;# Parallel Agent Setup

The four specialist agents used in the API final review.

## 1. Checklist Agent

Verifies the checkbox-completion state of the plan document.

Output format:

- Completion rate: X/Y (Z%)
- Incomplete items: (list, if any)
- Verdict: ✅ Complete / ❌ Incomplete items remain

## 2. Shell/Fixture Agent

Checks that the scripts and fixture files needed for integration tests exist.

Paths checked:
tests/
├── scripts/{feature}/test_{api_name}.sh
└── fixtures/{feature}/{api_name}/
    ├── valid_request.json
    └── invalid_request.json

## 3. Codex Agent

Tier-based review:
| Tier | Target | Review Depth |
|------|--------|--------------|
| T1 | Domain Entity | Strictest |
| T2 | Application Service | Strict |
| T3 | Infrastructure | Standard |
| T4 | Handler | Standard |

## 4. SQL Agent

Reviews database query performance and optimization.

| Item              | Description                              |
| ----------------- | ---------------------------------------- |
| N+1 query         | Per-row queries inside a loop            |
| Missing index     | Indexes needed for WHERE, JOIN clauses   |
| Over-fetching     | SELECT *, unnecessary columns            |
| Transaction scope | Right boundaries set                     |
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because these four run &lt;strong&gt;in parallel&lt;/strong&gt;, total wall-clock time drops sharply: a sequential run that takes about 10 minutes finishes in about 3.&lt;/p&gt;
&lt;h3&gt;Deployment-decision criteria&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Critical&lt;/th&gt;
&lt;th&gt;Major&lt;/th&gt;
&lt;th&gt;Minor&lt;/th&gt;
&lt;th&gt;Verdict&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0-2&lt;/td&gt;
&lt;td&gt;✅ OK to deploy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;3+&lt;/td&gt;
&lt;td&gt;⚠️ Recommend Minor cleanup&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;1+&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;❌ Major fixes required&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1+&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;❌ Critical fixes mandatory&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
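&lt;p&gt;The table collapses into a few lines of code; this is my reading of it, not something lifted from the workflow itself.&lt;/p&gt;

```go
package main

import "fmt"

// verdict encodes the decision table: Critical beats Major beats Minor,
// and three or more Minors only earn a cleanup recommendation.
func verdict(critical, major, minor int) string {
	switch {
	case critical >= 1:
		return "Critical fixes mandatory"
	case major >= 1:
		return "Major fixes required"
	case minor >= 3:
		return "Recommend Minor cleanup"
	default:
		return "OK to deploy"
	}
}

func main() {
	fmt.Println(verdict(0, 0, 1)) // OK to deploy
	fmt.Println(verdict(0, 1, 0)) // Major fixes required
}
```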
&lt;hr /&gt;
&lt;h2&gt;Code Review Guide&lt;/h2&gt;
&lt;p&gt;Sometimes the code review itself is the hard part, so I wrote a guideline for the order to read code in and what to focus on.&lt;/p&gt;
&lt;p&gt;The T2 (Core) review checklist I actually use:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;## T2 (Core) Review Checklist

### UseCase Implementation

| Check                  | Question                                     |
| ---------------------- | -------------------------------------------- |
| Pure Functions         | Are helper functions pure (no side effects)? |
| Explicit Mutations     | Are all mutations visible in main method?    |
| Error Wrapping         | Are errors wrapped with context?             |
| No Business in Helpers | Is business logic in Execute, not helpers?   |

### Pure Function Verification

// GOOD: Pure function (data in, data out)
func buildListOptions(input *ListPlansInput) plan.ListOptions {
    return plan.ListOptions{
        Limit:  input.Limit,
        Status: input.Status,
    }
}

// BAD: Impure (modifies input)
func buildListOptions(input *ListPlansInput, opts *plan.ListOptions) {
    opts.Limit = input.Limit // mutates!
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With a guide like this, Claude reviews against the same criteria, and I also know exactly where to focus when I look at the code.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;How to Get Started&lt;/h2&gt;
&lt;p&gt;People keep talking about how they configure Claude Code, how they set up sub-agents, who follows whom, who asks what. Don&apos;t let that FOMO push you around. &lt;strong&gt;Just install Claude Code right now and ask it directly&lt;/strong&gt; what it can do. Then ask it about the things you want to do, and build your own system from there. That process itself is what&apos;ll give you survival skills in this fast-moving AI era.&lt;/p&gt;
&lt;p&gt;Here&apos;s how I&apos;d start:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Make a &lt;code&gt;.claude/&lt;/code&gt; folder.&lt;/strong&gt; Just one folder at the project root.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Create a simple command in &lt;code&gt;commands/&lt;/code&gt;.&lt;/strong&gt; One markdown file is one command.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Run it as &lt;code&gt;/command-name&lt;/code&gt;.&lt;/strong&gt; Claude Code runs it directly.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Extend as needed.&lt;/strong&gt; Agents, skills, templates — grow it from there.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;You can also just ask Claude Code to do all of the above for you.&lt;/p&gt;
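&lt;p&gt;In shell terms, the whole starting setup is just this; the &lt;code&gt;/self-review&lt;/code&gt; command name and its contents are an example I made up, not a built-in.&lt;/p&gt;

```shell
# One folder at the project root holds everything.
mkdir -p .claude/commands

# One markdown file is one command; this one would run as /self-review.
printf '%s\n' \
  'Review the staged diff for obvious bugs and missing tests.' \
  'Summarize the findings as a short checklist.' \
  | tee .claude/commands/self-review.md
```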
&lt;p&gt;Most of the workflows, commands, and agents I&apos;ve built were made by asking Claude Code while I used it. Don&apos;t try to build a perfect system from day one. Start small and add as you need. Even this blog post went through a multi-agent workflow that takes the rough draft I dumped out and polishes it.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Closing&lt;/h2&gt;
&lt;p&gt;One thing I learned during my CTO years: &lt;strong&gt;Delegate but own quality. Hold the whole picture but trust the details.&lt;/strong&gt; I couldn&apos;t read every line of code, so I focused on the important parts and trusted teammates with the rest. The same thing applies when working with the AI now. Treat the AI like a capable junior developer. Give clear instructions, review the output, give feedback when needed, hand off the next task when it does well.&lt;/p&gt;
&lt;p&gt;Looking back, this is a leadership skill. Organize complex requirements before passing them on, focus on the key decisions, delegate the execution. In the AI era, I think we&apos;re all going to work this way. Whether I type the code myself or hand it to the AI, in the end &lt;strong&gt;I&apos;m the one who defines what to build and owns the quality&lt;/strong&gt;. Picking up that sense might be the most important skill right now.&lt;/p&gt;
&lt;p&gt;If I&apos;m being honest, I could only run experiments like this because I left the company last year and started a personal project. At work, between meetings, planning sync, and reviewing teammates&apos; code, I had no room to dig into this kind of thing. Working alone, with no one to second-guess, I could try things freely. OpenCode being blocked is a small bummer, but I&apos;m more excited than worried about how AI evolves through 2026.&lt;/p&gt;
&lt;p&gt;What matters in the end is &lt;strong&gt;finding your own approach&lt;/strong&gt;. Vibe coding, augmented coding, whatever style. As long as you don&apos;t lose &lt;strong&gt;the joy&lt;/strong&gt; along the way. If chasing productivity costs you the fun of coding, that&apos;s the real loss. As long as making things with code still feels good to me, I&apos;ll adapt to whatever tool comes next.&lt;/p&gt;
</content:encoded><category>tech</category><category>vibe-coding</category><category>AI-coding</category><category>development</category><author>Tony Cho (https://flowkater.io)</author></item><item><title>Gophercon Korea 2025: A Review</title><link>https://flowkater.io/en/posts/2025-12-11-2025-gophercon-korea-review/</link><guid isPermaLink="true">https://flowkater.io/en/posts/2025-12-11-2025-gophercon-korea-review/</guid><description>Notes from Gophercon Korea 2025 at Magok COEX. Talk reviews on real-time interpretation in Go, Effect-ive Go, and Test Reality Not Mocks, plus the realization that talks live on YouTube but the real value of a conference is in the hallway. Conversations with engineers thinking about hiring and startups preparing to scale up.</description><pubDate>Thu, 11 Dec 2025 07:27:07 GMT</pubDate><content:encoded>&lt;p&gt;https://gophercon.kr/&lt;/p&gt;
&lt;p&gt;I figured if I waited until after this year ended, I&apos;d never write this up at all, so here&apos;s a quick review of Gophercon Korea, which I went to last month on the 9th at Magok COEX.&lt;/p&gt;
&lt;h2&gt;Going in&lt;/h2&gt;
&lt;p&gt;Back in my SW Maestro (let&apos;s call it SoMa) days, and during the early years of running my own startup, there were a lot of events like Naver DEVIEW (Naver&apos;s annual dev conference), and I went to plenty of them. Once you started showing up, it was kind of like a class reunion. You&apos;d run into everyone you knew, exchange greetings, and drift between sessions together. During SoMa, I think it was JCO (a Java developer conference) that the whole team attended and even spoke at. That&apos;s already ten years ago now. After that, while running the company, I cared more about business and operations than about hands-on engineering, and even though I kept coding, I more or less lost track of dev conferences. After COVID I forgot they even existed.&lt;/p&gt;
&lt;p&gt;At my last company, every now and then a teammate would ask if they could go to one of these on a weekday, and I&apos;d approve and send them. Once COVID hit, most of these talks went straight to YouTube (either live or right after), so my pattern became: skip the event, then look up a topic I cared about whenever I felt like it.&lt;/p&gt;
&lt;p&gt;After I left the company, I started writing the occasional thread on Threads. Someone read &lt;a href=&quot;https://flowkater.io/posts/2025-09-scrumble-tech-retro-backend/&quot;&gt;my recent backend post&lt;/a&gt; and a thread of mine on running test code, then reached out by DM: &quot;Are you going to Gophercon this week? If you are, I&apos;d love to grab a quick coffee chat.&quot; Truth is, I knew Gophercon existed but had no idea it was happening that very week (November 9), and the idea of attending myself hadn&apos;t even crossed my mind, so I was caught off guard. But I also wanted to talk shop with engineers and get back to events, and give George (who got into Go because I&apos;d put him on a Go backend) a taste of the gopher world (?). So I bought tickets right there: a sponsor ticket for me, a regular ticket for George. To the backend engineer who DM&apos;d me I replied, &quot;Of course I&apos;m going. Want to meet up briefly that day?&quot; Maybe I just didn&apos;t want to admit on Threads, where I cosplay as a Go engineer of sorts, that I had no clue. (Of course I confessed all of this when I actually met him on the day. Told him: thanks to you, I ended up going.)&lt;/p&gt;
&lt;h2&gt;The talks&lt;/h2&gt;
&lt;p&gt;I caught almost everything except for one or two slots I skipped for coffee chats, but I&apos;ll only write up the talks I actually understood well enough to comment on.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://gophercon.kr/program/sessions/session-2&quot;&gt;Building a real-time interpreter in Go (real-time AI inference, WebRTC)&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;I forget the company name, but it was an edtech outfit, and given the domain I paid close attention. The premise was that the most important thing in education is the &quot;classroom&quot; as a space, and the talk focused on how to recreate that classroom experience online. They walked through how they used Go&apos;s concurrency to build real-time behavior, then expanded that into a global classroom, eventually layering in real-time translation with LiveKit. The step-by-step way they tackled each challenge was genuinely impressive.&lt;/li&gt;
&lt;li&gt;This is one I&apos;d like to rewatch when the recording goes up on YouTube.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://gophercon.kr/program/sessions/session-4&quot;&gt;Effect-ive Go: Truly Go-flavored functional programming&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;One of the biggest things I let go of when I moved to Go was functional programming. I still pull in samber/lo occasionally, but the core of FP (immutability, monads, tight abstractions) isn&apos;t easy to bring into Go straight, and twisting the language that hard doesn&apos;t really feel &quot;Go-like&quot; either. There was a stretch where I went deep into FP, but these days I think it&apos;s better to find the middle path that fits the language and the problem. So I was curious how the Effect pattern implements &quot;Go-style functional&quot; in Go.&lt;/li&gt;
&lt;li&gt;To boil down the Effect pattern from this talk in one line: in your domain and business logic, you only declare side effects as &quot;effect messages,&quot; and the actual side-effect execution (DB calls, external APIs, logging, concurrency) gets delegated to separate effect handlers. The point is to maximize purity, testability, and reusability through a Go-style Effect Handler architecture.&lt;/li&gt;
&lt;li&gt;I went through &lt;a href=&quot;https://github.com/on-the-ground/effect_ive_go&quot;&gt;the repo&lt;/a&gt;, typed out the code samples, and tried mapping it onto my own codebase (a fairly large backend on DDD / clean architecture). At least for the use cases I have right now, the existing layer structure already gives me domain purity and infra separation, so adding a separate effect layer would inflate conceptual and implementation complexity more than it&apos;d give back. It&apos;s closer to a &quot;more cost than benefit&quot; pattern in my situation.&lt;/li&gt;
&lt;li&gt;That said, the spot this library is aiming at is real. Go often gets used in multi-module / multi-runtime setups in production (HTTP, gRPC, batch, workers, etc.), where the same domain logic has to run across different infrastructure combinations, or you need to &quot;interpret&quot; the same domain logic differently across handler combinations for testing, simulation, or replay (think CQRS, event sourcing, heavily functional architectures). In those contexts, modeling logging, transactions, retries, circuit breakers, and concurrency control as explicit effects, and controlling when and how they get composed at the handler level, becomes a real advantage. That&apos;s where experimenting with something like effect_ive_go is worth the investment.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://gophercon.kr/program/sessions/session-5&quot;&gt;Test Reality Not Mocks: Reliable Go Tests in the AI Era&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;This lined up exactly with &lt;a href=&quot;https://www.threads.com/@flowkater/post/DQcMlWEDxY8?xmt=AQF03CPxixjiZq1qWOLFqqKfXzBrmtPLQ0lDH6PFgL96wA&quot;&gt;my own post arguing against DB mocking&lt;/a&gt;. It pushed further by showing real handler-based tests, and broadened the scope to TDD and how to write better tests in Go projects in general. When I wrote that post I said I don&apos;t do TDD, but I&apos;m now actually doing TDD-style coding alongside Claude Code, so this talk genuinely helped me out.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://gophercon.kr/program/sessions/session-6&quot;&gt;Dev in Go way (the Go-ness of Go)&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;The most quintessentially Gophercon talk of the day. Grounded in the language itself, it walked through why patterns common to Java or other OOP languages (builder pattern and friends) don&apos;t really fit Go, and what makes good code &quot;good code&quot; by Go&apos;s spirit, by Go idiomatic standards. The speaker came across as a real veteran, but the talk itself was the easiest to follow, and I think every attendee got something out of it.&lt;/li&gt;
&lt;li&gt;Small Interface is something I&apos;m already trying out in my current backend. My repository interfaces in particular have been swelling up to the point where the code is hard to follow. Splitting both the interfaces and the files apart turned out to make the code easier to read and also reduced collisions when working on it concurrently with Claude Code, more than I expected.&lt;/li&gt;
&lt;li&gt;Accept interfaces, return structs is basically unavoidable once you build clean architecture in Go. And the recommendation to use the Options pattern instead of the builder pattern (the builder pattern doesn&apos;t fit Go cleanly because of multi-return errors) looks promising for my domain structs, especially when there are lots of constructor option values, so I&apos;m trying that out too.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
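&lt;p&gt;Since I mentioned trying the Options pattern, here is its generic shape; &lt;code&gt;Plan&lt;/code&gt; and the option names are made up for illustration, not from my codebase.&lt;/p&gt;

```go
package main

import "fmt"

// Plan is a hypothetical domain struct.
type Plan struct {
	Name     string
	Limit    int
	Archived bool
}

// Option is the functional-options shape: a function that tweaks
// the struct under construction.
type Option func(p *Plan)

func WithLimit(n int) Option { return func(p *Plan) { p.Limit = n } }

func WithArchived() Option { return func(p *Plan) { p.Archived = true } }

// NewPlan sets defaults, applies options in order, and can still return
// an error in the same call, the part a builder chain handles awkwardly
// in Go because of multi-return.
func NewPlan(name string, opts ...Option) (*Plan, error) {
	if name == "" {
		return nil, fmt.Errorf("plan name is required")
	}
	p := new(Plan)
	p.Name = name
	p.Limit = 10 // default
	for _, opt := range opts {
		opt(p)
	}
	return p, nil
}

func main() {
	p, err := NewPlan("starter", WithLimit(50))
	if err != nil {
		panic(err)
	}
	fmt.Println(p.Name, p.Limit, p.Archived) // starter 50 false
}
```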
&lt;h3&gt;Overall&lt;/h3&gt;
&lt;p&gt;About 80% of my coding over the last six months has been Go, so directly or indirectly, all of these talks were useful. The content itself was solid. I&apos;d like to give a talk myself sometime, from the perspective of running a service in production. Funny enough, the reason I started Go five years ago was &lt;a href=&quot;https://youtu.be/mLIthm96u2Q?si=cq6OqaKrxh4YG2bz&quot;&gt;this video: Daangn&apos;s engineering team adopts Go, on Daangn Tech&lt;/a&gt;. The speaker, Byun Gyu-hyun, seems to still be at Daangn, and gave talks at Gophercon last year and the year before. He didn&apos;t speak this year, which left me a little disappointed.&lt;/p&gt;
&lt;h2&gt;Networking&lt;/h2&gt;
&lt;p&gt;The two main groups I ended up talking to: the engineer who DM&apos;d me first (let&apos;s call him A) and another person who&apos;d spotted my posts on Threads (B). A is a backend engineer who came with his Canadian CEO (C). They&apos;re building &lt;a href=&quot;https://humanlog.io/&quot;&gt;humanlog.io&lt;/a&gt;, an observability service. We talked Go, and also observability experience (in my case, a lot of Datadog), Clickhouse, and so on. C joked that since I knew Go, had observability experience, and had used Clickhouse, maybe I should join them. I laughed and played along. (Pretended I didn&apos;t catch the English.) The interesting thing is, A and C also met at last year&apos;s Gophercon, so it was clear that, unlike A, C had come with the explicit goal of scouting talent.&lt;/p&gt;
&lt;p&gt;B came with a Go engineer (D) he works with full time. They run a business serving small business owners and the self-employed (Korea&apos;s SMB segment), collecting a massive amount of data using Go. Because D is the Go person, Go ended up being the main language of the business, and now that they want to seriously scale up they&apos;re trying to hire more Go engineers, and since there are barely any around, they were attending the conference partly hoping to meet people. There was something about them that reminded me of the energy I had when I was first starting my own company. A good team. I told them honestly that finding a Go engineer right away would be hard, and that they might be better off hiring a junior who can hustle and training them up. But I left wanting them to win.&lt;/p&gt;
&lt;h2&gt;What conferences are actually for = networking&lt;/h2&gt;
&lt;p&gt;If you only count the talks, honestly, conferences like this don&apos;t pencil out on a time-spent basis. You sit through a fixed track on a fixed schedule, and even when tracks are split by backend, frontend, data, the range inside each track is too wide. When the track is bound by a single language like Golang, the gap gets even wider. This year&apos;s lineup ranged from vibe coding to blockchain. There were topics every Go user could relate to, and topics that, at least for me right now, I couldn&apos;t follow or didn&apos;t care about. That said, the parts about the language itself and goroutine concurrency are the kind of thing everyone could take something home from.&lt;/p&gt;
&lt;p&gt;But once you fold in networking, it&apos;s a different story. I skipped about two talks (one in the middle, one near the end) and used that time to talk with people I happened to connect with at the venue, and the common thread among them was hiring. Or some version of &quot;this is how I&apos;m doing Go, how are you doing it?&quot;&lt;/p&gt;
&lt;p&gt;When you sit in a talk, the best you can do per unit of time is meet one speaker. But if a talk is something you can already watch online, what brings people out in person is, I think, the networking. If conferences set aside more space and time for people to find others who share their goals or want to talk about the same things, you&apos;d come away with something more meaningful than just sitting through sessions.&lt;/p&gt;
&lt;p&gt;I got lucky and ended up meeting good engineers by chance. If there had been more structured opportunities, I would&apos;ve liked to know and talk with a wider mix of people. The &quot;opportunities&quot; we hope for tend to come out of small talk, coffee chats, the weak ties of networking. They accumulate slowly and then suddenly turn into something. That&apos;s been one of the bigger lessons of the past few years for me.&lt;/p&gt;
&lt;p&gt;Gophercon in particular is a relatively small group in Korea, but it occupies a unique position (backed by sizable companies like Devcat, Daangn, and Line), so if this kind of networking grew more active, the contribution to the broader Korean Go ecosystem would expand much further.&lt;/p&gt;
&lt;h2&gt;Photos&lt;/h2&gt;
&lt;p&gt;&lt;img src=&quot;/assets/gophercon-nametag.jpeg&quot; alt=&quot;GopherCon Korea 2025 nametag&quot; /&gt;
&lt;em&gt;My nametag from the venue. Google, Daangn, and DEVSISTERS were among the sponsors.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/assets/gophercon-sag.jpeg&quot; alt=&quot;Gopher in hanbok eco-bag&quot; /&gt;
&lt;em&gt;An eco-bag with a gopher in hanbok holding a taegeuk fan. A uniquely Korean Gophercon touch.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/assets/gophercon-goods.jpeg&quot; alt=&quot;Gophercon swag&quot; /&gt;
&lt;em&gt;The swag that came with the sponsor ticket: a hand-knit gopher doll, stickers, keychains. A pretty generous lineup.&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;Heading out&lt;/h2&gt;
&lt;p&gt;It was a good outing back into a dev event after a long time, and George said he picked up a lot too, so it turned into a good day. Somewhere along the way I&apos;d grown skeptical about events like this (the worry that you&apos;re just chasing trends, or going because everyone else is going), but listening to talks and talking with people turned out to be a worthwhile experience.&lt;/p&gt;
&lt;p&gt;I went to show George the gopher world, and ended up bringing something home myself. Just like five years ago when Byun Gyu-hyun&apos;s talk got me into Go, next time I&apos;d like to be the one giving a talk at Gophercon.&lt;/p&gt;
&lt;p&gt;I&apos;ll leave it at that — hoping the Korean Go ecosystem keeps getting livelier.&lt;/p&gt;
</content:encoded><category>review</category><category>conference</category><category>golang</category><author>Tony Cho (https://flowkater.io)</author></item><item><title>What Should Junior Developers Learn in the AI Era?</title><link>https://flowkater.io/en/posts/2025-12-06-what-should-junior-developers-learn-in-the-ai-era/</link><guid isPermaLink="true">https://flowkater.io/en/posts/2025-12-06-what-should-junior-developers-learn-in-the-ai-era/</guid><description>What should junior developers learn in the AI era? The hiring market has frozen and companies only want mid-level engineers who can ship on day one. AI is a double-edged sword for juniors. Hand everything to an agent like Claude Code and you lose the chance to learn. Literacy (reading and writing), real projects with even ten users, coding tests trained as problem-solving rather than memorization. A survival guide for junior developers, drawn from my mentoring experience.</description><pubDate>Sat, 06 Dec 2025 11:40:16 GMT</pubDate><content:encoded>&lt;h2&gt;The Junior&apos;s Dilemma in the AI Era&lt;/h2&gt;
&lt;p&gt;I left my company and have been coding alone for more than six months. The past six months were the fastest stretch of change I&apos;ve ever lived through. Coding, especially, took the full impact of that change. If I had stayed at the company, I probably wouldn&apos;t feel it as sharply yet. Building products with my own hands, hitting wall after wall, I came to feel the speed of AI&apos;s progress in development exactly as it is.&lt;/p&gt;
&lt;p&gt;People around me include mid-level engineers, but a fair number of juniors too. I used to mentor some of them, I still mentor some now, and once in a while one of them brings a resume and I give feedback. Their worries became mine. I kept turning the same questions over.
In this brutal market, what should they do to land a job? In this AI era, can they actually get hired the old way, by prepping a portfolio, doing a bootcamp project, and blogging? How are they supposed to grow at all?&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;The Cold Reality of the Hiring Market&lt;/h2&gt;
&lt;h3&gt;Why Junior Hiring Shrunk&lt;/h3&gt;
&lt;p&gt;In a stretch where tech moves this fast, the market froze just as fast and hiring collapsed. Junior hiring, especially, has all but dried up. Unlike three or four years ago, the IT industry has lived through layoffs, voluntary retirements, bankruptcies, failed pivots, and the companies that survived learned that if you don&apos;t keep your wits about you, you go under. Even when the market loosens up and money starts flowing again, that lesson means they won&apos;t go on a hiring spree the way they did three or four years ago.&lt;/p&gt;
&lt;p&gt;The claim that AI is killing developers is, for now, simply false. We can&apos;t push back against the macroeconomic tide, and the job crunch is a global issue, not just a Korean one. Companies have tightened their belts, and on top of that they spend almost nothing on hires they would have to grow and teach, people who can&apos;t be put to work right away. You design for the future when you can see tomorrow; when you&apos;re just trying to survive today, all of that feels like waste. But cause and effect are real here: a global economic crunch is producing a job crunch, and AI is accelerating it.&lt;/p&gt;
&lt;h3&gt;Who Companies Actually Want&lt;/h3&gt;
&lt;p&gt;The roles posted most heavily right now are mid-level, roughly four to six years of experience. I often see people who slipped into the so-called developer boom three or four years ago hopping jobs relatively easily, even in this rough market. But people who are just starting to study development or who just took their first job, juniors and entry-level engineers, either can&apos;t get hired and switch jobs at all, or they luck into a position and get no real chance to learn.&lt;/p&gt;
&lt;p&gt;When companies hire heavily at the mid-level, they&apos;re hiring people who can plug into the current project on day one. As a junior, you have basically zero experience building or running a real production project. So if a junior wants to be competitive in today&apos;s hiring market, they need experience collaborating on a team project, deploying it to an environment with actual users, and operating it. It&apos;s not easy for one developer to build something, ship it, and run it while also taking care of marketing and operations. It&apos;s hard. And yet that&apos;s the kind of person companies are hiring right now.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;AI Is a Double-Edged Sword for Juniors&lt;/h2&gt;
&lt;p&gt;AI is a double-edged sword for juniors. It&apos;s a tired metaphor, but it&apos;s the kind of sword where the more you swing it, the more you cut yourself. Some people say that with AI you can ask anything, study anything, do anything easily, and that traditional education is over and anyone can do anything. In the development areas I know &quot;well,&quot; I can study that way too, and I do. I can&apos;t even remember the last time I went to the developer book section of a bookstore. I used to drop by Kyobo Mungo and check what was new every time I was nearby. These days I almost never go.&lt;/p&gt;
&lt;h3&gt;How Do You Ask When You Don&apos;t Know What You Don&apos;t Know?&lt;/h3&gt;
&lt;p&gt;But if I want to study a brand new field, can I just lean on AI right away?&lt;/p&gt;
&lt;p&gt;Recently I wanted to study Japanese, and I tried using AI to look up sentences and various things, and I ended up buying a textbook and a course. A complete beginner doesn&apos;t even know what to ask. To borrow a machine-learning analogy, without some pre-training on a dataset, you can&apos;t do anything.&lt;/p&gt;
&lt;p&gt;So when I mentor juniors, I tell them to use AI agents like Claude Code as little as possible. Some people will hear that and call it obsolete advice. Fair enough. But when you don&apos;t even know what you don&apos;t know and you hand most of your work to an AI coding agent, you lose every chance to learn and grow. Thanks to AI, everyone can build something. Because of AI, no one is learning. That&apos;s the era we&apos;re in.&lt;/p&gt;
&lt;h3&gt;The Difference Between an AI Agent and Searching or Asking&lt;/h3&gt;
&lt;p&gt;The advice here isn&apos;t aimed at non-developers building products through vibe coding (founders, basically). This is for people who want to grow into professional developers, and it isn&apos;t only true for development. In any field where you want real expertise, handing everything to AI makes it hard to find the chance to grow properly. Searching and asking when you don&apos;t know something, using AI for that, I recommend it. (What I&apos;m saying not to use is the AI coding agent that does it all for you.)&lt;/p&gt;
&lt;p&gt;Even back when AI didn&apos;t exist, copying and pasting working code from Stack Overflow without understanding the cause was constant. The difference is whether I&apos;m asking the AI directly about my own problem or not. And it matters more than you&apos;d expect where the agency sits. If it sits with me, the learning from solving the problem stays with me. If it sits with the AI, that learning gets thrown out with the AI.&lt;/p&gt;
&lt;h3&gt;So How Should You Use AI?&lt;/h3&gt;
&lt;p&gt;Let me get specific. The standard a junior should hold to when using AI is this: did I try first?
When an error happens, read the message first. Then form a hypothesis about why it&apos;s happening. Then ask the AI. Not &quot;how do I fix this error?&quot; but &quot;I think this error is happening because of A, is that right?&quot; The difference is huge.
Same when you write code. Type it yourself first. When you get stuck, figure out exactly where you got stuck and ask only about that part. Not &quot;build me a login feature,&quot; but &quot;the session isn&apos;t holding, is the logic in this part wrong?&quot;&lt;/p&gt;
&lt;p&gt;To sum it up:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Do this:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Read the error message, form a hypothesis, then ask AI to verify&lt;/li&gt;
&lt;li&gt;Ask for feedback on code you wrote yourself&lt;/li&gt;
&lt;li&gt;Ask for explanations when a concept confuses you&lt;/li&gt;
&lt;li&gt;After you finish solving a problem, ask about other approaches&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Avoid this:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Throwing requirements at AI and asking for the whole thing&lt;/li&gt;
&lt;li&gt;Pasting an error without even reading it&lt;/li&gt;
&lt;li&gt;Only asking &quot;why isn&apos;t this working?&quot;&lt;/li&gt;
&lt;li&gt;Copying AI&apos;s code without understanding it&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In the end, the agency has to sit with you. AI is a tool that helps you learn, not a thing that learns in your place.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;What Juniors Should Be Building Now&lt;/h2&gt;
&lt;p&gt;I&apos;m about to talk about literacy, real projects, and coding tests, and I can already hear the question, &quot;so what should I do first?&quot; Honestly, it depends on your situation. If you push me to rank them, here&apos;s how:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;If you have an interview lined up:&lt;/strong&gt; Coding tests and tidying up your projects come first. Literacy doesn&apos;t go up overnight.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;If you have three months or more:&lt;/strong&gt; Read and write in parallel while starting a real project. One coding test problem a day, steadily.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;If you&apos;re prepping over six months or more:&lt;/strong&gt; Build literacy first. It ends up setting the speed of every other kind of learning.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The key is that all three of these aren&apos;t &quot;things to do&quot;; they have to become &quot;habits.&quot; This isn&apos;t something you flash through during job-prep season and drop. Reading books, running projects, training problem-solving, those are things you keep doing the entire time you work as a developer. The sooner you start, the better.&lt;/p&gt;
&lt;h3&gt;Literacy: Reading and Writing&lt;/h3&gt;
&lt;p&gt;Looking back at the mentoring I&apos;ve done recently, the biggest issue is literacy. In this AI era, the ability to read and write may have become more important than code. AI models these days dump enormous answers in response to a single question, and being able to actually read it, understand it, digest it, and apply the right solution to your case, that&apos;s literacy too. Asking AI a good question is the same. Beyond communicating with AI, once you join an organization you&apos;ll be communicating with people, and grasping the core of what someone is saying, digesting it, organizing your opinion, and speaking it, those are all abilities that grow out of literacy.&lt;/p&gt;
&lt;p&gt;When someone preparing to land a developer job asks me what they should prepare, I tell them: read more books than you do projects or algorithms. Then take what you read, organize your thoughts about it, talk about it with other people, and write about it. That&apos;s the fastest way to raise your literacy in the short run. In an era where every piece of content is short-form, the one edge you can take over everyone else is reading and writing. With that as a foundation, whether you&apos;re aiming at being a developer or studying any other field, you&apos;ll grow much faster.&lt;/p&gt;
&lt;p&gt;People who read a lot don&apos;t automatically succeed. But every successful founder I&apos;ve watched was a heavy reader. The people who can step back and check their own thinking, read the current situation, learn, and pull action items quickly, they were all heavy readers too. Sometimes when I&apos;d talk with them about books I&apos;d read before, almost every one of them had read the same ones. The people whose growth stalled and who kept making the same mistakes were mostly the ones who had stopped reading.&lt;/p&gt;
&lt;p&gt;When I tell people to read more, the next question is &quot;what should I read?&quot; If you really feel cut off from books, read anything. A shallow self-help book is fine. Whether it&apos;s shallow is for you to decide after you read it. That said, if you want to raise your literacy quickly, read books you can write about. Read a novel and write your impressions, read a business book and organize your thoughts, read books that let you share your thinking with others. That&apos;s what accelerates your growth. And if you read two or three books for self-improvement, it&apos;s worth squeezing in one novel that someone recommends to you. Novels help you step back and check your own thinking, and they grow your empathy without you noticing.&lt;/p&gt;
&lt;h3&gt;Project Experience That&apos;s Close to the Real Thing&lt;/h3&gt;
&lt;p&gt;No matter how deep your specialized knowledge gets, studying alone has limits. The throwaway one or two-month projects from a bootcamp or an academy don&apos;t move the needle for companies. People who face the reality of what the market demands today, think hard about what counts as a project close to the real thing, and pile up that kind of experience are the ones who&apos;ll stand out. Developers can at least lean on AI; in other fields, that kind of pre-employment experience is even rarer, so paradoxically, you should value it more.&lt;/p&gt;
&lt;p&gt;Close to the real thing means you deployed even a small feature yourself, recruited users, took feedback from those users, and improved the feature. Most junior portfolios end at toy projects, so just having shipped to production and improved something already gives you a real edge, and on top of that, fixing bugs and working through user feedback is something to keep building on. Whatever stage you&apos;re at, getting hired in this market takes patience, and instead of burning that time only on studying, scout for projects where you can build real-world experience, plan them, run them. Building something that even ten people use versus building a throwaway project will give you completely different growth opportunities.&lt;/p&gt;
&lt;h3&gt;Coding Tests: Training Problem-Solving, Not Memorization&lt;/h3&gt;
&lt;p&gt;Junior developers ask me a lot whether it still makes sense, in an era when AI does it all, to study coding tests. And in the actual workplace, people say algorithm coding tests were useless to begin with and now even more so.&lt;/p&gt;
&lt;p&gt;There&apos;s a part of this that&apos;s true and a part that isn&apos;t. If you ask whether algorithms are useful in themselves, well, while building APIs or implementing UI, we almost never use dynamic programming. (For typical SaaS or B2C services.) Reading lots of books, as I said above, has more direct relevance. (Meetings, documentation.) That said, if you treat dynamic programming as domain knowledge, the practice of figuring out how to solve the problem in front of you is, I think, an essential learning process for a developer.&lt;/p&gt;
&lt;p&gt;In other words, prepping for coding tests by memorizing data structures and algorithm theory and solutions is no help at all. We can run into new customers and new requirements at any company, in any domain, in any project, so what we need isn&apos;t algorithm theory as explicit knowledge but the tacit knowledge you pick up in the process of solving the problem. A problem is given, you organize how to solve it, you turn it into code. That&apos;s why I think Big Tech still values algorithm interviews. It&apos;s hard, in a portfolio project, to analyze and implement ten different customer requirements, but with algorithm problems, every problem lets you practice the problem-solving process in a fresh setting with fresh information. As long as you bring deliberate practice to it, algorithm coding tests can be a serious help. And if, after you finish your own solution, you ask AI for feedback to expand your knowledge and experience from different angles, even better.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Closing: Why People Are Still Needed&lt;/h2&gt;
&lt;p&gt;Someday, all our jobs may be replaced by AI. But for now, a person still has to drive. The market shrank and hiring shrank with it; that doesn&apos;t mean hiring is gone. Even with tech moving this fast, companies are still hiring and still need people. When you&apos;ve run an organization, you see that, more often than the code itself, people are the opportunity and people are the problem. Other fields are probably the same.&lt;/p&gt;
&lt;p&gt;If I were the hiring lead or the owner, who would I want to work with? Compared to that bar, what am I missing? Think it through and the things to prepare are clearer than you&apos;d guess. A company invests 10 in its people and tries to get back 11, 20, or 100. Even after you get hired, you have to keep asking yourself whether you&apos;re someone who can contribute to the whole organization&apos;s growth.&lt;/p&gt;
&lt;p&gt;The market is rough. At times like this, you walk your own path: do your utmost and leave the rest to heaven, what Koreans call jinin-sadaecheonmyeong (盡人事待天命). Instead of tossing off comments about whatever AI is doing, instead of staring at endless shorts, read a book and put your thoughts in order. Even if you&apos;re not good at it, please keep at it. Don&apos;t just study; run projects the way you&apos;d run real work. Don&apos;t just memorize; practice coding tests the way you&apos;d train a process.&lt;/p&gt;
&lt;p&gt;You can&apos;t fight the times. The individual human is fragile. People are easily swept up by their environment. When good results come, it&apos;s usually because the environment cooperated. It feels like you got hired because you were good, but it might have been a good market (or good timing), and your results inside an organization usually rest on the organization&apos;s systems. Anyone who&apos;s done real work will agree.
So you shouldn&apos;t get smug when things go well, and you don&apos;t need to spiral when things go badly. You can&apos;t beat the times, but you can beat today. Look honestly at yourself and you already know what to do in this moment.&lt;/p&gt;
&lt;p&gt;I roughly organized the thoughts that came up while mentoring junior developers. I hope this lands for someone trying to make it through.&lt;/p&gt;
</content:encoded><category>essay</category><category>career</category><category>AI</category><author>Tony Cho (https://flowkater.io)</author></item><item><title>Scrumble Tech Retro - 2. The Frontend, with a Side of Vibe Coding</title><link>https://flowkater.io/en/posts/2025-10-03-scrumble-tech-retro-frontend-with-vibe-coding/</link><guid isPermaLink="true">https://flowkater.io/en/posts/2025-10-03-scrumble-tech-retro-frontend-with-vibe-coding/</guid><description>A retro on Scrumble&apos;s Next.js frontend architecture, the realtime sync layer, the HEIC converter, and what I learned about Claude-driven vibe coding the hard way.</description><pubDate>Fri, 03 Oct 2025 12:24:55 GMT</pubDate><content:encoded>&lt;h1&gt;What the Scrumble frontend had to do&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;It&apos;s an SNS at the core, so people post, react with emoji, drop comments, and attach images to content. All of that is non-negotiable.&lt;/li&gt;
&lt;li&gt;After user auth, members re-authenticate inside a workspace, so the whole product runs on member-level identity rather than just user-level identity.&lt;/li&gt;
&lt;li&gt;As I covered on the backend side, realtime mattered everywhere. Notifications, emoji, comments all needed to update live.&lt;/li&gt;
&lt;li&gt;The to-do list had to feel as smooth as we could make it, which meant mapping keyboard shortcuts.&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;Tech stack&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Framework&lt;/strong&gt;: Next.js 15.1.8 (App Router) + React 19, dev server on Turbopack&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Language&lt;/strong&gt;: TypeScript 5&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Styling&lt;/strong&gt;: Tailwind CSS 3.4, tailwind-merge, class-variance-authority, Pretendard font, PostCSS/Autoprefixer&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;State and data&lt;/strong&gt;: Zustand 5 for client state, TanStack Query 5 + Axios 1.9 for server communication&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Realtime&lt;/strong&gt;: &lt;code&gt;centrifuge&lt;/code&gt; client talking to Centrifugo (auto-reconnect, channel persistence)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Files and media&lt;/strong&gt;: Cloudflare R2 presigned upload, HEIC-to-web-format conversion utilities (&lt;code&gt;heic2any&lt;/code&gt;, &lt;code&gt;libheif-js&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;UI/UX&lt;/strong&gt;: Framer Motion 12, Emoji Mart, Lucide and Remix icon sets, React Day Picker, React Window virtualization&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;PWA&lt;/strong&gt;: next-pwa 5.6.0 service worker plus manifest&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;Architecture&lt;/h1&gt;
&lt;h2&gt;Overview&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The Next.js 15 App Router owns routing, layouts, and the boundary between server and client components. Global providers all sit in one file, &lt;code&gt;src/app/providers.tsx&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Page components only handle rendering and interaction. Reads, writes, and realtime sync are pulled out into dedicated hooks so the page itself stays thin.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Product features live inside &lt;code&gt;src/features/*&lt;/code&gt; as a vertical slice (UI, hooks, services, and a local Zustand store all bundled together by domain).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Anything shared (UI, contexts, services, utils) goes into &lt;code&gt;src/shared&lt;/code&gt;, so each domain module can stay focused on its core logic.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Application shell (&lt;code&gt;src/app&lt;/code&gt;)&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The App Router directory (&lt;code&gt;page.tsx&lt;/code&gt;, &lt;code&gt;layout.tsx&lt;/code&gt;, &lt;code&gt;loading.tsx&lt;/code&gt;, etc.) defines route surface, lazy loading, and metadata.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;providers.tsx&lt;/code&gt; wraps &lt;code&gt;QueryClientProvider&lt;/code&gt; together with auth, timezone, and the global loading context. The point is to make sure every global hook reads from the same query client.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Route groups like &lt;code&gt;spaces&lt;/code&gt;, &lt;code&gt;auth&lt;/code&gt;, and the dynamic segment &lt;code&gt;[spaceSlug]&lt;/code&gt; map one-to-one with feature domains. Leaf pages usually just call into an entry point under &lt;code&gt;src/features/**/pages&lt;/code&gt; and delegate the logic.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;API routes under &lt;code&gt;src/app/api/*&lt;/code&gt; only act as a server-side bridge to the backend when we actually need one.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Feature modules (&lt;code&gt;src/features/*&lt;/code&gt;)&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Each feature follows a vertical slice layout: &lt;code&gt;components&lt;/code&gt;, &lt;code&gt;pages&lt;/code&gt;, &lt;code&gt;hooks&lt;/code&gt;, &lt;code&gt;services&lt;/code&gt;, &lt;code&gt;stores&lt;/code&gt;, &lt;code&gt;types&lt;/code&gt;, &lt;code&gt;utils&lt;/code&gt;, &lt;code&gt;data&lt;/code&gt; all in one folder. Domain UI and logic stay in one place.&lt;/li&gt;
&lt;li&gt;A feature page is just a thin wrapper that composes shared layouts with domain components.&lt;/li&gt;
&lt;li&gt;Hooks wrap the read/write logic and side effects, pulling in TanStack Query helpers from &lt;code&gt;src/shared/hooks/queries&lt;/code&gt; and the domain services.&lt;/li&gt;
&lt;li&gt;Local Zustand stores (&lt;code&gt;stores&lt;/code&gt;) only hold transient UI state that shouldn&apos;t leak outside the feature.&lt;/li&gt;
&lt;li&gt;Feature services use the shared API client to implement domain-specific formatting, optimistic updates, and derived models.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Shared layer (&lt;code&gt;src/shared&lt;/code&gt;)&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;components&lt;/code&gt;: design system components built on Tailwind + class-variance-authority (feedback, navigation, inputs, and so on).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;hooks&lt;/code&gt;: reusable query hooks (&lt;code&gt;queries/*&lt;/code&gt;), auth helpers, and realtime subscription hooks. These hide data fetching and state orchestration.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;services&lt;/code&gt;: infrastructure services like &lt;code&gt;centrifugo.service.ts&lt;/code&gt;, plus file upload helpers and an analytics logger.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;contexts&lt;/code&gt;: the auth, timezone, and global loading contexts that the app shell consumes.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;stores&lt;/code&gt;: global Zustand stores for auth state, toasts, time utilities, and so on.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;lib&lt;/code&gt;: low-level integrations (Axios API clients per resource, the token manager, fonts, and API helpers).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;types&lt;/code&gt; and &lt;code&gt;schemas&lt;/code&gt;: type definitions and validation schema fragments shared across feature and API layers.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;utils&lt;/code&gt;: formatting, error handling, and general-purpose helpers.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Data and state flow&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Axios clients live in &lt;code&gt;src/shared/lib/api/*.ts&lt;/code&gt;, which centralizes base URLs, interceptors, and the token refresh logic.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Query and mutation hooks under &lt;code&gt;src/shared/hooks/queries&lt;/code&gt; standardize TanStack Query keys and caching policies, so feature modules can compose them safely.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Global state goes into the lightweight Zustand stores under &lt;code&gt;shared/stores&lt;/code&gt;. Feature-specific state stays in a local store inside that feature folder.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Forms mostly use React Hook Form + Zod, with a shared resolver utility on hand when we need it.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
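&lt;p&gt;As a concrete illustration of those standardized keys, here is a minimal key-factory sketch; the names (&lt;code&gt;postKeys&lt;/code&gt;, the key segments) are hypothetical, not Scrumble&apos;s actual code:&lt;/p&gt;

```typescript
// Hypothetical key factory in the style of src/shared/hooks/queries.
// Centralizing keys keeps feature modules from inventing ad-hoc cache keys,
// so invalidation and realtime patches always target the same entries.
const postKeys = {
  all: ["posts"] as const,
  feed: (spaceSlug: string, date: string) =>
    [...postKeys.all, "feed", spaceSlug, date] as const,
  detail: (spaceSlug: string, postId: number) =>
    [...postKeys.all, "detail", spaceSlug, postId] as const,
};
```

&lt;p&gt;Because every hook derives its key from one factory, a feature module never has to guess which cache entry a mutation or realtime event should touch.&lt;/p&gt;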
&lt;h2&gt;Realtime sync&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;CentrifugoService&lt;/code&gt; keeps a single Centrifuge client alive and handles reconnects, channel persistence, and event dispatch.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Realtime hooks don&apos;t open new sockets. They extend the central Centrifugo layer, so every feature reuses the same connection and event bus.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When an event arrives, the hook patches the TanStack Query cache or the local store directly, so the UI updates without an extra refetch.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
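&lt;p&gt;The cache-patching step can be sketched as a pure reducer over a hypothetical cached comment list (the event shape and names here are illustrative, not the actual Scrumble types):&lt;/p&gt;

```typescript
// Hypothetical realtime event and cached shape. When Centrifugo delivers a
// "comment.created" event, we patch the cached list instead of refetching.
interface Comment { id: number; postId: number; body: string; }
interface CommentEvent {
  type: "comment.created" | "comment.deleted";
  comment: Comment;
}

function patchComments(cache: Comment[], event: CommentEvent): Comment[] {
  switch (event.type) {
    case "comment.created":
      // Ignore duplicates: the same event can arrive right after an
      // optimistic insert already put this comment in the cache.
      return cache.some((c) => c.id === event.comment.id)
        ? cache
        : [...cache, event.comment];
    case "comment.deleted":
      return cache.filter((c) => c.id !== event.comment.id);
  }
}
```

&lt;p&gt;In the real hook a reducer like this would run inside &lt;code&gt;queryClient.setQueryData&lt;/code&gt;, so the UI re-renders from the patched cache without a network round trip.&lt;/p&gt;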
&lt;h2&gt;UI composition and styling&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Tailwind CSS + PostCSS is the styling base. We compose variant classes safely with class-variance-authority and tailwind-merge.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Animations and micro-interactions go through Framer Motion. Icons and emoji come from shared libraries like Lucide, Remix Icons, and Emoji Mart.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Responsive behavior and virtualized scrolling (React Window) are factored out into reusable components, so page code stays declarative.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Testing and developer experience&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;We test hooks and components with Jest + React Testing Library, and run E2E regression scenarios with Playwright.&lt;/li&gt;
&lt;li&gt;TypeScript strict mode, ESLint, and &lt;code&gt;npm run build&lt;/code&gt; (Next.js with SWC/Turbopack) keep types, lint, and build stability honest before commits.&lt;/li&gt;
&lt;li&gt;Shared tooling like &lt;code&gt;shared/utils/debug&lt;/code&gt; and the token manager keeps debugging and storage strategies consistent across environments.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;High-level module interaction&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;graph TD
subgraph &quot;App Router (src/app)&quot;
app_pages[&quot;Pages &amp;amp; Layouts&quot;]
app_providers[&quot;Global Providers&quot;]
end

subgraph &quot;Feature Modules (src/features/*)&quot;
feature_pages[&quot;Feature Pages&quot;]
feature_components[&quot;Feature Components&quot;]
feature_hooks[&quot;Feature Hooks&quot;]
feature_services[&quot;Feature Services&quot;]
feature_stores[&quot;Local Zustand Stores&quot;]
end

subgraph &quot;Shared Layer (src/shared)&quot;
shared_components[&quot;Shared UI Components&quot;]
shared_hooks[&quot;Shared Hooks&quot;]
shared_contexts[&quot;Shared Contexts&quot;]
shared_services[&quot;Infrastructure Services&quot;]
shared_stores[&quot;Shared Stores&quot;]
shared_lib[&quot;API &amp;amp; Token Library&quot;]
end

subgraph &quot;Infrastructure&quot;
tanstack[&quot;TanStack Query Client&quot;]
axios_client[&quot;Axios Resource Clients&quot;]
centrifugo_service[&quot;Centrifugo Service&quot;]
backend[(&quot;REST API Backend&quot;)]
centrifugo[(&quot;Centrifugo Hub&quot;)]
end

app_pages --&amp;gt; feature_pages
app_providers --&amp;gt; tanstack
app_providers --&amp;gt; shared_contexts

feature_pages --&amp;gt; feature_components
feature_pages --&amp;gt; feature_hooks

feature_components --&amp;gt; shared_components
feature_hooks --&amp;gt; shared_hooks
feature_hooks --&amp;gt; feature_services
feature_hooks --&amp;gt; feature_stores
feature_services --&amp;gt; shared_services

shared_hooks --&amp;gt; tanstack
shared_services --&amp;gt; axios_client
shared_services --&amp;gt; centrifugo_service
shared_stores --&amp;gt; feature_components

tanstack --&amp;gt; axios_client
axios_client --&amp;gt; backend
centrifugo_service --&amp;gt; centrifugo
shared_hooks --&amp;gt; centrifugo_service
feature_hooks --&amp;gt; tanstack
feature_hooks --&amp;gt; centrifugo_service
&lt;/code&gt;&lt;/pre&gt;
&lt;h1&gt;Implementation retro&lt;/h1&gt;
&lt;h2&gt;Styling&lt;/h2&gt;
&lt;p&gt;We mostly used Tailwind CSS, and I leaned on Claude Code so heavily that I barely wrote styling by hand. Most vibe coding videos out there don&apos;t actually start from a designer&apos;s spec. They pull tips from sites the author wants to benchmark. From where I sit (working with an actual product designer, Ellie), those tips just weren&apos;t useful.&lt;/p&gt;
&lt;h3&gt;CSS Vibe&lt;/h3&gt;
&lt;p&gt;Around April and May, MCP exploded into a trend. Figma MCP especially got hyped as if hooking it up to Cursor or Claude Code would auto-implement everything. In practice, that&apos;s not how it goes. AI does have some image recognition, but accuracy drops fast. Even when Figma MCP pulls data, it&apos;s reading the properties of Figma image objects under the hood, and because the LLM is text-based, it can&apos;t get to a 100% implementation. It flounders a lot.&lt;/p&gt;
&lt;p&gt;You could ask the designer to make the Figma file pristine (naming, layout, all of it). But realistically, having the designer hand-craft every detail is less efficient than me grabbing the rough style values one at a time and pasting them into a prompt. So MCP got a few uses early on and then I dropped it. I&apos;ll get into this more when I write about how I code with AI, but as of now I don&apos;t use Playwright MCP, Context7, or any of those.&lt;/p&gt;
&lt;p&gt;People talk about MCP a lot less these days too. &quot;It connects&quot; and &quot;it actually works&quot; are different things. There are plenty of early write-ups about &quot;it connects,&quot; but I&apos;ve seen almost nothing about MCP working well at production scale or in a real team setting (as opposed to toy projects, MVPs, and prototypes).
Could be I&apos;m just ignorant about it. But I decided the time I&apos;d spend researching MCP was better spent being a little more diligent myself.&lt;/p&gt;
&lt;p&gt;Back to the implementation. What I ended up doing was grabbing the style values straight out of Figma Dev Mode as text, and implementing each component that way. The first pass is a bit of a slog, but it gets you to roughly 90%, and you only need a light touch after that. With MCP I&apos;d hit 30% and then wrestle with prompts forever, so this is the workflow that keeps me moving right now.&lt;/p&gt;
&lt;p&gt;A side note. I tried doing a few screens with no design at all, just whatever Claude proposed visually, but Claude&apos;s own design taste is rough, and I&apos;d have to put serious effort into the prompt without any clear sense of what &quot;done&quot; looked like. I gave up. My setup assumes I have a strong product designer as a partner. For people building solo or without design resources, MCP and benchmark-driven vibe coding will probably still be valid. Just always carry a clear definition of &quot;done&quot; with you.&lt;/p&gt;
&lt;h2&gt;State management&lt;/h2&gt;
&lt;p&gt;We use Zustand for some global UI state (saving the last selected date, that kind of thing), but most state lives in TanStack Query (a.k.a. react-query).&lt;/p&gt;
&lt;h3&gt;React-query&lt;/h3&gt;
&lt;p&gt;React-query gives you caching plus a bunch of UI-state primitives, and the early learning curve is real. The painful case is when you layer auth middleware on top of plain API calls. You need automatic refresh logic when the access token expires, something I&apos;ve been doing for over ten years. Mixing that into the react-query and Axios layer caused confusion. The actual cause was Claude Code generating duplicate code that I missed. Early on, expired access tokens triggered an infinite redirect loop, and I had to go back and review every line of the react-query code Claude had written, one by one, to fix it. If I&apos;d just read the code, it would have been a five-minute fix. I tried to one-click my way out of reading the react-query code and burned over an hour instead.&lt;/p&gt;
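&lt;p&gt;One common source of that kind of infinite loop is several requests hitting a 401 at once and each firing its own refresh. A guard I&apos;d reach for is a single-flight wrapper; this is a sketch with illustrative names, not the code Claude generated:&lt;/p&gt;

```typescript
// Hypothetical single-flight guard for token refresh. Concurrent 401s share
// one in-flight refresh promise instead of each triggering their own.
function singleFlight<T>(fn: () => Promise<T>): { run: () => Promise<T>; calls: () => number } {
  let inFlight: Promise<T> | null = null;
  let count = 0;
  return {
    run: () => {
      if (!inFlight) {
        count += 1;
        // Clear the slot when done so a later expiry can refresh again.
        inFlight = fn().finally(() => { inFlight = null; });
      }
      return inFlight;
    },
    calls: () => count,
  };
}
```

&lt;p&gt;An Axios response interceptor would call &lt;code&gt;run()&lt;/code&gt; on a 401, retry the original request once, and send the user to login only if the refresh itself fails.&lt;/p&gt;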
&lt;p&gt;That was the first time I felt how brittle the software gets when you go pure vibe coding outside of MVP territory, and I felt it a lot more after that. If you can read the code, read it. It&apos;s faster. (For now.)&lt;/p&gt;
&lt;p&gt;React-query manages its own cache. I had similar experience with caching in gql-apollo, but react-query&apos;s surface is broader and more feature-rich. Once the server adds caching too (Redis), things get genuinely tricky. Right now I&apos;m doing full-stack so it&apos;s fine (I know every policy in my head), but in a setup where backend and frontend are split, applying caching with the wrong policy can produce bug-shaped behavior that isn&apos;t really a bug. The backend has an interface layer in its architecture, and the API is shaped around client use cases, so this kind of thing has to be designed carefully. Caching done right cuts UX latency and load. Done wrong, it shows the user values they didn&apos;t expect.&lt;/p&gt;
&lt;h3&gt;Optimistic UI&lt;/h3&gt;
&lt;p&gt;As I mentioned on the backend side, the early infra region delay made API latency painfully long. (The feed screen took 2+ seconds.) Without changing the infra (we wanted to squeeze what we had first), the first thing I reached for was Optimistic UI.&lt;/p&gt;
&lt;p&gt;The idea behind Optimistic UI is straightforward. You patch react-query&apos;s internal state first so the UI reflects the change immediately and the user sees no delay. The actual API request runs in the background, and if the response comes back with a different state or an error, you roll the UI state back.&lt;/p&gt;
&lt;p&gt;The first version had small UI glitches. The most obvious one: the UI updated, then the response came back and re-updated, causing a flicker. That&apos;s because we were re-applying the server response to the UI. It guaranteed the freshest server state, but the flicker hurt UX. I changed it so that when the response comes back without issues, we skip the redundant update.&lt;/p&gt;
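&lt;p&gt;That flicker fix reduces to one rule: only re-apply the server response when it actually differs from the optimistic state. A sketch, using a made-up reaction shape:&lt;/p&gt;

```typescript
// Illustrative cached shape for an emoji reaction.
interface ReactionState { emoji: string; count: number; }

// Returns the state the UI should keep after the server responds.
// If the server agrees with the optimistic patch, keep the current object
// (same reference, nothing new to paint); otherwise adopt the server's.
function reconcile(optimistic: ReactionState, server: ReactionState): ReactionState {
  const same = optimistic.emoji === server.emoji && optimistic.count === server.count;
  return same ? optimistic : server;
}
```

&lt;p&gt;When the reconciled value is the same object reference, there is no redundant cache write, so the UI has nothing new to paint and the flicker disappears.&lt;/p&gt;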
&lt;h3&gt;Realtime&lt;/h3&gt;
&lt;p&gt;When someone reacts with an emoji to a check-in or check-out, or drops a comment, it updates in realtime. Anyone in the same space should see those feedback actions live. We had Go and Centrifugo (Redis) wiring up the WebSocket nicely, but realtime updates kept getting dropped early on, and I struggled with it. I first suspected a server-side WebSocket issue, but if Centrifugo were the problem, the ClickUp realtime trigger we had wired up next to it would have broken too. That one worked perfectly, which pointed at the client.&lt;/p&gt;
&lt;p&gt;To make the WebSocket channels efficient and spread the load, the feed has multiple posts and we subscribed to each one by post ID as its own channel. As you scroll and a post leaves or re-enters the viewport, the handler resubscribes to that post&apos;s channel.&lt;/p&gt;
&lt;p&gt;I thought this was efficient because we weren&apos;t holding subscriptions to every post ID at all times.
The catch: driving subscription state from the UI meant connections occasionally got dropped, so comments wouldn&apos;t arrive in realtime now and then. I spent a fair amount of time fixing it, but honestly, just keying the channel on space + date and subscribing to the whole feed would have been simpler. If the event volume or feed size were huge, sure, you&apos;d want the optimization. But this was a case of trying to be too clever upfront and burning time on trial and error.&lt;/p&gt;
&lt;h2&gt;File storage&lt;/h2&gt;
&lt;p&gt;One thing I like about Next.js is that you get your own server. We didn&apos;t implement file storage in the Go server. We did it directly in Next.js, and only sent the file metadata and the file address to the backend.&lt;/p&gt;
&lt;h3&gt;R2&lt;/h3&gt;
&lt;p&gt;As part of going off AWS, we used R2 for storage. R2 is basically Cloudflare&apos;s S3. It&apos;s just as simple to use as S3. Connect the address, register the auth keys as a Vercel or local secret, and you can upload right away.
If you need file storage for something you&apos;re building, I&apos;d recommend R2. Below is a comparison of the current Free Tiers for S3 and R2. The most attractive piece is the unlimited egress. If your images get embedded across a lot of websites, R2 is worth a serious look on top of just upload cost.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Item&lt;/th&gt;
&lt;th&gt;AWS S3 Free Tier (first 12 months)&lt;/th&gt;
&lt;th&gt;Cloudflare R2 Free Tier (permanent)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Storage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;5 GB&lt;/td&gt;
&lt;td&gt;10 GB-month&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Class A ops&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2,000 PUT/COPY/POST/LIST requests&lt;/td&gt;
&lt;td&gt;1 million requests&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Class B ops&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;20,000 GET/SELECT requests&lt;/td&gt;
&lt;td&gt;10 million requests&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Egress&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;100 GB&lt;/td&gt;
&lt;td&gt;Free (unlimited)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Other&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$100 credit on new accounts (30+ services)&lt;/td&gt;
&lt;td&gt;Permanently free, per account&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The bigger lift wasn&apos;t the upload itself. It was the drag-and-drop UI for dropping files in directly, plus the upload progress indicator. We also did a small client-side resize instead of always uploading the original. The feature came out of how I&apos;d use it personally, occasionally attaching a photo to a check-in or comment. Down the road, if we extend to a tiptap implementation, we can embed images inside tiptap, so we debated this in the planning phase. We landed on a separate photo upload to fit the check-in/check-out purpose.&lt;/p&gt;
&lt;h3&gt;HEIC Converter&lt;/h3&gt;
&lt;p&gt;Most image upload features don&apos;t actually support this, but Ellie and I are both iPhone users, so most of our iPhone images come out as HEIC. The usual upload flow accepts jpg, png, gif, and just refuses HEIC. We wanted the experience of opening Photos on a Mac and dragging straight in, so we built a converter to make HEIC uploads work. I assumed dropping in a single library would handle it. It wasn&apos;t that simple.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Fallback logic&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Inspect the actual byte signature. If a file is already converted to JPEG, just rename it to .jpg and return it as-is.&lt;/li&gt;
&lt;li&gt;Run through the five strategies defined in the conversionMethods array in order. Return immediately on success. Log success/failure and stats for each attempt.&lt;/li&gt;
&lt;li&gt;If every method fails, summarize the accumulated error list to the log, then return the original file instead of throwing, so the upload flow doesn&apos;t break.&lt;/li&gt;
&lt;li&gt;Expose &lt;code&gt;getHeicConversionInfo&lt;/code&gt; and &lt;code&gt;debugHeicFile&lt;/code&gt; for support and debugging. They show the strategies available in the current environment, the file header, the ftyp brand, and so on.&lt;/li&gt;
&lt;/ul&gt;
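&lt;p&gt;As a rough sketch of that first bullet (the function name and the brand list are mine, not the project&apos;s), the byte-signature check boils down to reading a few header bytes:&lt;/p&gt;

```typescript
// Sketch of the byte-signature inspection: JPEG files start with the
// SOI marker FF D8 FF, while HEIC files are ISO-BMFF containers with an
// "ftyp" box type at offset 4 and a brand such as "heic" at offset 8.
function sniffImageKind(bytes: Uint8Array): string {
  const matches = (sig: number[], offset: number) =>
    sig.every((v, i) => bytes[offset + i] === v);
  if (matches([0xff, 0xd8, 0xff], 0)) {
    return "jpeg"; // already JPEG: rename to .jpg and return as-is
  }
  if (bytes.length >= 12) {
    const boxType = String.fromCharCode(bytes[4], bytes[5], bytes[6], bytes[7]);
    const brand = String.fromCharCode(bytes[8], bytes[9], bytes[10], bytes[11]);
    if (boxType === "ftyp") {
      // common HEIF brands; real-world files vary, hence the fallbacks
      if (["heic", "heix", "mif1", "hevc"].includes(brand)) {
        return "heic";
      }
    }
  }
  return "unknown";
}
```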
&lt;p&gt;&lt;strong&gt;Conversion methods and libraries&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;heic2any: the default strategy. Tries three output options in sequence (JPEG 90%, PNG, JPEG 100%). Treats an empty Blob as a failure.&lt;/li&gt;
&lt;li&gt;heic-decode: decodes the HEIF bitstream directly with heic-decode, normalizes the various data structures (Uint8Array, object form, etc.) into ImageData, draws to a canvas, then serializes to JPEG.&lt;/li&gt;
&lt;li&gt;FileReader: reads a Base64 URL with no external library, loads it into an img tag, draws to canvas, and produces a JPEG Blob using only browser-native APIs as the fallback.&lt;/li&gt;
&lt;li&gt;heic-convert: dynamically imports the Node-based heic-convert module into the browser bundle and tries Buffer to JPEG conversion. Treats an empty output buffer as an error.&lt;/li&gt;
&lt;li&gt;Browser-native: as a last resort, uses only URL.createObjectURL and a canvas tag to redraw the image and produce a JPEG Blob.&lt;/li&gt;
&lt;/ul&gt;
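&lt;p&gt;Put together, the chain is just &quot;try each strategy in order, treat empty output as failure, and fall back to the original file.&quot; A minimal sketch of that control flow (types and names are stand-ins; the real strategies operate on File and Blob objects, not strings):&lt;/p&gt;

```typescript
// Stand-in for the conversionMethods array described above: each
// strategy is tried in order, the first success wins, and if all of
// them fail the original input is returned so the upload flow never
// breaks. Names and types are illustrative, not from the project.
type Strategy = {
  name: string;
  convert: (input: string) => string; // stand-in for File-to-Blob conversion
};

function convertWithFallbacks(input: string, strategies: Strategy[]): string {
  const errors: string[] = [];
  for (const s of strategies) {
    try {
      const out = s.convert(input);
      if (out.length === 0) {
        throw new Error("empty output"); // empty result counts as failure
      }
      return out; // first success wins
    } catch (e) {
      errors.push(s.name + ": " + String(e)); // log the attempt, move on
    }
  }
  // every method failed: summarize the accumulated errors, return original
  console.warn("all conversion strategies failed: " + errors.join("; "));
  return input;
}
```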
&lt;p&gt;I leaned on Claude Code a lot, but the result ended up being a fairly gnarly converter. With this many fallbacks, every HEIC image we&apos;ve thrown at it so far has converted without exception.&lt;/p&gt;
&lt;p&gt;The side effect: I now happily dig up any old photo and throw it in.&lt;/p&gt;
&lt;h2&gt;Wrapping up&lt;/h2&gt;
&lt;h3&gt;Vibe coding!?&lt;/h3&gt;
&lt;p&gt;After the backend, I built out the frontend at the same fairly minimal level. The frontend leaned on Claude Code far more, and ironically, the higher my dependence on AI got, the lower my productivity went. Especially when I had to debug or fix an issue without much grasp of the code, Claude Code would either miss the simple cause entirely or look at it through too narrow a lens and loop endlessly. Most of the time, when I read the code myself and got my hands into it, the fix turned out to be simple.&lt;/p&gt;
&lt;p&gt;AI writes code fast, sure, but in real situations the context drifts constantly because of policy changes and other issues. Trusting AI alone often left me stuck.&lt;/p&gt;
&lt;p&gt;Writing an SDD or PRD doc, defining the spec up front, is the closest thing to a real alternative. Once requirements and implementation both get complex, though, there are things you only learn &quot;after building.&quot; A lot of people treat coding like math, where you plug in a formula and get one correct answer. Reality isn&apos;t like that. In a working environment where requirements shift hour by hour, and even when you&apos;re coding solo, context gets lost all the time.&lt;/p&gt;
&lt;p&gt;These days I only use Codex, with a flow of: requirements doc, then implementation steps doc, then development broken up by step. That&apos;s been my method, and it&apos;s been much more efficient than before. Once the project I&apos;m on now wraps up roughly, I&apos;ll write a broader piece on what I&apos;ve learned about vibe coding overall.&lt;/p&gt;
&lt;h3&gt;Wrapping the project&lt;/h3&gt;
&lt;p&gt;It&apos;s already been over a month since the first release. Using it internally for our own work makes the things I want to improve very visible. We didn&apos;t build this to open it externally, but I&apos;d love to ship even a beta in the near future. The frontend has piled up a lot of bad code, courtesy of Claude. After the project I&apos;m currently working on releases, the goal is to clean those up bit by bit while continuing feature development.&lt;/p&gt;
&lt;p&gt;At my previous company I rarely coded the frontend hands-on, so this was the chance to deep-dive properly, study, and actually implement. The reason frontend feels hard isn&apos;t really the frontend itself. It&apos;s that it touches the user directly. When I touch what I built and catch a whiff of how bad it smells, I keep fixing and fixing until hours have disappeared. Even so, watching the UI come to life and run is its own kind of dopamine, different from the backend in a way I can&apos;t quite name.&lt;/p&gt;
&lt;p&gt;Same as with the backend, going through this project gave me a clear picture of how I&apos;d run the frontend on the next one.&lt;/p&gt;
&lt;p&gt;I finally wrapped up the long-postponed frontend chapter. The backend post is next.&lt;/p&gt;
&lt;hr /&gt;
&lt;ul&gt;
&lt;li&gt;Scrumble-related posts
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;/posts/2025-09-scrumble-project-retro&quot;&gt;Scrumble Project Retro (June–August 2025)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;/posts/2025-09-25-scrumble-project-team-interview&quot;&gt;Scrumble Team First-Release Interview&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;/posts/2025-09-scrumble-tech-retro-intro&quot;&gt;Scrumble Tech Retro - 0. Intro (Why Golang?)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;/posts/2025-09-scrumble-tech-retro-backend&quot;&gt;Scrumble Tech Retro - 1. Backend (Golang, DDD, Entgo, Event, Centrifugo)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;/posts/2025-10-03-scrumble-tech-retro-frontend-with-vibe-coding&quot;&gt;Scrumble Tech Retro - 2. The Frontend, with a Side of Vibe Coding&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
</content:encoded><category>project</category><category>frontend</category><category>vibe coding</category><category>scrumble</category><author>Tony Cho (https://flowkater.io)</author></item><item><title>Scrumble Team: First-Release Interview</title><link>https://flowkater.io/en/posts/2025-09-25-scrumble-project-team-interview/</link><guid isPermaLink="true">https://flowkater.io/en/posts/2025-09-25-scrumble-project-team-interview/</guid><description>Right after Scrumble&apos;s first release, Tony, Ellie, and George talk about how they joined the project, what tripped them up, the workshop afterward, and what they want to build next.</description><pubDate>Fri, 26 Sep 2025 00:00:00 GMT</pubDate><content:encoded>
&lt;p&gt;Right after wrapping the first release of Scrumble, Tony, Ellie, and George sat down to talk honestly about everything from how they joined the project to what happened behind the scenes at the workshop. If you want the full arc of the project itself, that&apos;s covered in the &lt;a href=&quot;/posts/2025-09-scrumble-project-retro/&quot;&gt;Scrumble Project Retrospective (June–August 2025)&lt;/a&gt;. This post is the interview record: each person&apos;s view, in their own voice.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Q1. How did you join, and what were you hoping for?&lt;/h2&gt;
&lt;p&gt;&amp;lt;InterviewCard variant=&quot;tony&quot; name=&quot;Tony&quot; avatar=&quot;/assets/Tony.png&quot;&amp;gt;
&amp;lt;p&amp;gt;
I&apos;d been away from hands-on engineering work for a long time, so I needed a
project that would relight the fire. And ideally, something I&apos;d actually use
every day and could keep improving over the long haul.
&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;
The idea came from the daily meeting notes I used to keep at my old company.
Check in, check out, share what you&apos;re working on. When you come into the
office every day and see each other for hours, even a quick check-in becomes
the lubricant for team communication. I wanted to build something that
contributed to teamwork through daily check-outs and work updates.
&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;
My goal was to develop the BE/FE stack end to end and get my chops back,
while also actually shipping a complete project.
&amp;lt;/p&amp;gt;
&amp;lt;/InterviewCard&amp;gt;&lt;/p&gt;
&lt;p&gt;&amp;lt;InterviewCard variant=&quot;ellie&quot; name=&quot;Ellie&quot; avatar=&quot;/assets/Ellie.png&quot;&amp;gt;
&amp;lt;p&amp;gt;
Like Tony, I&apos;d been away from design work for a long stretch and was
thinking, &quot;I should be doing &lt;em&gt;something&lt;/em&gt;...&quot; The timing happened to line up,
and that&apos;s how I ended up working on Scrumble.
&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;
Personally, my own experience with daily meetings had been pretty positive.
I&apos;m someone who values &quot;soft touch.&quot; That said, I&apos;d never thought
hard about how much that process actually contributes to work
efficiency or output, from a manager&apos;s seat. So when Tony told me he was feeling
that exact gap, I got really curious: &quot;What if we dig into users with this
kind of need and shape the product around that direction?&quot;
&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;
As always... we started with the noble intention of &quot;let&apos;s focus on the
MVP,&quot; and somehow drifted into thinking about B2B sales (...) before we knew
it. (We were literally just starting initial product planning.) Thankfully,
we caught ourselves and reset our expectations to match the original intent!
Hehe.
&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;
At some point on this service&apos;s journey, I think we&apos;ll hit a fork between
being a messenger and being a social network. It&apos;s a sensitive topic
these days, so I&apos;m being careful, but when that moment comes I hope we make
a wise, confident call. Personally, I&apos;m hoping we can land on a balanced
kind of communication, one that holds onto the warmth of social connection
without all the unavoidable stress that comes with it. Ultimately, it&apos;s
a small wish, or maybe a big expectation, that we can make it all feel
seamless. Hehe.
&amp;lt;/p&amp;gt;
&amp;lt;/InterviewCard&amp;gt;&lt;/p&gt;
&lt;p&gt;&amp;lt;InterviewCard variant=&quot;george&quot; name=&quot;George&quot; avatar=&quot;/assets/George.png&quot;&amp;gt;
&amp;lt;p&amp;gt;
I&apos;d been studying through bootcamps and study groups, but I always felt
there was a big gap between that and actual industry work. In the middle of
that, I joined a company.
&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;
It was different from the bootcamp days, but it still wasn&apos;t quite what I&apos;d
pictured as real work. I spent a little over a year there and felt like I
wasn&apos;t really gaining anything. (My premise was wrong from the start.
Building actual experience is what makes future job changes possible.) The
pattern of just reading the room at work, killing time when nothing was
going on... that&apos;s not the life I want. I want to actually know how to do
this and do it well, so when I was given a chance to join the team, I took
it.
&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;
I want to finally start building real on-the-job know-how and actual
communication skills with teammates.
&amp;lt;/p&amp;gt;
&amp;lt;/InterviewCard&amp;gt;&lt;/p&gt;
&lt;h2&gt;Q2. What was the hardest moment along the way?&lt;/h2&gt;
&lt;p&gt;&amp;lt;InterviewCard variant=&quot;tony&quot; name=&quot;Tony&quot; avatar=&quot;/assets/Tony.png&quot;&amp;gt;
&amp;lt;p&amp;gt;
The hardest stretches were the wrap-up phase and the time I spent on the
tiptap editor. Toward the end, we had a design issue that meant migrating
every API, and along with that, a mountain of unfinished release tasks
piled up. I just couldn&apos;t get any momentum and kept pushing the project
back.
&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;
The Jeju workation at the end of July helped me reset, but my biggest
feeling is: I wish I&apos;d pulled myself together a little sooner.
&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;
And tiptap. I tried it early on, decided the feature surface was overkill,
and rolled it back. That&apos;s the moment I most want to undo. Later, there
were way more entangled features and way more accumulated data, so adding
it back was much harder. I burned a ton of time on it and never finished.
If I&apos;d just done it from the start, things would&apos;ve moved faster. That&apos;s
the moment that sticks with me, haha.
&amp;lt;/p&amp;gt;
&amp;lt;/InterviewCard&amp;gt;&lt;/p&gt;
&lt;p&gt;&amp;lt;InterviewCard variant=&quot;ellie&quot; name=&quot;Ellie&quot; avatar=&quot;/assets/Ellie.png&quot;&amp;gt;
&amp;lt;p&amp;gt;
I&apos;d mostly done mobile design before, so when I tried to work on web
views... the canvas felt like open ocean. That&apos;s where I lost a lot of
time. And I let my planning ideas run wild, daydreaming about a rosy future
without permission, so I wasted a lot of the time I should&apos;ve spent on the
actual core work. My heart got way ahead of me (sob), and things got
pretty tangled up. Hehe!
&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;
A moment I&apos;d take back? Pushing component-ization way too aggressively
before the design was even fixed (which ended with me ripping it all out
and starting over). Charging in with color choices first. I should have
set the weight aside and focused on the truly core features and value, but
I wasn&apos;t confident there, so I buried myself in interaction details
instead. Skipping hand sketches because they felt like a waste of time and
just sitting in front of the monitor flailing. Where do I even start?
(sob sob) If I could undo all of it now... could I actually do it well?
Hehehehe.
&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;
The root of all of this is probably some unholy mix of perfectionism +
arrogance + laziness + bottling it all up. Lesson I keep relearning: when
you&apos;re setting direction, discuss fast, focus fast, execute fast.
&amp;lt;/p&amp;gt;
&amp;lt;/InterviewCard&amp;gt;&lt;/p&gt;
&lt;p&gt;&amp;lt;InterviewCard variant=&quot;george&quot; name=&quot;George&quot; avatar=&quot;/assets/George.png&quot;&amp;gt;
&amp;lt;p&amp;gt;
At my company, when a particular task ends and they don&apos;t give me the next
one (because there&apos;s nothing to do), I sit there killing time. That stretch
is brutal. Scrumble was the opposite. It wasn&apos;t that there was nothing to
do, it was that I had so much to do, in my own way, that I didn&apos;t know how
to start chipping away at it. Too much on the plate. That was its own kind
of hard.
&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;
Like Tony said, you just have to put in the raw hours and start with the
small things, the things you actually can do. But because I had so much
and kept putting it off, it snowballed and I&apos;d end up just sitting there
blank for long stretches.
&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;
Because of that, there were periods during the project where I&apos;d hit
sudden, deep exhaustion. I haven&apos;t fully fixed the habit yet, but if I
could go back to those moments now, I think I&apos;d just start with the small
things instead of overthinking it.
&amp;lt;/p&amp;gt;
&amp;lt;/InterviewCard&amp;gt;&lt;/p&gt;
&lt;h2&gt;Q3. Tony, how did it feel to work as an engineer again instead of a manager?&lt;/h2&gt;
&lt;p&gt;&amp;lt;InterviewCard variant=&quot;tony&quot; name=&quot;Tony&quot; avatar=&quot;/assets/Tony.png&quot;&amp;gt;
&amp;lt;p&amp;gt;
I wrote about this in my retrospective too. It was genuinely happy and
genuinely painful. And one thing became really clear: no matter how much
AI boosts productivity, my mental model and memory have hard limits. To
finish a thing all the way down to the last detail, I can only really
work within the scope I can hold in my own head.
&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;
Without Claude, this would&apos;ve taken much longer and full-stack development
would&apos;ve been even harder. But there are still a lot of pieces I have to
work through myself.
&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;
Even while doing engineering work, the manager skills helped me sometimes
(using AI well, making decisions). But the things I was actually good at
as a manager (communication, prioritization) got dropped while I was
head-down coding. That cost was real. This project was about going back
to engineer Tony, not team-manager Tony, and from that angle I&apos;d say it
was a success.
&amp;lt;/p&amp;gt;
&amp;lt;/InterviewCard&amp;gt;&lt;/p&gt;
&lt;h2&gt;Q4. What&apos;s the next goal you want to focus on, Tony?&lt;/h2&gt;
&lt;p&gt;&amp;lt;InterviewCard variant=&quot;tony&quot; name=&quot;Tony&quot; avatar=&quot;/assets/Tony.png&quot;&amp;gt;
&amp;lt;p&amp;gt;
For solopreneurs and small teams, what really matters isn&apos;t which tech
stack you use. It&apos;s whether you actually build a product customers want.
I&apos;ve spent a long time on the product-management and team-management side,
so now I think it&apos;s time to run with growth as the focus.
&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;
I&apos;ve done plenty of &quot;talking&quot; through this project. I recently watched
a talk that said the whole point of a product leader, the final goal, is
leading the team to victory. I shared this at the workshop too: I want to give everything I have to getting those wins.
&amp;lt;/p&amp;gt;
&amp;lt;/InterviewCard&amp;gt;&lt;/p&gt;
&lt;h2&gt;Q5. Ellie, how did it feel to be back in product-designer mode?&lt;/h2&gt;
&lt;p&gt;&amp;lt;InterviewCard variant=&quot;ellie&quot; name=&quot;Ellie&quot; avatar=&quot;/assets/Ellie.png&quot;&amp;gt;
&amp;lt;p&amp;gt;
I felt my limits, hard. First, I had to admit, objectively, that I&apos;d gotten
pretty lazy. This was actually my first time using AI seriously. Once you taste what AI can do, you really do get lazy.
&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;
I used to sketch by hand, thinking through technical constraints and
development direction in my head, organizing UI and UX together... but this
time I worked far more with my eyes than my hands. That said, for the kind
of documentation work where the time investment usually outweighs the
impact, AI made things much smoother. I saved a lot of time. The proudest
moment was when I clearly mapped out the edge-case users for the service.
&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;
It&apos;s a project that got me back into work mode after a long time, so just
that part feels good. But I only managed to handle the open-ocean (...) of
the web view, so I&apos;m not super satisfied with what I made. Honestly, I
want to rip it all out and start over. The desire is there, anyway, hehe.
&amp;lt;/p&amp;gt;
&amp;lt;/InterviewCard&amp;gt;&lt;/p&gt;
&lt;h2&gt;Q6. Any conflicts with Tony you remember?&lt;/h2&gt;
&lt;p&gt;&amp;lt;InterviewCard variant=&quot;ellie&quot; name=&quot;Ellie&quot; avatar=&quot;/assets/Ellie.png&quot;&amp;gt;
&amp;lt;p&amp;gt;
Truth is, I can&apos;t even remember what we were fighting about that hard
anymore. (Looking away.) When you collaborate, it&apos;s all just differences in
perspective, and everyone&apos;s saying something reasonable, and then you start
going &quot;no &lt;em&gt;you&apos;re&lt;/em&gt; the right one, no &lt;em&gt;I&apos;m&lt;/em&gt; the right one,&quot; and voices get
raised. That&apos;s just how it goes, right? Hahaha! And thankfully, somehow,
Tony and I always seem to find our way through and let it out, for better
or worse. Hahaha!
&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;
I think conflict and raised voices in a discussion are part of the process
of reaching a result, so I expect we&apos;ll keep fighting plenty. But the
emotional toll inside that process, I can&apos;t ignore that either. (And
these days, the emotional toll turns into a physical one too...
apparently...?) So going forward, I think we need to set the focus of a
discussion clearly. Not just the topic, but the limits (time, scope,
feature boundaries) agreed on in advance. That should cut down a lot of
the emotional drain.
&amp;lt;/p&amp;gt;
&amp;lt;/InterviewCard&amp;gt;&lt;/p&gt;
&lt;h2&gt;Q7. George, what did you get out of investing your evenings and weekends?&lt;/h2&gt;
&lt;p&gt;&amp;lt;InterviewCard variant=&quot;george&quot; name=&quot;George&quot; avatar=&quot;/assets/George.png&quot;&amp;gt;
&amp;lt;p&amp;gt;
I didn&apos;t have a clear technical target from the start. My goal was more
about engaging with the work more actively than I had during my previous
baseball-app side project and study groups, and to actually try to
communicate well.
&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;
At first I was overdoing the communication a bit, and as the project went
on I dialed it back. But I came to realize that everyday chat isn&apos;t the
whole of communication. The real thing is sharing, on a work level, where
you and the other person are in your understanding and what you&apos;re each
working on. I want to apply that actively in the next project.
&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;
As the project went on, I half-joked that I wanted to become an &quot;API
factory boss.&quot; But I still find it hard to write design documents and
requirements documents, so I&apos;m going to keep writing them, getting
feedback, and revising.
&amp;lt;/p&amp;gt;
&amp;lt;/InterviewCard&amp;gt;&lt;/p&gt;
&lt;h2&gt;Q8. Going forward, how do you want to contribute as a teammate?&lt;/h2&gt;
&lt;p&gt;&amp;lt;InterviewCard variant=&quot;george&quot; name=&quot;George&quot; avatar=&quot;/assets/George.png&quot;&amp;gt;
&amp;lt;p&amp;gt;
For now I&apos;m part-time and handling small pieces. But even those small
pieces are tough to do alone without Tony&apos;s help. Full-time is out of the
question right now. I want to become a teammate that Ellie and Tony can
hand the parts I currently work on to without worrying.
&amp;lt;/p&amp;gt;
&amp;lt;/InterviewCard&amp;gt;&lt;/p&gt;
&lt;h2&gt;Q9. The first-release workshop: what was the best part?&lt;/h2&gt;
&lt;p&gt;&amp;lt;InterviewCard variant=&quot;george&quot; name=&quot;George&quot; avatar=&quot;/assets/George.png&quot;&amp;gt;
&amp;lt;p&amp;gt;
It was the first workshop I&apos;d ever attended in my life. Both of you put so
much into preparing it, and I wanted to match that effort and engage
properly.
&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;
I really listened to Ellie and Tony&apos;s presentations, and I got a lot from
the materials too. So &lt;em&gt;that&apos;s&lt;/em&gt; how you give a presentation, &lt;em&gt;that&apos;s&lt;/em&gt; how
you build the deck. I&apos;d been worrying out loud beforehand (I don&apos;t know
how to do this, building presentation materials is hard), but when I
finished, you both told me you&apos;d actually been worried too and thanked me
for delivering. That meant a lot.
&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;
Outside the workshop itself, the Seoul itinerary (the full course!) and
being able to stay overnight in Seoul were so satisfying. So grateful.
Honestly, the whole thing made me think: so &lt;em&gt;this&lt;/em&gt; is what a startup feels
like?
&amp;lt;/p&amp;gt;
&amp;lt;/InterviewCard&amp;gt;&lt;/p&gt;
&lt;p&gt;&amp;lt;InterviewCard variant=&quot;ellie&quot; name=&quot;Ellie&quot; avatar=&quot;/assets/Ellie.png&quot;&amp;gt;
&amp;lt;p&amp;gt;
For me, the main target audience for this first-release workshop was
actually George. So I admit I put a little extra into the venue and the
activities! If the workshop had just been about sharing deliverables, we
could&apos;ve done it remotely without losing much. But I wanted the atmosphere
to match the purpose of our Scrumble service, and I think we landed it
well at this event.
&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;
I poured myself into the presentation deck. That&apos;s the magic of a
workshop, the something that makes you look forward to the next one.
Smooooth
operator~~~~~~
&amp;lt;/p&amp;gt;
&amp;lt;/InterviewCard&amp;gt;&lt;/p&gt;
&lt;p&gt;&amp;lt;InterviewCard variant=&quot;tony&quot; name=&quot;Tony&quot; avatar=&quot;/assets/Tony.png&quot;&amp;gt;
&amp;lt;p&amp;gt;
I sometimes come across as the &quot;I do fine on my own&quot; type, but the truth
is I really love working as a team. I love people. So even though this was
a much smaller team than what I&apos;m used to, it was a blast. I wanted to
show George, who hasn&apos;t worked at an IT startup before, what that culture
looks like, and I also wanted to send the message that even though we work
remotely, we&apos;re working as one team. Each person&apos;s presentation was great,
and I think we genuinely listened to each other. This time it was a
workshop celebrating wrapping the project, but next time I hope we can
throw a real results-celebration party. The workshop did a lot to motivate
everyone.
&amp;lt;/p&amp;gt;
&amp;lt;/InterviewCard&amp;gt;&lt;/p&gt;
&lt;h2&gt;Q10. What&apos;s the next goal you want to hit with Scrumble?&lt;/h2&gt;
&lt;p&gt;&amp;lt;InterviewCard variant=&quot;ellie&quot; name=&quot;Ellie&quot; avatar=&quot;/assets/Ellie.png&quot;&amp;gt;
&amp;lt;p&amp;gt;
On the design side, it&apos;s the mobile view and branding. Because the design
was patched together, I find myself side-eyeing my own work, hehe. I&apos;m
still figuring out exactly how to approach it, but I really want to fix it.
&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;
At the workshop I shared &quot;Insights on the G-type user, UX improvement
points,&quot; and I want to keep stacking small UX optimizations like that. I
believe small details add up to a big difference in experience over time.
&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;
And (small confession here) when I documented and proudly shared what
I called an &quot;edge case&quot; a while back, the more time passes the more I
realize that &quot;edge case&quot; might just be... me, hehehe. So in that sense,
Tony was my first customer at the start, but now I think I&apos;m also a
target user. Ultimately, I want to grow this into a service I&apos;d be
satisfied with myself.
&amp;lt;/p&amp;gt;
&amp;lt;/InterviewCard&amp;gt;&lt;/p&gt;
&lt;p&gt;&amp;lt;InterviewCard variant=&quot;george&quot; name=&quot;George&quot; avatar=&quot;/assets/George.png&quot;&amp;gt;
&amp;lt;p&amp;gt;
Through Scrumble I got to read both of your daily lives and work updates,
which built a sense of closeness (hobbies and all). And writing my own
daily and work updates gave me a chance to get closer to both of you. We
don&apos;t have all of Scrumble&apos;s features yet, so I want to keep showing up
for the development work and help build it into a product everyone on the
team is happy with.
&amp;lt;/p&amp;gt;
&amp;lt;/InterviewCard&amp;gt;&lt;/p&gt;
&lt;p&gt;&amp;lt;InterviewCard variant=&quot;tony&quot; name=&quot;Tony&quot; avatar=&quot;/assets/Tony.png&quot;&amp;gt;
&amp;lt;p&amp;gt;
We don&apos;t have any sales lines yet, and I didn&apos;t build this as something I
necessarily wanted to make public. But in the end, the real bar is opening
the tool up and getting customers who pay actual money for it, right? I&apos;m
on a different project right now, but I keep slipping in small updates
when I don&apos;t feel like working, so I hope we can get to a public beta
soon.
&amp;lt;/p&amp;gt;
&amp;lt;/InterviewCard&amp;gt;&lt;/p&gt;
&lt;h2&gt;Q11. What were you most grateful for in working as a team?&lt;/h2&gt;
&lt;p&gt;&amp;lt;InterviewCard variant=&quot;tony&quot; name=&quot;Tony&quot; avatar=&quot;/assets/Tony.png&quot;&amp;gt;
&amp;lt;p&amp;gt;
&amp;lt;strong&amp;gt;To Ellie&amp;lt;/strong&amp;gt;
&amp;lt;br /&amp;gt;
We spend so much time working in the same space. Without Ellie, I don&apos;t
think I would&apos;ve had the courage to start this project at all.
&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;
On my own I tend to give up easily, so I lean on Ellie a lot, and I&apos;m
always expecting beautiful copy and design from her. The whole three months
of Scrumble is time spent with Ellie, so honestly, thank you for every
single thing from start to finish.
&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;
&amp;lt;strong&amp;gt;To George&amp;lt;/strong&amp;gt;
&amp;lt;br /&amp;gt;
Thank you for not giving up and following us all the way here.
&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;
I believe the result, whatever it ends up being, only comes to people who
don&apos;t quit. As your mentor I showed plenty of my own gaps too, and I&apos;m
really grateful for how well you stuck with it. You&apos;ve clearly grown
compared to where you started, and I hope you keep that growth momentum
going and even pick up speed.
&amp;lt;/p&amp;gt;
&amp;lt;/InterviewCard&amp;gt;&lt;/p&gt;
&lt;p&gt;&amp;lt;InterviewCard variant=&quot;ellie&quot; name=&quot;Ellie&quot; avatar=&quot;/assets/Ellie.png&quot;&amp;gt;
&amp;lt;p&amp;gt;
&amp;lt;strong&amp;gt;To Tony&amp;lt;/strong&amp;gt;
&amp;lt;br /&amp;gt;
Aside from hospital visits (!), Tony shares the same space and time with
me. Without Tony&apos;s decisiveness and that easy, lighthearted attitude, we
probably wouldn&apos;t have made it this far. So please, get even more
lighthearted! Don&apos;t lose the humor! Crank it up even higher, that&apos;s what
I&apos;m saying!
&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;
There will be plenty more rounds of debate and argument and bruises and
repair ahead, but let&apos;s keep working through them wisely and pushing toward
something even more meaningful! Always grateful!
&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;
&amp;lt;strong&amp;gt;To George&amp;lt;/strong&amp;gt;
&amp;lt;br /&amp;gt;
I know coming into this team and going through it all with us probably
wasn&apos;t easy. I see a lot of my own tendencies in you, which is why I
sometimes find Tony a little (?!) frustrating, hahaha. Even so, thank you
for staying all the way through without dropping out. You&apos;ll definitely
feel that you&apos;ve been growing, bit by bit, through this experience. And I
believe that growth will be the strength that carries you forward!
&amp;lt;/p&amp;gt;
&amp;lt;/InterviewCard&amp;gt;&lt;/p&gt;
&lt;p&gt;&amp;lt;InterviewCard variant=&quot;george&quot; name=&quot;George&quot; avatar=&quot;/assets/George.png&quot;&amp;gt;
&amp;lt;p&amp;gt;
&amp;lt;strong&amp;gt;To Tony&amp;lt;/strong&amp;gt;
&amp;lt;br /&amp;gt;
Tony, thank you so much for not giving up on me and pulling me along.
&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;
You shared so much good information and so many guides with me, and that
was a huge help. And above all, thank you for giving me the chance to
work on a team like this.
&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;
&amp;lt;strong&amp;gt;To Ellie&amp;lt;/strong&amp;gt;
&amp;lt;br /&amp;gt;
I had relatively fewer chances to talk with you compared to Tony, but
whenever I presented or did a retrospective, you listened so closely and
reacted so warmly. That meant a lot.
&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;
Hearing that you&apos;d been cleaning and organizing before our Seoul trip,
I felt bad. I knew you must&apos;ve put in a lot of effort during the trip
too. It was a really satisfying workshop and outing! From now on, I&apos;m
going to try harder to ask directly about things I don&apos;t know.
&amp;lt;/p&amp;gt;
&amp;lt;/InterviewCard&amp;gt;&lt;/p&gt;
&lt;h2&gt;Q12. Anything you want from each other going forward?&lt;/h2&gt;
&lt;p&gt;&amp;lt;InterviewCard variant=&quot;tony&quot; name=&quot;Tony&quot; avatar=&quot;/assets/Tony.png&quot;&amp;gt;
&amp;lt;p&amp;gt;
&amp;lt;strong&amp;gt;To Ellie&amp;lt;/strong&amp;gt;
&amp;lt;br /&amp;gt;
Let&apos;s talk more often and fight more honestly! Hehe. For the sake of
product growth, for the sake of the win, let&apos;s pick back up the unfinished
wish (?) we left somewhere five years ago and see it through this time.
&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;
And let&apos;s run a little more for our individual growth too! And let&apos;s talk
more often and more deeply with each other!
&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;
&amp;lt;strong&amp;gt;To George&amp;lt;/strong&amp;gt;
&amp;lt;br /&amp;gt;
Don&apos;t give up. Go all the way. Forget about how &lt;em&gt;you&lt;/em&gt; are, how AI is, how
the market is, all of that for now. Take pride in your work, find some
enjoyment in it, and keep moving forward step by step, even if it&apos;s slow.
&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;
I&apos;ll help where I can. Don&apos;t put a ceiling on your own ability and don&apos;t
let fear stop you. Push harder.
&amp;lt;/p&amp;gt;
&amp;lt;/InterviewCard&amp;gt;&lt;/p&gt;
&lt;p&gt;&amp;lt;InterviewCard variant=&quot;ellie&quot; name=&quot;Ellie&quot; avatar=&quot;/assets/Ellie.png&quot;&amp;gt;
&amp;lt;p&amp;gt;
&amp;lt;strong&amp;gt;To Tony&amp;lt;/strong&amp;gt;
&amp;lt;br /&amp;gt;
The unfinished wish~ just thinking about it~ Soy lago...😶‍🌫️ I hope the
time we spend running toward our goal together becomes deeper and more
meaningful! And so~ this is how it ended up! May we be able to land on a
conclusion like that. Let&apos;s keep going! Talk more often! Argue more
honestly! Resolve things more wisely!
&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;
&amp;lt;strong&amp;gt;To George&amp;lt;/strong&amp;gt;
&amp;lt;br /&amp;gt;
Please don&apos;t see yourself as small or downplay your work! I do that pretty
often myself, hehehe. But it always trips me up and ends up burning
moments I could&apos;ve used to push further. There&apos;s no shortcut to believing
in yourself and moving forward, so. Anyway, let&apos;s be the ones moving
forward!
&amp;lt;/p&amp;gt;
&amp;lt;/InterviewCard&amp;gt;&lt;/p&gt;
&lt;p&gt;&amp;lt;InterviewCard variant=&quot;george&quot; name=&quot;George&quot; avatar=&quot;/assets/George.png&quot;&amp;gt;
&amp;lt;p&amp;gt;
&amp;lt;strong&amp;gt;To Tony&amp;lt;/strong&amp;gt;
&amp;lt;br /&amp;gt;
I&apos;m someone who shifts a lot depending on my environment and the people
around me, so I want to get even closer with Tony. For that to happen, I
need to be the kind of person who actually picks up on what&apos;s being said
and acts on it.
&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;
I want to keep growing so we can share both work talk and casual talk and
just be on better terms. I know mentoring me is hard and busy work for you,
but I want to keep up well and not let you down, and show you good
process and good results.
&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;
&amp;lt;strong&amp;gt;To Ellie&amp;lt;/strong&amp;gt;
&amp;lt;br /&amp;gt;
Tony&apos;s been hearing my nonsense since we were young, so he&apos;s built up some
immunity. But I could tell Ellie was finding it a bit heavy, hahaha. I&apos;ll
try to dial it down a notch.
&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;
Then I think I&apos;ll be able to get closer with Ellie too. I&apos;m starting to
practice asking you things directly now, so even if it&apos;s frustrating,
please bear with me!
&amp;lt;/p&amp;gt;
&amp;lt;/InterviewCard&amp;gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;Workshop, in pictures&lt;/h3&gt;
&lt;p&gt;&amp;lt;section className=&quot;workshop-gallery&quot;&amp;gt;
&amp;lt;figure className=&quot;workshop-gallery__hero&quot;&amp;gt;
&amp;lt;img src=&quot;/assets/retro_space.jpeg&quot; alt=&quot;The workshop venue&quot; loading=&quot;lazy&quot; /&amp;gt;
&amp;lt;figcaption&amp;gt;The workshop venue!&amp;lt;/figcaption&amp;gt;
&amp;lt;/figure&amp;gt;&lt;/p&gt;
&lt;p&gt;&amp;lt;div className=&quot;workshop-gallery__row&quot;&amp;gt;
&amp;lt;figure className=&quot;workshop-gallery__item&quot;&amp;gt;
&amp;lt;img
src=&quot;/assets/cutie_ellie_pt.jpeg&quot;
alt=&quot;Ellie preparing her presentation deck&quot;
loading=&quot;lazy&quot;
/&amp;gt;
&amp;lt;figcaption&amp;gt;Master presenter Ellie&apos;s PT show&amp;lt;/figcaption&amp;gt;
&amp;lt;/figure&amp;gt;
&amp;lt;figure className=&quot;workshop-gallery__item&quot;&amp;gt;
&amp;lt;img src=&quot;/assets/pt_end.jpeg&quot; alt=&quot;Ellie after finishing her talk&quot; loading=&quot;lazy&quot; /&amp;gt;
&amp;lt;figcaption&amp;gt;The cute final slide of Ellie&apos;s deck&amp;lt;/figcaption&amp;gt;
&amp;lt;/figure&amp;gt;
&amp;lt;figure className=&quot;workshop-gallery__item&quot;&amp;gt;
&amp;lt;img
src=&quot;/assets/OST_pt.png&quot;
alt=&quot;A slide from the workshop presentation&quot;
loading=&quot;lazy&quot;
/&amp;gt;
&amp;lt;figcaption&amp;gt;Ellie didn&apos;t bring a deck; she brought a game&amp;lt;/figcaption&amp;gt;
&amp;lt;/figure&amp;gt;
&amp;lt;/div&amp;gt;&lt;/p&gt;
&lt;p&gt;&amp;lt;figure className=&quot;workshop-gallery__hero&quot;&amp;gt;
&amp;lt;img
src=&quot;/assets/workshop_meal.png&quot;
alt=&quot;Grilled meat after the workshop&quot;
loading=&quot;lazy&quot;
/&amp;gt;
&amp;lt;figcaption&amp;gt;After a workshop, gotta be grilled meat!&amp;lt;/figcaption&amp;gt;
&amp;lt;/figure&amp;gt;&lt;/p&gt;
&lt;p&gt;&amp;lt;p className=&quot;workshop-gallery__cta&quot;&amp;gt;A great workshop. Let&apos;s do it again!&amp;lt;/p&amp;gt;
&amp;lt;/section&amp;gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;The Scrumble team&apos;s first release wasn&apos;t just the moment we finished a feature set. It was the process of matching each other&apos;s pace and emotional temperature. What kind of growth curve we&apos;ll trace next season, and whether this interview becomes the starting point for another record, is what I&apos;m looking forward to.&lt;/p&gt;
</content:encoded><category>project</category><category>interview</category><category>team-culture</category><category>scrumble</category><author>Tony Cho (https://flowkater.io)</author></item><item><title>Migrating from Obsidian Publish to an Astro Blog</title><link>https://flowkater.io/en/posts/2025-09-24-obsidian-publish-migrate-to-astro/</link><guid isPermaLink="true">https://flowkater.io/en/posts/2025-09-24-obsidian-publish-migrate-to-astro/</guid><description>After about a year and a half on Obsidian Publish, I moved my blog to Astro. Obsidian itself is great, but Publish still falls short as a blogging tool. Plus, Astro is free.</description><pubDate>Wed, 24 Sep 2025 05:18:38 GMT</pubDate><content:encoded>&lt;p&gt;Obsidian is a genuinely strong tool for personal wikis and writing, and if you care about Zettelkasten or documentation, it&apos;s the editor most people recommend these days. I never warmed up to Notion, and Roam Research always felt a bit unwieldy to me. As Obsidian started turning into the de facto standard in this space, I made the jump and committed hard. I even paid for Obsidian Publish and migrated my whole blog over from gatsbyjs, which had been working fine.&lt;/p&gt;
&lt;p&gt;Deployment was fast, the site design was decent, and writing was comfortable. I switched over around 2023 and managed to ship a few retrospectives along the way, so it earned its keep. But using Obsidian Publish as a blogging platform turned out to be rough in a lot of small ways. The biggest issue was SEO. My old gatsbyjs blog had a steady stream of organic traffic, and after I moved to Obsidian Publish that traffic basically evaporated. If I wanted someone to read a post, the best option was to send them the link directly. Even for a blog I write mostly to scratch my own itch, having readers and not having readers are very different things.&lt;/p&gt;
&lt;p&gt;Obsidian is file-based and doesn&apos;t enforce frontmatter. Once you add Obsidian Sync and Publish, the price climbs. Custom things like Google AdSense and event tracking aren&apos;t supported; you&apos;re stuck with whatever Obsidian itself ships (like GA). And another big pain point is the lack of a comment feature. Not that anyone was lining up to comment, but I do have at least a small soft spot for a bit of back-and-forth, and the silence wore on me.&lt;/p&gt;
&lt;p&gt;Even so, I wasn&apos;t writing that often, and Obsidian was comfortable enough that I kept using it. But now that I&apos;m a free agent, what&apos;s left is the record I leave behind, and the record is the writing itself. I wanted to take it more seriously, so I moved again — this time to an Astro-based blog platform.&lt;/p&gt;
&lt;p&gt;That makes &lt;a href=&quot;/posts/2019-11-restart-blogging&quot;&gt;Restarting Blogging&lt;/a&gt; date back to November 2019, and &lt;a href=&quot;/posts/2023-12-blog-migration-obsidian&quot;&gt;Blog Migration (Gatsbyjs to Obsidian Publish)&lt;/a&gt; to late December 2023. Looking at that first post, my real blogging history goes back to Octopress, then Jekyll, Gatsbyjs, Obsidian Publish, and now Astro. Five platforms in.&lt;/p&gt;
&lt;p&gt;(I&apos;m the kind of person who buys a notebook and pen before studying, or researches gear before working out, so my way of expressing intent is always pretty noisy.)&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://astro.build/&quot;&gt;Astro&lt;/a&gt; bills itself as a web framework for content-driven sites, and it seems to power not just blogs but e-commerce and other kinds of sites too. Rendering is fast, SEO is configurable, SSR works, and a lot of upsides surfaced when I evaluated it. I&apos;m using a theme right now, but I figured I could customize it myself down the line, so I picked it. The other candidate I looked at was &lt;a href=&quot;https://gohugo.io/&quot;&gt;Hugo&lt;/a&gt;, built on Golang. I judged Astro to be more extensible, so Astro it was.&lt;/p&gt;
&lt;p&gt;The thing that gave me the most trouble during the move was converting Obsidian&apos;s folder-tree markdown files into a flat structure. I wrote a few automation scripts with Codex and ran the migration that way. On top of that, since Obsidian files don&apos;t carry frontmatter (the post metadata), I had to look up the creation date of each old file and write the frontmatter in one by one. That was probably the hardest part. Thankfully (if you can call it that), there weren&apos;t that many posts, so it went quickly enough...&lt;/p&gt;
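For what it's worth, the shape of that migration job can be sketched in a few lines of Go: walk the vault, flatten nested paths into single filenames, and prepend frontmatter using the file's timestamp as a stand-in for the creation date Obsidian never stored. The paths and frontmatter fields here are hypothetical, not my actual scripts (those were one-off Codex output):

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strings"
)

// FlattenNote turns one vault-relative note path into a flat filename
// and prepends a minimal frontmatter block. date stands in for the
// creation date that had to be looked up per file.
func FlattenNote(relPath, body, date string) (string, string) {
	flat := strings.ReplaceAll(strings.TrimSuffix(relPath, ".md"), string(os.PathSeparator), "-")
	title := filepath.Base(strings.TrimSuffix(relPath, ".md"))
	front := fmt.Sprintf("---\ntitle: %q\ndate: %s\n---\n\n", title, date)
	return flat + ".md", front + body
}

func main() {
	src, dst := "vault", "src/content/posts" // hypothetical directories
	_ = filepath.WalkDir(src, func(p string, d fs.DirEntry, err error) error {
		if err != nil || d.IsDir() || filepath.Ext(p) != ".md" {
			return err
		}
		rel, _ := filepath.Rel(src, p)
		raw, err := os.ReadFile(p)
		if err != nil {
			return err
		}
		info, _ := d.Info()
		// Use the file's mtime as the best available stand-in for creation date.
		name, out := FlattenNote(rel, string(raw), info.ModTime().Format("2006-01-02"))
		return os.WriteFile(filepath.Join(dst, name), []byte(out), 0o644)
	})
}
```

The real work was per-file date archaeology; the mechanical part above is all a script needs to do.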
&lt;p&gt;Since leaving the company, I&apos;ve been on a &quot;leave AWS behind&quot; kick for a while and started using Cloudflare. Cloudflare Pages even supports Astro-specific deploys, so the deploy itself took maybe five minutes, and I added a few more conveniences with scripts.&lt;/p&gt;
&lt;h4&gt;Stack&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;Astro&lt;/li&gt;
&lt;li&gt;Cloudflare Pages&lt;/li&gt;
&lt;li&gt;GitHub (with giscus comments)&lt;/li&gt;
&lt;li&gt;Decap CMS (not yet. Would the headless CMS editing experience even be good? Wouldn&apos;t it be easier to just write in a markdown editor and convert?)&lt;/li&gt;
&lt;li&gt;Google Analytics / PostHog (always wanted to try it, so why not now)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I&apos;ve been posting to Threads here and there too. I want to keep jotting down the non-retrospective stuff, the thoughts that drift through my head.&lt;/p&gt;
&lt;p&gt;Let&apos;s go (or so I tell myself).&lt;/p&gt;
</content:encoded><category>log</category><category>blog</category><author>Tony Cho (https://flowkater.io)</author></item><item><title>Scrumble Tech Retro - 1. Backend (Golang, DDD, Entgo, Event, Centrifugo)</title><link>https://flowkater.io/en/posts/2025-09-scrumble-tech-retro-backend/</link><guid isPermaLink="true">https://flowkater.io/en/posts/2025-09-scrumble-tech-retro-backend/</guid><description>A technical retrospective on building Scrumble&apos;s backend with Go, Fiber, Entgo, and Centrifugo, covering domain modeling, real-time feed infrastructure, caching strategy, and the test and deploy pipeline.</description><pubDate>Mon, 22 Sep 2025 00:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;Scrumble Backend Tech Retrospective&lt;/h1&gt;
&lt;blockquote&gt;
&lt;p&gt;Bring back the human connection we&apos;re losing in the AI era, into the way we work.
&lt;strong&gt;Scrumble&lt;/strong&gt; is a daily-scrum-based team communication platform built around emotional bonds and mutual support between teammates.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That was the early direction of the project. For non-technical details, you can read &lt;a href=&quot;/posts/2025-09-scrumble-project-retro&quot;&gt;Scrumble Project Retrospective (June–August 2025)&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;There were a lot of requirements, but the core features ended up looking like this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Workspace and member management (creation, invites, etc.)&lt;/li&gt;
&lt;li&gt;Check-in / check-out posts (the daily scrum)&lt;/li&gt;
&lt;li&gt;Feed list&lt;/li&gt;
&lt;li&gt;Real-time post comments and reaction emojis&lt;/li&gt;
&lt;li&gt;To-do list&lt;/li&gt;
&lt;li&gt;Notification system&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Stats and reports, third-party integrations, and bot connections are still WIP, so the implemented list is what&apos;s above. My current team uses this every day at the start and end of work, with active check-in scores, check-in posts, and comments flowing through it.&lt;/p&gt;
&lt;p&gt;Looking inside the domain requirements, a few technical agendas stand out from the backend angle. The main three:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Domain and schema relationships across workspace, member, post, comment, reaction, and to-do&lt;/li&gt;
&lt;li&gt;Feed list&lt;/li&gt;
&lt;li&gt;Real-time updates (seamless UX)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It&apos;s a fairly typical SNS-plus-SaaS-platform setup.&lt;/p&gt;
&lt;p&gt;Looking at my own list, it doesn&apos;t look like much... but below I&apos;ll share what I wrestled with while building each of these.&lt;/p&gt;
&lt;h2&gt;Tech Retro&lt;/h2&gt;
&lt;h3&gt;Project Tech Stack&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Language&lt;/strong&gt;: Go 1.23+&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Web framework&lt;/strong&gt;: Fiber v2&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Database&lt;/strong&gt;: PostgreSQL 15&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ORM&lt;/strong&gt;: Entgo + Atlas migrations&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cache&lt;/strong&gt;: Redis 7 (real-time state management)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Auth&lt;/strong&gt;: JWT + Google OAuth (Goth)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;WebSocket&lt;/strong&gt;: Centrifugo (real-time reactions/comments)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Dependency injection&lt;/strong&gt;: Wire (Google)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Logging&lt;/strong&gt;: Zap (structured logging)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Dev tool&lt;/strong&gt;: Air (hot reload)&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;graph TD
  subgraph Browser[&quot;Next.js&quot;]
    UI[&quot;App Router / TanStack Query&quot;]
    WS[&quot;WS Client (Centrifugo)&quot;]
  end

  subgraph API[&quot;Scrumble API (Go Fiber)&quot;]
    H[&quot;HTTP Handlers&quot;]
    S[&quot;Application Services&quot;]
    D[&quot;Domain&quot;]
    R[&quot;Repositories&quot;]
  end

  subgraph RT[&quot;Real-time&quot;]
    CF[&quot;Centrifugo&quot;]
  end

  subgraph Data[&quot;Data Stores&quot;]
    PG[(PostgreSQL)]
    REDIS[(Redis)]
  end

  UI --&amp;gt; H
  H --&amp;gt; S --&amp;gt; D
  S --&amp;gt; R
  R --&amp;gt; PG
  S --&amp;gt; REDIS
  S --&amp;gt; CF
  WS --&amp;gt; CF
  CF --&amp;gt; REDIS
&lt;/code&gt;&lt;/pre&gt;
&lt;h4&gt;Fiber&lt;/h4&gt;
&lt;p&gt;The most common Go web framework is probably Gin, and I&apos;d been working with Echo before this, but I&apos;ll be sticking with Fiber for a while. The biggest difference: Gin and Echo are built on net/http, so they follow Go&apos;s server standard, while Fiber is built on fasthttp and isn&apos;t standard. It also has a smaller set of standardized core libraries compared to Gin. Still, the essentials are all there, and since it&apos;s Express-inspired, the initial setup is simple. People advertise it as faster, but once you attach a DB it depends on the situation and environment, and Gin/Echo are both plenty fast, so speed wasn&apos;t my reason. (I haven&apos;t built anything serving the kind of traffic where it would actually matter.) Because it&apos;s fasthttp, the libraries for things like real-time differ a bit from the standard ones, but rolling your own isn&apos;t hard, and I&apos;m using Centrifugo anyway, so it doesn&apos;t really come up.&lt;/p&gt;
&lt;p&gt;If you&apos;re picking up Go for the first time, I&apos;d usually recommend Gin. But if you&apos;ve enjoyed working with Express in Node.js/TS, Fiber is a fine choice too.&lt;/p&gt;
&lt;p&gt;In Go, what the web framework does sits on the outer edge anyway, so as long as you&apos;ve cleanly separated the handler layer, swapping frameworks isn&apos;t a huge job. Honestly, going non-mainstream is just my taste, so take that with a pinch of salt.&lt;/p&gt;
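To make that "outer edge" point concrete, here is a minimal stdlib sketch with invented names (not Scrumble's code): the application function knows nothing about the web framework, and the handler is just parse-call-render glue, so porting to Fiber, Gin, or Echo only touches the handler:

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
)

// Application layer: plain inputs and outputs, no framework types.
type Todo struct {
	ID    int    `json:"id"`
	Title string `json:"title"`
}

func CreateTodo(title string) (Todo, error) {
	if title == "" {
		return Todo{}, errors.New("title required")
	}
	return Todo{ID: 1, Title: title}, nil
}

// Interface layer: the only place net/http (or Fiber's ctx) appears.
// Swapping frameworks means rewriting just this glue function.
func handleCreateTodo(w http.ResponseWriter, r *http.Request) {
	var req struct {
		Title string `json:"title"`
	}
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	todo, err := CreateTodo(req.Title)
	if err != nil {
		http.Error(w, err.Error(), http.StatusUnprocessableEntity)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	_ = json.NewEncoder(w).Encode(todo)
}

func main() {
	http.HandleFunc("/todos", handleCreateTodo)
	fmt.Println("handler wired; the service stays framework-free")
}
```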
&lt;h4&gt;Entgo + Atlas migrations&lt;/h4&gt;
&lt;p&gt;To be precise, the stack is Entgo + SQLC. About halfway through the project, I realized Entgo doesn&apos;t issue JOIN queries, and that it leans more toward being a type-safe query builder than a real ORM. Given that my architecture already separated domain entities from Entgo schema entities, Entgo became more of a headache the deeper I got into the project. I&apos;d been using gorm before that, but Entgo felt like it might be the new standard, so I introduced it without much thought. The result was a mess. The pain points piled up:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;No JOINs
&lt;ul&gt;
&lt;li&gt;The default behavior is the N+1 query pattern. If you write &lt;code&gt;client.User.Query().WithPosts()&lt;/code&gt; in code, instead of joining, it SELECTs from users, then takes those IDs and runs a second SELECT against posts.&lt;/li&gt;
&lt;li&gt;There are reasons for this. Turns out Entgo plays well with GraphQL (which I keep doing wrong, apparently), and it&apos;s designed so you build queries in a graph style. That design has its upsides: cleaner entity mapping, easier control over lazy/eager loading. But you end up firing N queries when one would have done the job, and that&apos;s a real performance hit. While trying to express it in Entgo&apos;s syntax, I kept thinking, why am I doing this when raw SQL would be one line? Eventually I went with CQRS and switched the Query repository to SQLC.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Code-first schema migration
&lt;ul&gt;
&lt;li&gt;Code-first means you control DB tables from code. The problem is that PostgreSQL has a huge feature surface, and there are times when you have to wire those features up by hand in code. And while you&apos;re managing migration files, you&apos;d like to keep a record of what changed. But because it&apos;s code-first, when Atlas reads my code to generate migrations and syncs the schema, the SQL migration files I&apos;d written by hand get wiped.&lt;/li&gt;
&lt;li&gt;For things like creating GIN indexes on JSON columns, I had to dig through the latest library code to figure out how Entgo supports it. Claude didn&apos;t have current info either, so it kept generating the wrong schema definitions, and I burned a lot of time on what I&apos;ll politely call grunt work.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Mountains of generated boilerplate
&lt;ul&gt;
&lt;li&gt;Because it&apos;s type-safe and code-first, defining an Entgo schema and running generate produces a vast number of default files. I prefer not to have my searches turn up code I didn&apos;t write, so the constant hits on code that was, in a sense, someone else&apos;s, got under my skin.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
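A toy, in-memory rendering of the round-trip difference (simulated counters, not Entgo's actual internals): the eager-load path pays one extra query per edge step, while the joined path answers in a single pass:

```go
package main

import "fmt"

type User struct {
	ID   int
	Name string
}

type Post struct {
	ID     int
	UserID int
}

// In-memory stand-ins for two tables.
var users = []User{{1, "tony"}, {2, "ellie"}}
var posts = []Post{{10, 1}, {11, 2}, {12, 2}}

// FetchEager mirrors client.User.Query().WithPosts(): one SELECT on
// users, then a second SELECT on posts filtered by the collected IDs.
// Every extra edge (comments, reactions, ...) costs one more round trip.
func FetchEager() (byUser map[int][]Post, roundTrips int) {
	roundTrips++ // SELECT * FROM users
	ids := map[int]bool{}
	for _, u := range users {
		ids[u.ID] = true
	}
	roundTrips++ // SELECT * FROM posts WHERE user_id IN (...)
	byUser = map[int][]Post{}
	for _, p := range posts {
		if ids[p.UserID] {
			byUser[p.UserID] = append(byUser[p.UserID], p)
		}
	}
	return
}

// FetchJoined mirrors the raw-SQL alternative:
//   SELECT u.id, p.id FROM users u JOIN posts p ON p.user_id = u.id
// One round trip, however many edges the screen needs.
func FetchJoined() (byUser map[int][]Post, roundTrips int) {
	roundTrips = 1
	byUser = map[int][]Post{}
	for _, p := range posts { // rows of the single joined result set
		byUser[p.UserID] = append(byUser[p.UserID], p)
	}
	return
}

func main() {
	_, eager := FetchEager()
	_, joined := FetchJoined()
	fmt.Printf("eager: %d round trips, joined: %d\n", eager, joined)
}
```

With one edge the gap is small; once a feed needs posts, reactions, comments, and media per row, the eager side multiplies round trips while the JOIN stays at one.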
&lt;p&gt;Unless your ORM ships a lot of conveniences, like JPA, ActiveRecord, or Django ORM, or it&apos;s the de facto standard, the whole point of using one is to make the object-mapped schema slightly easier to handle in code. For that level of value, Entgo had too many drawbacks for me. And in an early-stage project that doesn&apos;t really need the complexity of CQRS, the basic query-performance issue alone forced me into it.&lt;/p&gt;
&lt;p&gt;If I were running a tiny microservice with very simple entities, paired with GraphQL, no complex layering, sure, Entgo might be worth a look. But for that kind of simple server, do you even need it? Anyway, I picked it from day one and I&apos;m seeing this project through with it, but I wouldn&apos;t choose it again. Given my preferences, future Go projects will use schema-first SQL migrations and an ORM that runs JOIN queries by default. I don&apos;t want to split things into Command Repository and Query Repository early on. (For reference, my current project uses Bun ORM and golang-migrate.)&lt;/p&gt;
&lt;h4&gt;Other pieces&lt;/h4&gt;
&lt;p&gt;I&apos;ve stuck with PostgreSQL as the standard database. For real-time and caching, Redis (more on that below). Social auth is Goth. It&apos;s not at Supabase&apos;s level, but it&apos;s pretty simple, just hook up the API and the keys. For DI I went with Wire, which works at compile time. That&apos;s the standard pick in Go DI, nothing fancy. Logging is Zap, also a standard pick, and it does structured logging well. For the dev server I used Air. Considering Go is a compiled language, the experience of having it auto-compile and hot-reload on every code change feels almost like developing a server in a scripting language. When I worked in Spring, the slow startup was painful since I&apos;d been spoiled by Ruby and Python server boot times, and it was hard to keep development flow going. (Maybe it&apos;s faster now? I haven&apos;t touched Spring in a while.) Go&apos;s lightweight, fast feedback loop is genuinely great.&lt;/p&gt;
&lt;h3&gt;Architecture (DDD / Clean Architecture layering)&lt;/h3&gt;
&lt;p&gt;Scrumble is essentially a workspace-based SNS, so I needed an architecture that could grow. I aimed for one that pays off more in the middle of the project than at the very start, and I refactored the early implementation more than once.&lt;/p&gt;
&lt;p&gt;The architecture follows Domain-Driven Design, layered as Interface (handlers) -&amp;gt; Application -&amp;gt; Domain, with Infrastructure (repositories and other adapters) implementing the interfaces the Domain layer defines.&lt;/p&gt;
&lt;p&gt;Interface handler functions and Application service functions are 1:1, and the business logic lives in Application, where I compose Domain entities and functions. I avoid building services as separate Domain structs (what would be classes in OOP). Leaning into Go&apos;s package-oriented nature, I use package-level global functions instead. Repositories are also defined as interfaces in the Domain layer, so in practice Application does all the work via Domain entities + Domain package functions + Repository functions (interfaces).&lt;/p&gt;
&lt;p&gt;Since the project leans into DDD, design always begins with Domain entities and value objects, then Repository interface definitions and use-case implementations (Application). The Infrastructure schema (Entgo schema structs) is built separately to mirror those, and the Repository implementation loads those schema structs and converts them into domain structs before handing them up to Application.&lt;/p&gt;
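That flow can be sketched with invented names (an in-memory repository standing in for the Entgo-backed one): the domain owns the entity, a package-level validation function, and the repository interface; infrastructure maps its schema struct back to a domain struct; the application composes the two:

```go
package main

import (
	"errors"
	"fmt"
)

// --- Domain layer: entity + package function + repository interface. ---
type CheckIn struct {
	ID    int
	Score int
}

var ErrBadScore = errors.New("score must be between 1 and 5")

// Package-level domain function instead of a "domain service" struct.
func ValidateScore(score int) error {
	if score < 1 || score > 5 {
		return ErrBadScore
	}
	return nil
}

type CheckInRepository interface {
	Save(c CheckIn) (CheckIn, error)
}

// --- Infrastructure layer: schema struct mapped to the domain struct. ---
type checkInRow struct { // stands in for an Entgo-generated schema type
	ID    int
	Score int
}

type memRepo struct{ rows []checkInRow }

func (r *memRepo) Save(c CheckIn) (CheckIn, error) {
	row := checkInRow{ID: len(r.rows) + 1, Score: c.Score}
	r.rows = append(r.rows, row)
	return CheckIn{ID: row.ID, Score: row.Score}, nil // row -> domain
}

// --- Application layer: composes domain functions and the repository. ---
func SubmitCheckIn(repo CheckInRepository, score int) (CheckIn, error) {
	if err := ValidateScore(score); err != nil {
		return CheckIn{}, err
	}
	return repo.Save(CheckIn{Score: score})
}

func main() {
	repo := &memRepo{}
	c, err := SubmitCheckIn(repo, 4)
	fmt.Println(c, err)
}
```

Because the interface lives in the domain, swapping `memRepo` for an Entgo- or SQLC-backed implementation leaves the application function untouched.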
&lt;p&gt;Early on I used to return domain objects from Application too, but partway through I redid the structure. On top of Handler request and response types, I introduced separate DTOs in Application as well, decoupling the layers as much as I could.&lt;/p&gt;
&lt;p&gt;For a really simple microservice, I think it&apos;s fine to write queries directly in handlers. The mapping code between layers is genuinely a pain (especially when handling slices in Go), but the samber/lo package cut a lot of that down.&lt;/p&gt;
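For the curious, the core of what samber/lo contributes here is a generic map over slices. This hand-rolled equivalent (the same shape as `lo.Map`, as far as I recall its signature) shows why it collapses so much layer-boundary conversion code:

```go
package main

import "fmt"

// Map mirrors samber/lo's lo.Map: convert []T to []R with one call
// instead of a hand-rolled loop at every layer boundary.
func Map[T, R any](in []T, f func(T, int) R) []R {
	out := make([]R, len(in))
	for i, v := range in {
		out[i] = f(v, i)
	}
	return out
}

// Two struct shapes standing in for a domain entity and its DTO.
type Comment struct {
	ID   int
	Body string
}

type CommentDTO struct {
	ID   int    `json:"id"`
	Body string `json:"body"`
}

func main() {
	comments := []Comment{{1, "nice"}, {2, "ship it"}}
	dtos := Map(comments, func(c Comment, _ int) CommentDTO {
		return CommentDTO{ID: c.ID, Body: c.Body}
	})
	fmt.Println(dtos)
}
```

Every domain-to-DTO and row-to-domain crossing becomes one call like this instead of a four-line loop, which adds up fast in Go.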
&lt;pre&gt;&lt;code&gt;graph LR
  H[Interface / Handlers] --&amp;gt; A[Application Services]
  A --&amp;gt; D[Domain - Entities, VOs, Policies]
  A --&amp;gt; IRepo[Repository Interfaces]
  subgraph Infrastructure
    DBRepo[DB Repo - Ent / SQLC]
    CacheRepo[Cache Repo - Redis]
    EventPub[Event Publisher - Centrifugo]
  end
  IRepo -.implemented by.-&amp;gt; DBRepo
  IRepo -.implemented by.-&amp;gt; CacheRepo
  D &amp;lt;--&amp;gt; DE[Domain Events]
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Testing&lt;/h3&gt;
&lt;h4&gt;Test code&lt;/h4&gt;
&lt;p&gt;I write tests in BDD style with ginkgo/gomega. Having learned TDD in RSpec, the default Go testing style isn&apos;t very intuitive to me. My TDD philosophy is: for anything that touches the DB (repositories, application services), run integration tests against a test DB. Mocking is reserved for the truly external things like OAuth, or these days an API like GPT. In the era of Docker-based development, mocking the DB itself feels like a real waste of time to me.&lt;/p&gt;
&lt;p&gt;Domain layers are pure functions, so unit tests are easy. For repositories I don&apos;t test everything, but for complex queries or specific business rules I run integration tests against a real DB. Application business logic is the same, integration tests against a test DB.&lt;/p&gt;
&lt;p&gt;That said, there was one time I told Claude Code to fill in missing test coverage without thinking it through, and it ignored what I&apos;d specified and dumped a pile of mocked-DB tests. So a chunk of the current test suite is mocked-DB tests, which makes adding new repositories really annoying. I need to clean that up but haven&apos;t gotten to it.&lt;/p&gt;
&lt;h4&gt;Apidog&lt;/h4&gt;
&lt;p&gt;For API testing I use Apidog. I cycled through Postman, Insomnia, and Bruno, and Apidog covers most of Postman&apos;s core features while doing automated documentation and scenario tests really well at the tool level. Even the free tier is generous enough that solo work is comfortable on it. You can have AI auto-generate Swagger, sync it straight into Apidog, and immediately fire test requests, with request/response schema validation included. Highly recommended.&lt;/p&gt;
&lt;p&gt;(Why not Postman: too many features, too heavy. The UI also doesn&apos;t feel clean to me, so I preferred Insomnia. But Insomnia&apos;s updates went weird, so I tried the open-source Bruno, which was too feature-poor. I was looking around and found Apidog.)&lt;/p&gt;
&lt;h4&gt;Performance testing&lt;/h4&gt;
&lt;p&gt;I didn&apos;t run a proper load test. I did do performance comparisons between the original repository and the optimized query-side repository. The catch was that adopting the query-side version would have required changing the entire client structure, so I didn&apos;t roll it out. I ended up further optimizing the legacy repository queries instead, and that test is now deprecated.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;graph TD
  U[Unit: Domain - Ginkgo/Gomega] --&amp;gt; I[Integration: Repo+App - Test DB]
  I --&amp;gt; E[E2E: API scenarios - Apidog]
  E --&amp;gt; P[Perf: optional load metrics]
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Real-time&lt;/h3&gt;
&lt;p&gt;In Scrumble, after writing a check-in or check-out post, you can leave reactions and comments like on a regular SNS. But beyond just commenting, I wanted the room to feel alive, with emojis and comments landing in real time and a seamless UX. Real-time was a baseline requirement.&lt;/p&gt;
&lt;p&gt;So at first I built a WebSocket server directly on top of Go Fiber. Then I found &lt;a href=&quot;https://centrifugal.dev/&quot;&gt;https://centrifugal.dev/&lt;/a&gt;, which is genuinely impressive. I threw out my code and migrated everything over.&lt;/p&gt;
&lt;p&gt;If you build the WebSocket server yourself, you have to handle all of this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Message loss and reconnection&lt;/li&gt;
&lt;li&gt;Horizontal scaling later&lt;/li&gt;
&lt;li&gt;Online presence, permissions, namespace management&lt;/li&gt;
&lt;li&gt;Server-side communication protocol&lt;/li&gt;
&lt;li&gt;Operational visibility&lt;/li&gt;
&lt;li&gt;And more&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For a quick prototype, doing it yourself is fine. But once you start thinking about production-grade real-time, the cost of building all of it adds up fast. Centrifugo provides the &lt;strong&gt;genuinely hard parts of real-time systems (lossless recovery, large-scale fanout, presence and permissions, observability, multi-node scaling)&lt;/strong&gt; at the &lt;strong&gt;framework level&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;You spin up a Centrifugo server, hook up Redis, implement handlers in your real server, focus on the logic, and that&apos;s it. With how good the future scaling story and developer ergonomics are, there&apos;s no reason not to use Centrifugo.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sequenceDiagram
  participant C as Client (Next.js)
  participant API as Fiber API
  participant PG as PostgreSQL
  participant OB as Outbox (TX)
  participant PUB as Publisher
  participant CF as Centrifugo
  participant O as Other Clients

  C-&amp;gt;&amp;gt;API: POST /posts/:id/reactions
  API-&amp;gt;&amp;gt;PG: INSERT reaction (in TX)
  API-&amp;gt;&amp;gt;PG: INSERT outbox_event (same TX)
  PG--&amp;gt;&amp;gt;API: COMMIT OK
  API-&amp;gt;&amp;gt;PUB: notify new outbox_event
  PUB-&amp;gt;&amp;gt;CF: publish reaction.added
  CF--&amp;gt;&amp;gt;O: push event
  C--&amp;gt;&amp;gt;C: optimistic UI (optional)
&lt;/code&gt;&lt;/pre&gt;
&lt;h4&gt;The catch&lt;/h4&gt;
&lt;p&gt;I figured Centrifugo would make my real-time worries disappear, but there was one big issue. Some of Centrifugo&apos;s recent features require Redis v7 or newer (7.0, 7.2, 7.4). In particular, presence, history, and TTL handling all depend on v7, and the channel-history feature (for message persistence and recovery, the one I praised above as built-in) sits there too. Our deploy infra, Upstash Redis, only supports up to v6.2. I didn&apos;t know I needed to set those configs to 0 and disable recover, so I shipped with default settings, the real-time features didn&apos;t work in production at all, and I burned a fair bit of time digging through it. Upstash is very economical, so even with those features off it&apos;s not a real problem, but it is a shame.&lt;/p&gt;
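From memory, the settings involved were namespace-level history and recovery options, roughly like the fragment below. I'm citing option names as I recall them (`history_size`, `history_ttl`, `force_recovery`); they vary across Centrifugo versions, so check your version's docs rather than copying this:

```json
{
  "namespaces": [
    {
      "name": "post",
      "presence": false,
      "history_size": 0,
      "history_ttl": "0s",
      "force_recovery": false
    }
  ]
}
```

With history sized to zero and recovery off, Centrifugo stops issuing the Redis commands that a v6.2 instance can't serve, at the cost of losing message replay on reconnect.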
&lt;p&gt;Even so, since adopting Centrifugo, the real-time server itself hasn&apos;t broken in production once. The only real-time issues we&apos;ve seen have been frontend handler subscription and reconnection logic dropping connections.&lt;/p&gt;
&lt;h4&gt;Event-driven&lt;/h4&gt;
&lt;p&gt;With Centrifugo and Redis as the backend, I built domain-based event-driven patterns on top. Event messages are mostly defined in each domain layer, and event emission happens inside the business logic in the application layer. You could put it in the domain layer instead, but I went with application-layer code for the directness of the code flow, the fact that early-project events are almost all 1:1, and easier debugging. In the interface layer, alongside HTTP, I added an &quot;events&quot; interface and built handlers there too. These handlers are registered to handle domain events, and like HTTP handlers, they call into application-layer functions.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;graph TD
  CMD[Command - API] --&amp;gt; TX[DB TX]
  TX --&amp;gt; W[Write Domain Data]
  TX --&amp;gt; OBOX[Insert Outbox Event]
  OBOX --&amp;gt; COMMIT[Commit]
  COMMIT --&amp;gt; WKR[Outbox Worker]
  WKR --&amp;gt; PUB[Publish to Centrifugo]
  PUB --&amp;gt; CLIENTS[Subscribed Clients]
&lt;/code&gt;&lt;/pre&gt;
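A toy, in-memory version of the outbox flow in those diagrams (no real DB transaction or Centrifugo client, just the ordering): the domain write and the outbox row are recorded together, and a separate worker drains committed events to the publisher:

```go
package main

import "fmt"

type OutboxEvent struct {
	ID      int
	Topic   string
	Payload string
}

// store simulates one database: the domain row and the outbox row are
// written together, so an event exists iff its write "committed".
type store struct {
	reactions []string
	outbox    []OutboxEvent
}

// AddReactionTx is the command path: domain write + outbox insert,
// which in the real system share a single DB transaction.
func (s *store) AddReactionTx(emoji string) {
	s.reactions = append(s.reactions, emoji)
	s.outbox = append(s.outbox, OutboxEvent{
		ID: len(s.outbox) + 1, Topic: "reaction.added", Payload: emoji,
	})
}

// DrainOutbox is the worker: hand committed events to the publisher
// (Centrifugo in Scrumble) and clear them once published.
func (s *store) DrainOutbox(publish func(OutboxEvent)) int {
	n := len(s.outbox)
	for _, e := range s.outbox {
		publish(e)
	}
	s.outbox = s.outbox[:0]
	return n
}

func main() {
	s := &store{}
	s.AddReactionTx("fire")
	s.AddReactionTx("clap")
	published := s.DrainOutbox(func(e OutboxEvent) {
		fmt.Println("publish", e.Topic, e.Payload)
	})
	fmt.Println("published", published, "events")
}
```

The point of the pattern is that a crash between commit and publish loses nothing: the worker finds the event in the outbox on its next pass.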
&lt;h3&gt;Other&lt;/h3&gt;
&lt;h4&gt;Caching&lt;/h4&gt;
&lt;p&gt;Everything else gets basic caching with Redis: feed summaries, my own writing state, the spaces I&apos;ve joined. These are calculated queries that can hold a long TTL. The default is cache-aside, with invalidation policy implemented per event handler when other events fire. It&apos;s intuitive, but registering invalidation handlers one by one for every mutation event is a bit of a chore.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;flowchart TD
  RQ[Client reads Feed] --&amp;gt; GET[Redis GET]
  GET --&amp;gt;|miss| DB[SQLC Query]
  DB --&amp;gt; DTO[Shape DTO]
  DTO --&amp;gt; SET[Redis SET - TTL]
  SET --&amp;gt; RESP[Response]
  GET --&amp;gt;|hit| RESP

  subgraph Invalidation
    EVT[reaction/comment created]
    EVT --&amp;gt; DEL[DEL ws:id:feed:*]
    EVT --&amp;gt; PUB[Publish feed.invalidate]
  end
&lt;/code&gt;&lt;/pre&gt;
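In code, cache-aside is a thin wrapper around the query. In this sketch a map stands in for Redis (no TTL) and a plain function for the SQLC query, with an `Invalidate` hook of the kind each mutation-event handler calls:

```go
package main

import "fmt"

// FeedCache is cache-aside over a loader. A map with no expiry stands
// in for Redis; loader stands in for the SQLC feed query.
type FeedCache struct {
	data   map[string]string
	loader func(key string) string
	Loads  int // counts DB hits, to show the cache working
}

func NewFeedCache(loader func(string) string) *FeedCache {
	return &FeedCache{data: map[string]string{}, loader: loader}
}

func (c *FeedCache) Get(key string) string {
	if v, ok := c.data[key]; ok {
		return v // hit: Redis GET succeeded
	}
	v := c.loader(key) // miss: run the query, then SET with a TTL in real Redis
	c.data[key] = v
	c.Loads++
	return v
}

// Invalidate is what a mutation-event handler calls, e.g. on
// reaction.created delete the workspace's cached feed keys.
func (c *FeedCache) Invalidate(key string) { delete(c.data, key) }

func main() {
	c := NewFeedCache(func(key string) string { return "feed for " + key })
	c.Get("ws:1")
	c.Get("ws:1") // served from cache
	c.Invalidate("ws:1")
	c.Get("ws:1") // reloaded after invalidation
	fmt.Println("db loads:", c.Loads)
}
```

The chore mentioned above is exactly the `Invalidate` calls: every mutation event needs a handler that knows which keys it dirties.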
&lt;h4&gt;CQRS (Query Repository / SQLC)&lt;/h4&gt;
&lt;p&gt;When I say CQRS I don&apos;t mean splitting the Query DB out fully. Because of Entgo&apos;s query-performance issues mentioned above, I introduced SQLC, which is closer to raw SQL, and split off a Query Repository. The Query Repository skips the domain interface and acts as read-only. Application calls it directly and gets back Application-layer structs.&lt;/p&gt;
&lt;p&gt;Personally I think DDD fits Command well. For most of the values an endpoint or a screen needs, instead of routing through the structured shape of domain structs, it&apos;s more natural and more performant to fire a DB query and return what the screen wants directly. That fits a Query interface better, in my view.&lt;/p&gt;
&lt;p&gt;I introduced this because of Entgo, of course, and you might say there&apos;s no reason for early-stage projects to reach for CQRS. But to honor the spirit of the project laid out in &lt;a href=&quot;/posts/2025-09-scrumble-project-retro&quot;&gt;Scrumble Project Retrospective (June–August 2025)&lt;/a&gt;, I went in with the attitude of &quot;let&apos;s just try everything I want to try, the dumb way.&quot;&lt;/p&gt;
&lt;p&gt;We use social networking apps daily, so we tend to assume they&apos;re easy to build. But these services come with more performance concerns than you&apos;d expect, so building API endpoints intuitively to fit each screen wasn&apos;t a bad call. What used to be many round-trips through Entgo became a single clean JOIN in raw SQL via SQLC, and queries got a lot faster.&lt;/p&gt;
&lt;p&gt;You can write raw SQL with Entgo too, but joining several tables (posts, reactions, comments, mediafiles) and doing it in a type-safe way with proper object mapping was harder than I expected. With SQLC, you write a SQL file and generate a Go file that maps types safely, so I got the performance and type safety together.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;graph TD
  CMD[Command] --&amp;gt; AC[App - Command]
  QRY[Query] --&amp;gt; AQ[App - Query]
  AC --&amp;gt; RC[Repo - Ent]
  AQ --&amp;gt; RQ[Repo - SQLC]
  RC --&amp;gt; PG[PostgreSQL]
  RQ --&amp;gt; PG
  AC --&amp;gt; EV[Domain Events] --&amp;gt; CF[Centrifugo]
  AQ --&amp;gt; C[Redis Cache] --&amp;gt; AQ
&lt;/code&gt;&lt;/pre&gt;
&lt;h4&gt;Deploy infrastructure&lt;/h4&gt;
&lt;p&gt;For final deployment I&apos;m using fly.io for both DB and server, with Upstash Redis. Both the backend server and the Centrifugo server run on fly.io. It&apos;s only used internally right now, so infra cost is basically zero.&lt;/p&gt;
&lt;p&gt;If you&apos;ve read this far you might be thinking the optimization (caching, queries) feels excessive for an early-stage service. The reason: my initial DB infra was Neon. I&apos;d heard &quot;serverless&quot; and pulled it in, but Neon has a few quirks, and on top of that my fly.io server was in NRT (Tokyo) while Neon&apos;s only Asia region is Singapore. The server-to-DB latency was substantial. (Average 200ms+, so feeds with lots of posts and comments could take 2 seconds to load.) That pushed me hard into Optimistic UI on React Query and aggressive query optimization and caching on the server side, just to make this infra work.&lt;/p&gt;
&lt;p&gt;Eventually, after seeing the cost on Neon, I migrated to fly.io and unified everything in NRT. The optimizations stayed, so the service runs much smoother now.&lt;/p&gt;
&lt;p&gt;The open question is whether I&apos;ll stick with fly.io for an actual public release. Deploys are easy, the experience is intuitive, and the early cost is low. But the status page is yellow disturbingly often, and individual regions go down. My service stays up, but fly.io&apos;s web console goes down and its page won&apos;t load. Things that wouldn&apos;t happen on AWS happen often here. At some point I&apos;ll need to rebuild on more reliable server infra.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;graph TD
  DEV[Local Dev - Air] --&amp;gt; CI[CI Build &amp;amp; Test]
  CI --&amp;gt; REG[Container Registry]
  REG --&amp;gt; API[Fly.io Scrumble API - NRT]
  REG --&amp;gt; CF[Fly.io Centrifugo - NRT]

  subgraph Data
    FPG[Fly.io Postgres]
    UREDIS[Upstash Redis]
  end

  API -- SQL --&amp;gt; FPG
  API -- cache --&amp;gt; UREDIS
  API -- publish --&amp;gt; CF
  USERS[Users] --&amp;gt; API
  USERS -- WS --&amp;gt; CF
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Vibe coding on the backend?&lt;/h3&gt;
&lt;h4&gt;User → Member migration&lt;/h4&gt;
&lt;p&gt;Early on, to move quickly, I designed most entities around the User schema. For a workspace-based SaaS like Slack (where each workspace has its own profile and you log in per workspace), I needed to migrate most of the domain/schema entities and the auth structure to a Member entity (a user-to-workspace relation schema) instead of a User entity. I knew partway through that I had to switch, but I kept pushing it back to keep adding features. By the end of the project, the migration was tangled with multiple domains and the dependency cost had grown.&lt;/p&gt;
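&lt;p&gt;For illustration, the shape of the split looks roughly like this (the field names are my own assumption, not the actual schema): a &lt;code&gt;User&lt;/code&gt; is the service-level account, while a &lt;code&gt;Member&lt;/code&gt; binds that account to one workspace with its own profile, so auth and most domain entities should hang off &lt;code&gt;Member&lt;/code&gt; rather than &lt;code&gt;User&lt;/code&gt;:&lt;/p&gt;

```go
package main

import "fmt"

// User is the service-level account you log in with.
type User struct {
	ID    int64
	Email string
}

// Workspace is the Slack-style top-level tenant.
type Workspace struct {
	ID   int64
	Name string
}

// Member is the user-to-workspace relation: one row per workspace a
// user has joined, each with its own profile.
type Member struct {
	ID          int64
	UserID      int64 // the service-level account
	WorkspaceID int64 // the workspace this profile belongs to
	DisplayName string
}

func main() {
	u := User{ID: 1, Email: "tony@example.com"}
	// One user, two workspaces, two independent profiles:
	m1 := Member{ID: 10, UserID: u.ID, WorkspaceID: 100, DisplayName: "tony"}
	m2 := Member{ID: 11, UserID: u.ID, WorkspaceID: 200, DisplayName: "flowkater"}
	fmt.Println(m1.DisplayName, m2.DisplayName)
}
```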
&lt;p&gt;Migrating top-level entity relationships isn&apos;t actually hard. It&apos;s mechanical and repetitive. So I documented the spec and plan in Kiro, which had just launched, and handed it to Claude Code. Claude Code went on a wild detour: ignoring the existing architecture, ignoring existing code, generating duplicate domains and code. I ended up redoing about three days of work by hand. So for the backend, there isn&apos;t much to say about vibe coding. Later on I only used it to automate trivial scaffolding for simple API setups, and most of the work I did myself. The frontend, on the other hand, leaned heavily on Claude Code, so I&apos;ll cover more there.&lt;/p&gt;
&lt;h2&gt;Wrap-up&lt;/h2&gt;
&lt;p&gt;One of the original goals of the project was &quot;get the chops back&quot;, and that goal was met. Now, with production release in sight, I&apos;m building on a cleaner, clearer architecture in the current project. To summarize:&lt;/p&gt;
&lt;h3&gt;Keep&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;DDD/Clean Architecture, plus CQRS and event-driven. The key is putting ports at every layer and enforcing real separation. Separation gives the domain implementation room to breathe, and lets every other layer reference the domain.
&lt;ul&gt;
&lt;li&gt;There&apos;s a tradeoff with duplicated code, of course&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Centrifugo&lt;/li&gt;
&lt;li&gt;Working with Redis caching&lt;/li&gt;
&lt;li&gt;A clearer test strategy&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Problem&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Picking an ORM (Entgo) without enough evaluation. The resulting design got more complex (CQRS, SQLC)&lt;/li&gt;
&lt;li&gt;Code-first migrations via Entgo&lt;/li&gt;
&lt;li&gt;Region split at deploy time (Neon SG ↔ Fly NRT). Neon ate a lot of my time&lt;/li&gt;
&lt;li&gt;Didn&apos;t clean up code generated badly through vibe coding&lt;/li&gt;
&lt;li&gt;Insufficient research at the dev stage when adopting Centrifugo and similar tools (the version issue surfaced at production-release time, after development was done, costing real time)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Try&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Schema-first migrations instead of code-first&lt;/li&gt;
&lt;li&gt;Long term, I need to swap the production deploy infra&lt;/li&gt;
&lt;li&gt;Get operational visibility in place&lt;/li&gt;
&lt;li&gt;Remove the mocked-DB tests and clean up the badly generated code&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That&apos;s about the size of it.&lt;/p&gt;
&lt;p&gt;If you&apos;ve read all of this, you&apos;ll see the core of this design isn&apos;t tied to Go. It applies in any language, any framework.&lt;/p&gt;
&lt;p&gt;One thing I miss while working in Go is functional programming. The style I enjoy in Java lambdas, Kotlin, and JS/TS feels less fun in Go. samber/lo helps, but as a language that got generics late, the code doesn&apos;t feel that intuitive. (lo does make it much cleaner, to be fair.)&lt;/p&gt;
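&lt;p&gt;To show what I mean: the kind of helper samber/lo ships (its &lt;code&gt;lo.Map&lt;/code&gt; also passes an index to the callback) can be written with stdlib generics, but the call site stays noisier than the Kotlin or TS equivalent. A minimal sketch:&lt;/p&gt;

```go
package main

import "fmt"

// Map is the kind of helper samber/lo provides; with only the stdlib
// and generics you end up writing it (and friends) yourself.
func Map[T, U any](xs []T, f func(T) U) []U {
	out := make([]U, 0, len(xs))
	for _, x := range xs {
		out = append(out, f(x))
	}
	return out
}

func main() {
	// Compare Kotlin's listOf(1, 2, 3).map { it * it } with:
	squares := Map([]int{1, 2, 3}, func(n int) int { return n * n })
	fmt.Println(squares)
}
```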
&lt;p&gt;If I do Spring next, it&apos;ll probably be because of Kotlin. That said, Go&apos;s directness pays off. Once you put in the small upfront cost, it&apos;s easy to collaborate on and easy to operate. It&apos;ll stay my primary choice for products I lead. I hope more developers in Korea pick up Go.&lt;/p&gt;
&lt;p&gt;I&apos;ll keep going in the frontend post...&lt;/p&gt;
&lt;hr /&gt;
&lt;ul&gt;
&lt;li&gt;Previous post
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;/posts/2025-09-scrumble-tech-retro-intro&quot;&gt;Scrumble Tech Retrospective (subtitle: Full-stack development and the failure of vibe coding. Golang and Next.js) - 0. Intro&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Next post
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;/posts/2025-10-03-scrumble-tech-retro-frontend-with-vibe-coding&quot;&gt;Scrumble Tech Retrospective - 2. Frontend, with a side of vibe coding&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
</content:encoded><category>project</category><category>golang</category><category>backend</category><category>scrumble</category><author>Tony Cho (https://flowkater.io)</author></item><item><title>Scrumble Tech Retro - 0. Intro (Why Golang?)</title><link>https://flowkater.io/en/posts/2025-09-scrumble-tech-retro-intro/</link><guid isPermaLink="true">https://flowkater.io/en/posts/2025-09-scrumble-tech-retro-intro/</guid><description>Opening post of the Scrumble tech retrospective series. I lay out why I picked Golang and Next.js, where vibe coding helped and where it broke, and the roadmap for the backend, frontend, and LLM chapters that follow.</description><pubDate>Sun, 21 Sep 2025 00:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;Intro&lt;/h1&gt;
&lt;h2&gt;Why Golang&lt;/h2&gt;
&lt;p&gt;Everyone has a primary language, and mine is Golang. It doesn&apos;t carry that image in the Korean dev scene, but overseas it&apos;s known as a fairly boring language. There&apos;s a reason for that: the language really is simple. Most Korean backend developers have Java as their primary, and next to Java, Golang is stripped to the bone.&lt;/p&gt;
&lt;p&gt;That&apos;s why Golang has no &quot;canonical&quot; full-stack web framework (think Spring Boot, Rails, Django) and no &quot;canonical&quot; ORM (think JPA, Active Record, Django ORM). It&apos;s not unusual to find shops that skip the web framework entirely and code straight against &lt;code&gt;net/http&lt;/code&gt;. The language is simple enough that it ends up being handy for small microservices. On the language-feature side, there&apos;s no constant immutability the way functional languages have it (you can&apos;t declare a &quot;reference-immutable&quot; variable like TS &lt;code&gt;const&lt;/code&gt; or Kotlin &lt;code&gt;val&lt;/code&gt;. Go&apos;s &lt;code&gt;const&lt;/code&gt; is a compile-time constant only, so most code uses regular &lt;code&gt;var&lt;/code&gt;). And error handling has no try/catch; you return the error explicitly alongside the result as a second return value (Go has multiple return values rather than true tuples).&lt;/p&gt;
&lt;p&gt;I&apos;ve made this sound complicated, but if you followed along you can see the point: it really is that simple... So I get why people overseas call it boring. It isn&apos;t a fun language to write, exactly.&lt;/p&gt;
&lt;p&gt;For fast bootstrapping there&apos;s a whole field of solid scripting languages (Node.js, Python, Ruby), and for performance and stability Korea is basically the so-called Republic of Java (Korea is dominated by Java in enterprise development), so most teams reach for Java. Still, Golang has a nice middle ground: you can code as lightly as a script, you get static typing from a compiled language, performance is genuinely good, and the goroutine-driven concurrency story makes real-time work and live-service operations a lot easier downstream. Even in Korea you see it picked up by larger, traffic-heavy companies in cloud-native, data infra, microservices, and DevOps tooling. Sure, a team like Daangn (Karrot) running ten-million-plus chat messages a day (Daangn is Korea&apos;s largest local marketplace app) on Go isn&apos;t exactly relatable for shops like ours. But I still think it&apos;s a great language for a solo dev. It&apos;s light, it&apos;s nice to live with.&lt;/p&gt;
&lt;p&gt;(I actually got into Golang in early 2021 after this video sold me on it: &lt;a href=&quot;https://www.youtube.com/watch?v=mLIthm96u2Q&quot;&gt;Daangn Engineering Adopts Go | Daangn Tech&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;I went Rails (Ruby) → Django (Python) → Golang, then spent my last job in Node.js (TS), and when I started over I came right back to Golang as if it were obvious. There&apos;s more I could say about why it&apos;s good for a team and so on, but this is getting long, so I&apos;ll move on.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Rust is my aspirational language, but I&apos;ve never used it on a real project.&lt;/li&gt;
&lt;li&gt;There&apos;s also a piece called &lt;a href=&quot;https://sinwoobang.notion.site/Vibe-Coding-Go-2656746440e2809baa8ceb5f326cfb37&quot;&gt;Why Go is Comfortable for Vibe Coding&lt;/a&gt; that&apos;s worth a look.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Next.js&lt;/h2&gt;
&lt;p&gt;The frontend was the comfort-food pick, the opposite of the backend. I went with Next.js (TS), Tailwind CSS (which always saves me because I&apos;m bad at CSS), and TanStack Query (a.k.a. React Query). I once chased GraphQL as my aspirational stack, but my last job made it painfully clear how shaky my GraphQL fluency actually was, so I bailed. At my last job we had a frontend lead who was a GraphQL master, but going solo I just picked what was easy for me to use and reason about. And, as I&apos;ll get to later, leaning on React Query the way I did turned out to be a real mistake.&lt;/p&gt;
&lt;p&gt;Since it&apos;s the default, there&apos;s no real &quot;why&quot;. I&apos;d done React before, it was the obvious choice for full-stack work in older projects like &lt;a href=&quot;/posts/2020-03-datacolon-retro-intro&quot;&gt;&amp;lt;DataColon Dev Retro&amp;gt; - 1. Intro&lt;/a&gt;, and nobody is going to push back on it. The thing is, in the era of AI vibe coding I spent a lot of this stretch wondering whether React and Hooks are really the right call beyond &quot;the docs and the community are huge.&quot;&lt;/p&gt;
&lt;h2&gt;Order&lt;/h2&gt;
&lt;p&gt;Here&apos;s how the series will run. I&apos;m going to dig into the actual stack across the domains I built, and along the way I&apos;ll write down the decisions I made and the things I wrestled with. As it happens, the order I built things in lines up almost exactly with the calendar.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;/posts/2025-09-scrumble-tech-retro-backend&quot;&gt;Scrumble Tech Retro (subtitle: Full-stack Dev and the Failure of Vibe Coding. Golang and Next.js) - 1. Backend&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;Base tech stack&lt;/li&gt;
&lt;li&gt;Architecture&lt;/li&gt;
&lt;li&gt;Testing&lt;/li&gt;
&lt;li&gt;Real-time processing&lt;/li&gt;
&lt;li&gt;Misc&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Frontend (continued in part 2)
&lt;ul&gt;
&lt;li&gt;Base tech stack&lt;/li&gt;
&lt;li&gt;Architecture&lt;/li&gt;
&lt;li&gt;Real-time processing&lt;/li&gt;
&lt;li&gt;Misc&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
</content:encoded><category>project</category><category>golang</category><category>scrumble</category><author>Tony Cho (https://flowkater.io)</author></item><item><title>Scrumble Project Retrospective (June–August 2025)</title><link>https://flowkater.io/en/posts/2025-09-scrumble-project-retro/</link><guid isPermaLink="true">https://flowkater.io/en/posts/2025-09-scrumble-project-retro/</guid><description>A monthly look at the Scrumble side project: setting up home-office collaboration, the delays caused by spec creep and missing deadlines, and the LLM and third-party integration roadmap from a partnership retro.</description><pubDate>Sun, 07 Sep 2025 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;Opening&lt;/h2&gt;
&lt;p&gt;I left my last job in mid-April and spent about a month traveling Italy with Ellie. By the end of the trip I came home a little tired, but a few months later I find myself wishing we&apos;d stayed two more weeks. It was the first long trip of my life taken after quitting a job, and since I wasn&apos;t returning to a company anyway, maybe in some sense the trip never really ended. Or maybe I&apos;d circled back to that moment five years ago, the one where Ellie and I were running my own business right up to the bitter end.&lt;/p&gt;
&lt;p&gt;Setting that aside, if I wanted to start working for myself again, I needed to build something with my own hands. I&apos;d been away from code for a long stretch, so I figured the right move was to pick a small project and use it as a re-entry point. That&apos;s what I came home with: a plan to do one side project during the gap before the real business prep, and Ellie and I started in earnest at the end of May.&lt;/p&gt;
&lt;p&gt;I&apos;m splitting this retro into two parts: a general project retro (this one) and a tech retro covering full-stack development and working with an LLM (next one).&lt;/p&gt;
&lt;h2&gt;What Scrumble Was Trying to Be&lt;/h2&gt;
&lt;p&gt;Scrumble grew out of the daily-scrum meetings I&apos;d been running since my Todait days and at every company after that. I&apos;d actually wanted to build it last year while still employed. At that company we used Confluence for our daily meeting notes, and as the team grew and split, the format got painful in specific ways:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Whoever wrote, kept writing. Whoever didn&apos;t, never did.&lt;/li&gt;
&lt;li&gt;Past entries were almost impossible to scan. (Confluence supports a DataTable, but the UX is rough; it feels like typing values into a database or a spreadsheet.)&lt;/li&gt;
&lt;li&gt;We had over 30 people stuffed into a single note, and you couldn&apos;t take any of it in at a glance.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The problem with that note, as I came to see it, wasn&apos;t unique to Confluence. Whether the team is small or large, whether you&apos;re using old-school Evernote, Dropbox Paper, Notion, or Confluence, the moment the note loses its tight connection to the actual work, its usefulness drops off a cliff. Eventually the casual condition-check and daily-life sharing stops working too, and the note&apos;s role as a team communication channel withers.&lt;/p&gt;
&lt;p&gt;For team members, the goal is to keep team communication alive without forcing constant coffee chats or meetings. But I felt that pure check-in scores or vibe scores weren&apos;t enough on their own to justify the habit. So one of our targets was to make it easy to write the day&apos;s work to-dos in the same place.&lt;/p&gt;
&lt;p&gt;We weren&apos;t trying to replace Slack-style chat tools or Jira/Asana-style PM tools. The product wasn&apos;t positioned that way. The plan was to integrate with those tools as much as possible.&lt;/p&gt;
&lt;p&gt;To restate the project goals:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Main: build a daily-scrum service we&apos;d actually use ourselves (MVP)&lt;/li&gt;
&lt;li&gt;Sub: give the team enough time to adapt and practice (work skills, collaboration) before the real development project&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&quot;https://i.imgur.com/Yb9swxk.png&quot; alt=&quot;Yb9swxk.png&quot; /&gt;
&lt;em&gt;(The very first PRD draft)&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;Project Setup&lt;/h2&gt;
&lt;h3&gt;Where We Worked&lt;/h3&gt;
&lt;p&gt;I trust systems more than enthusiasm, and environment more than talent. (That&apos;s relative; I&apos;m not saying talent and enthusiasm don&apos;t matter.) When I take on any kind of work, I care a lot about making the work &lt;em&gt;able to happen&lt;/em&gt;: getting every stakeholder&apos;s economy aligned, keeping the cost and time low. Doing all of that from home created a different set of constraints, and I&apos;m still figuring out how to handle them.&lt;/p&gt;
&lt;p&gt;Most of the work happened at home.&lt;/p&gt;
&lt;h4&gt;Home&lt;/h4&gt;
&lt;p&gt;From a system-and-environment angle, home isn&apos;t optimal. Our place doesn&apos;t have a ton of rooms either. But Ellie still remembers the years of writing checks for several hundred thousand won every month for an office that sat mostly empty when we ran our previous company. So in cost terms, home was the obvious choice for now.&lt;/p&gt;
&lt;p&gt;Two monitors and an M4 Max. The kind of place where you can also cook your own meals.&lt;/p&gt;
&lt;p&gt;That said, after three months of working this way, I think we&apos;ll need to lock in some revenue and get an actual office in the longer run.&lt;/p&gt;
&lt;h4&gt;Daily Life&lt;/h4&gt;
&lt;p&gt;It&apos;s a side project, but a full-time side project, which meant the risk of staying inside the apartment all day. So we built a strict daily schedule. I started treating the gym trip with Ellie as our morning commute and built the work cycle around it. The first month, I mostly held to it. By the second month, as we entered the late stage of the project, the cycle broke and I was working at random hours. To make this concrete:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;When the routine held: 10 p.m. bedtime, 5:30 a.m. wake-up, morning workout, then &quot;into the office.&quot; Almost a model life, if you can call it that.&lt;/li&gt;
&lt;li&gt;At the end of the project: 4 a.m. bedtime, noon wake-up, lunch, then starting work around 2 p.m. That kind of damage.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;By the end, both of us were worn down enough that we took a workation just to wrap things up.&lt;/p&gt;
&lt;p&gt;I barely worked from home during COVID, so I didn&apos;t really know what working from home was like. I&apos;m learning now, in my body.&lt;/p&gt;
&lt;h3&gt;Communication&lt;/h3&gt;
&lt;p&gt;It&apos;s been just Ellie and me on side projects (even when I had a day job), but this time we also had a remote team member: George, a backend intern. And even with two people, you can&apos;t keep all the records offline. Slack, Notion, Jira, Confluence, plus tools I&apos;d used before like Linear and Asana. There are a lot of options.&lt;/p&gt;
&lt;p&gt;We had to keep costs down, and small teams that touch a bit of everything end up locked into massive subscription stacks. I&apos;d actually settled on Slack + Notion + Linear despite all that, until Ellie brought a super-app called ClickUp into the mix and we picked it.&lt;/p&gt;
&lt;p&gt;ClickUp is, literally, Slack + Notion + Linear. Channel chat for communication, Markdown docs, project management: three categories of tool for the price of one SaaS subscription.&lt;/p&gt;
&lt;p&gt;When I looked it up, it seems Koreans don&apos;t really use it. Given how much Korean teams love all-in-one products, the lack of marketing here is honestly surprising.&lt;/p&gt;
&lt;p&gt;After using it, you can see clearly that each of its three pieces is about 10% behind Slack, Notion, and Linear individually. But when those three pieces are integrated tightly inside one tool, the value adds up to a lot. We&apos;ve been getting a lot of mileage out of it. Flip the framing and it&apos;s kind of impressive: each function works at maybe 90% of the standalone, and anyone who&apos;s built software knows how hard it is to make all of it actually function together.&lt;/p&gt;
&lt;p&gt;Anyway, communication tooling: solved cleanly with ClickUp.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Roughly $30/seat across three separate tools down to $10/seat for one&lt;/li&gt;
&lt;li&gt;Slack-style channel chat and Notion-style docs both supported well
&lt;ul&gt;
&lt;li&gt;The 10% gap I mentioned shows up only when you go deep on each one. Day to day, it&apos;s fine.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Project management gives you the intuitive subset of features you actually need&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;First MVP Feature Goals&lt;/h2&gt;
&lt;p&gt;The first MVP target was:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Authentication&lt;/li&gt;
&lt;li&gt;Space management&lt;/li&gt;
&lt;li&gt;Check-in / check-out writing&lt;/li&gt;
&lt;li&gt;To-do list writing&lt;/li&gt;
&lt;li&gt;Feed page&lt;/li&gt;
&lt;li&gt;Real-time comments and reactions&lt;/li&gt;
&lt;li&gt;Real-time notification page&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We pretty much hit those. But midway through, we decided this wasn&apos;t enough and we needed to add more, and that decision cost us about a week.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;An MVP that should&apos;ve shipped in 8 weeks ended up taking 12.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Looking back, the first feature set did get fully built, and the original goals of the project were roughly met:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Main: build a daily-scrum service we&apos;d actually use ourselves (MVP)&lt;/li&gt;
&lt;li&gt;Sub: give the team enough time to adapt and practice (work skills, collaboration) before the real development project&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Of course, if you subtract the bad decisions, the dead-end detours, and the days I just didn&apos;t work, you&apos;d probably knock about three weeks off the total.&lt;/p&gt;
&lt;p&gt;The detail comes in the retro below.&lt;/p&gt;
&lt;h2&gt;Retro&lt;/h2&gt;
&lt;h3&gt;Keep&lt;/h3&gt;
&lt;h4&gt;Daily Dog Fooding&lt;/h4&gt;
&lt;p&gt;Daily dog fooding will probably be a core principle in future projects too. Because we were building something we&apos;d actually use, I wasn&apos;t just trying to get it done; I kept thinking about how to make it better. At my last company, the makers couldn&apos;t really be the users, which capped how far we could push that. Using the product every day surfaced everything from simple bugs to improvement ideas to the next pieces I needed to build. When you don&apos;t have customers yet, dog fooding is the best instrument a maker has.&lt;/p&gt;
&lt;h4&gt;Hands-On From Scratch&lt;/h4&gt;
&lt;p&gt;I&apos;ll go deeper in the tech retro, but: Golang (Fiber + Entgo) + Redis + PostgreSQL on the backend, Next.js on the frontend, full-stack, end to end, all by hand. I broke a sweat getting my hands back on the code I&apos;d let go cold for years, but with $200/month and my friend Claude Code, I got to a place where I can do hands-on project development on my own.&lt;/p&gt;
&lt;h4&gt;Joy&lt;/h4&gt;
&lt;p&gt;It&apos;s been hard to remember when, at my last company, I last felt joy in &lt;em&gt;making&lt;/em&gt; something on a project. Set aside whether I use the product. I just enjoyed coding every day. From the time we adopted DDD at that company, I&apos;d basically taken my hands off the code, but on this project I got to do everything I wanted, from initial DDD/Clean Architecture layer design to CQRS to event-driven patterns. Frontend isn&apos;t my main turf, but tweaking styles one by one and watching the thing come together, I felt both the joy and the pain of that. It wasn&apos;t quite Linus Torvalds&apos;s &lt;em&gt;Just for Fun&lt;/em&gt;, but rediscovering the joy of building something with my own hands, in this stretch, mattered a lot.&lt;/p&gt;
&lt;h4&gt;Mentoring&lt;/h4&gt;
&lt;p&gt;I love teaching, but back when I was wrestling with what &lt;em&gt;good&lt;/em&gt; teaching even meant, I quit both the lecturing and the mentoring. That&apos;s the hard part of education: not teaching, but teaching well. I wanted to help George keep going as a developer, and that&apos;s why I brought him into the project. He&apos;s still pretty much an entry-level backend developer, but he can now build a basic API on his own. There were moments along the way when I got short with him or showed how impatient I could be. Every time that happened, my own limits showed through, and I was embarrassed by them. But George stuck with this all the way through while holding a part-time job, and that effort is exactly why I wanted to teach him as much as I could.&lt;/p&gt;
&lt;p&gt;Mentoring is still hard. Each person&apos;s level is different, and what they need to hear is different. The thing is, dev skills and the rest are just tools. What trips most people up is &lt;em&gt;how&lt;/em&gt; to work, how to work &lt;em&gt;with&lt;/em&gt; others, how to talk about it. To get real S-tier (highest-caliber) teamwork, good mentoring is less about praising what was done well and more about pulling out the harder, less comfortable conversations.&lt;/p&gt;
&lt;p&gt;Obviously, when I get emotional, I help nothing. People talk easily about how AI will replace teachers because AI is improving. But AI is a tool. To get a good answer, you need a good question. Beginners can&apos;t ask good questions yet. They don&apos;t yet have a high tolerance for failure. Helping someone surface those gaps and push past them is, for me as a mentor, genuinely hard work, and I want to keep studying it, including the playing-coach skills of running alongside them.&lt;/p&gt;
&lt;h3&gt;Problem&lt;/h3&gt;
&lt;h4&gt;Lack of Communication&lt;/h4&gt;
&lt;p&gt;This is the biggest problem I felt running a duo project. Ellie&apos;s direction as product designer and my direction as product manager (before I&apos;m a developer) didn&apos;t always line up, but even though we sat next to each other every day, we didn&apos;t hold regular check-ins or talk enough. We were each so deep in our own deliverables that we&apos;d sit in the same room for ten-plus hours without exchanging a word, me coding, Ellie designing. And because I could implement everything myself, I&apos;d also just push small things through, or change the spec on my own judgment without enough back-and-forth before implementing. We did a real retro afterward and put a daily meeting on the calendar, but the biggest problem of running the Scrumble project was that we weren&apos;t doing Scrumble.&lt;/p&gt;
&lt;h4&gt;Spec Creep&lt;/h4&gt;
&lt;p&gt;The first MVP was, frankly, the bare minimum that worked, missing the differentiating integrations we&apos;d talked about (the LLM-driven report, the third-party hookups). Each of us had features we wanted to add, and without a clear conversation about whether to push those into the MVP, the schedule started ballooning. The root cause of an MVP that could&apos;ve shipped in two months stretching to nearly three was that we never had a real conversation about that bar, and just went heads-down on our own pieces.&lt;/p&gt;
&lt;h4&gt;No Real Deadline&lt;/h4&gt;
&lt;p&gt;It wasn&apos;t that there was no deadline at all. But this isn&apos;t a public-launch project, and definitely not one we started for commercial reasons, so the two problems above piled on. &lt;em&gt;Just one more week. One more week.&lt;/em&gt; And the schedule slowly drifted. Most failed side projects probably look like this. Most teams get the project to 90%, but the last 10% (the launch prep, the must-have features you kept pushing) actually takes as much energy as the previous 90%. It isn&apos;t fun work. So in retrospect, that&apos;s probably part of why I kept arguing for adding more features.&lt;/p&gt;
&lt;h4&gt;Pain Equal to the Joy&lt;/h4&gt;
&lt;p&gt;Tools like Claude Code and Cursor, riding the vibe-coding wave, are sold like &lt;em&gt;one click and you can build software without knowing how to code.&lt;/em&gt; But unless your service is trivial, code still has to be touched by a human. As the feature surface grows, fixing A breaks B all the time. AI that misreads context creates duplicate logic, ignores the architecture layers, or invents its own pattern. Plenty of failure modes.&lt;/p&gt;
&lt;p&gt;Backend I&apos;d kept partially current, but frontend I hadn&apos;t touched in ages, and every piece was a slog. Layout and styling issues, real-time handling, plenty of stuff Claude Code couldn&apos;t one-shot. After feeding it more context and pushing it for a while, I&apos;d often end up fixing the code by hand and resolving the issue immediately. So the joy of writing code came with a matching weight of frustration. The same was true for the simple-but-tedious setup work I was avoiding, like adopting tiptap and a few other must-have features I still haven&apos;t built.&lt;/p&gt;
&lt;h4&gt;Bad Calls, Not Enough Thought&lt;/h4&gt;
&lt;p&gt;Because we have the structure of a SaaS that creates a workspace by default, there&apos;s a split between user-level auth (logging into the service) and member-level auth (being a member inside a workspace). To move fast early on, I skipped that detail and just used user-auth the way I always had. As the service was nearly wrapping up, I realized how badly that choice had aged. We migrated the entire API design and auth scheme to be member-based, and that work alone, including verification and testing, took maybe three or four days. If you add the days I avoided it because I didn&apos;t want to do it, it was over a week.&lt;/p&gt;
&lt;p&gt;I&apos;d also planned to bring in tiptap from the start but stopped because it felt like overkill at the time. Early in the project, complexity was low and the migration would have been quick. Trying to do it now feels daunting.&lt;/p&gt;
&lt;p&gt;I picked Entgo as the Golang ORM mostly out of habit, but a lot of its advantages get neutralized when you have a clean architecture layer with clear boundaries. (More on this in the tech retro.)&lt;/p&gt;
&lt;p&gt;Honestly, even if I&apos;d been off the keyboard for a while, I made a fair number of calls that suggest my instincts had dulled, and I&apos;m getting cheerfully beaten up for them at release time. It hurts.&lt;/p&gt;
&lt;h4&gt;Routine Issues&lt;/h4&gt;
&lt;p&gt;As I mentioned in the daily-life section, the routine I&apos;d tried to hold broke down, and at one point both of us hit a wall, physically and mentally. As an emergency move we picked a workation, which turned out to be the right call. We came back somewhat reset, but I haven&apos;t yet returned to the discipline I had at the start, and I need to reset again. Late in the project, the daily Scrumble entries and weekly retro reports were 80% complaints about how hard it was to hold the routine.&lt;/p&gt;
&lt;p&gt;The courage to start again is harder than the courage to start the first time. &lt;em&gt;Model life,&lt;/em&gt; take two. Let&apos;s begin.&lt;/p&gt;
&lt;h3&gt;Try&lt;/h3&gt;
&lt;h4&gt;Communication&lt;/h4&gt;
&lt;p&gt;A lot of these problems actually got solved when Ellie and I went for a walk and talked them through. So regular check-ins matter, and so does just talking more, period. Ellie&apos;s reshaping the space we work in (our living room) to be more conducive to talking, and I&apos;m setting up daily and weekly meetings so the next project never gets the note &quot;communication was lacking.&quot; Better too much than too little.&lt;/p&gt;
&lt;h4&gt;If You&apos;re Not Embarrassed, You Shipped Too Late&lt;/h4&gt;
&lt;blockquote&gt;
&lt;p&gt;If you are not embarrassed by the first version of your product, you&apos;ve launched too late.
&lt;em&gt;Reid Hoffman (LinkedIn co-founder)&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;At some point, trying not to be embarrassed, I drifted off the original MVP and started arguing for an over-spec&apos;d technical scope. It&apos;s a thing I&apos;m building for myself. I can keep refining it after it ships. In one of our retros Ellie said something that mattered: &lt;em&gt;We were burning time trying to design for an unknown audience, and the moment I started thinking of &quot;Tony as customer #1,&quot; the work moved fast.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;So we decided to release at this level and wrap up, to clear the deck for serious business prep and the next project. The next one will wrap much faster.&lt;/p&gt;
&lt;h4&gt;Immersion&lt;/h4&gt;
&lt;p&gt;A routine is one tool for immersion. But what I mean here isn&apos;t only about working hours. There are very few times in life when time itself has been this abundant for me, and now that it is, I&apos;m a little careless with it. By careless I mean not fully present, not fully immersed in the hour I&apos;m in. If I keep wasting days like that, all that&apos;s left is regret. Every day, I try to snap back. When I&apos;m playing, I want to be fully in the play. I don&apos;t want most of the day spent doing other work (company work) while my mind chases happiness elsewhere. I want to be in the present hour the way I originally meant to be. It&apos;s a skill that needs practice. I&apos;ll work harder at it. &lt;em&gt;Work when you work, rest when you rest.&lt;/em&gt; That gets especially hard in this kind of setup, where I work at home and time is mine.&lt;/p&gt;
&lt;h3&gt;Lessons Learned&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;A perfect MVP is a contradiction.&lt;/strong&gt; Ship early, even if it&apos;s embarrassing.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Duo project = over-communicate.&lt;/strong&gt; Even when you&apos;re together ten hours a day, deliberate conversation is non-negotiable.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;If it breaks, try again.&lt;/strong&gt; A broken routine isn&apos;t a reason to quit; just stand back up.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;AI is not magic.&lt;/strong&gt; In the end, a human has to understand the code.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Dogfooding is the answer.&lt;/strong&gt; Be customer #1 yourself.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Closing&lt;/h2&gt;
&lt;p&gt;It was a side project, but full-time. Some days I sat coding for 20 hours straight. Other stretches I didn&apos;t touch the keyboard for three days running. For years I&apos;d spent every summer commuting in early to a building, working through to a late commute home, and this summer (already an unusually hot one) I worked from home and felt the full weight of it.&lt;/p&gt;
&lt;p&gt;I&apos;d wanted to build Scrumble for years at the last company, and I love that I&apos;m using it every single day now. We&apos;re a long way from the feature milestone we set, but I&apos;ll keep updating it gradually as we move on to other product work.&lt;/p&gt;
&lt;p&gt;Three months in, the truth is that committing my own hours to building made all my bad habits and weak spots visible. I could see clearly how much, at the last company, I&apos;d been borrowing other people&apos;s hands to solve things. The flip side: I&apos;ve been thrown into an environment where I have to handle all of it myself, which is also where the fastest growth happens. The company was comfortable, but growth needs the courage to face everything yourself.&lt;/p&gt;
&lt;p&gt;There are good days and bad days, and one regret is that I kept putting off writing and organizing my notes until a catch-all post like this one. I logged the project daily, but I suspect keeping even a simple regular journal would have helped. Record more often. Look back more often.&lt;/p&gt;
&lt;p&gt;I started this side project while preparing to start a company, and at some point it became the main thing. Hands loose, warm-up done. Time to apply what this project taught me to the next one, and not repeat the same mistakes.&lt;/p&gt;
&lt;p&gt;Process matters and outcomes matter, but if you don&apos;t actually finish, none of it adds up to value. Beyond the action items I wrote in the Try section, I&apos;m going to keep facing myself head-on. And I&apos;ll work harder so I don&apos;t lose the joy.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;A warrior does not give up what he loves, he finds the love in what he does.&quot;
&quot;A warrior is not about perfection or victory or invulnerability. He&apos;s about absolute vulnerability.&quot;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Peaceful Warrior&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;img src=&quot;https://i.imgur.com/3JD9L2y.jpeg&quot; alt=&quot;workation&quot; /&gt;
&lt;em&gt;(My desk during the workation)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://i.imgur.com/X9dSzHL.jpeg&quot; alt=&quot;scrumble-checkout&quot; /&gt;
&lt;em&gt;(A check-out note inside Scrumble during the workation)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;To be continued in the tech retro&lt;/em&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;ul&gt;
&lt;li&gt;Previous retro
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://flowkater.io/posts/2025-04-ihfb-four-year-retro&quot;&gt;Four-year IHFB retrospective&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Next retro
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;/posts/2025-09-scrumble-tech-retro-intro&quot;&gt;Scrumble Tech Retrospective (subtitle: the failure of full-stack development and vibe coding. Golang and Next.js): 0. Opening&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
</content:encoded><category>retrospective</category><category>project-retro</category><category>scrumble</category><author>Tony Cho (https://flowkater.io)</author></item><item><title>Four Years at IHFB: A Retrospective</title><link>https://flowkater.io/en/posts/2025-04-ihfb-four-year-retro/</link><guid isPermaLink="true">https://flowkater.io/en/posts/2025-04-ihfb-four-year-retro/</guid><description>From part-time contractor to a 30-person R&amp;D division: a four-year retrospective on hiring, tech stack, and culture experiments at IHFB, with the unfinished work and the advice I&apos;d pass to the next leader.</description><pubDate>Thu, 10 Apr 2025 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;Opening&lt;/h2&gt;
&lt;p&gt;I served as CTO from April 2021 to April 2025 — joining a three-engineer team, growing it into a 30-plus-person R&amp;amp;D division, and finally closing out a long campaign. I&apos;m not sure my writing is up to the task of fitting four years into one post, but I want to get it down before too much time passes.&lt;/p&gt;
&lt;h2&gt;Timeline&lt;/h2&gt;
&lt;h3&gt;April–May 2021. Starting Part-Time&lt;/h3&gt;
&lt;p&gt;I joined on a short-term, part-time contract. I was building my own service at the time, and to keep cash flow going I spent my days on my own business, then went into the company at night. The engineering team had three engineers total, and two of them and I worked together every night from 7pm to 11pm, sometimes past midnight.&lt;/p&gt;
&lt;p&gt;I worked on a few things at the time. The first was a single controller file with over 7,000 lines of code. I tried to clean it up by applying at least a layered architecture: cutting repeated code, separating raw queries behind a repository pattern, removing recursive functions, and applying a functional style. It was a Node.js / Express.js setup, effectively a bare Node server with no structured application framework, running a single instance with PM2. Inside the controller, instead of calling other functions, the code was hitting its own server&apos;s API endpoints, which during peak hours caused repeated network overhead and response delays. Most of my time went into fixing that.&lt;/p&gt;
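&lt;p&gt;The self-call problem is easy to show in miniature. The legacy was Node.js/Express, but the shape is the same in any language; here is an illustrative Go sketch with hypothetical names, the anti-pattern as a comment and the fix as code:&lt;/p&gt;

```go
package main

import "fmt"

// Anti-pattern (as in the legacy controller): a handler that needs another
// endpoint's logic issues a real HTTP request to its own server, paying
// serialization plus a network round trip on every call, e.g.:
//
//   resp, _ := http.Get("http://localhost:3000/users/" + id) // self-call
//
// Fix: extract the shared logic into a plain function and call it directly.

func lookupUserName(id string) string {
	// shared logic, previously reachable only via the /users/:id endpoint
	return "user-" + id
}

// greetingHandler now calls the function in-process: no socket, no JSON
// round trip, no extra connection held open during peak hours.
func greetingHandler(id string) string {
	return "hello, " + lookupUserName(id)
}

func main() {
	fmt.Println(greetingHandler("42")) // hello, user-42
}
```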
&lt;p&gt;Push notifications and other heavy work weren&apos;t being moved to background jobs either, which was hitting the server hard, so another big task was pulling that into Kafka and handling it as events. Most of the work in this period was on the legacy service, especially the backend: fixing problems, refactoring, performance, server stability. Alongside that I was studying Kafka, applying it to events, studying k8s.&lt;/p&gt;
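&lt;p&gt;The shape of that fix, reduced to its core: the request path only publishes an event, and a separate consumer does the slow work later. In this illustrative Go sketch a tiny in-memory queue stands in for the Kafka topic; the producer/consumer split is the point, not the broker:&lt;/p&gt;

```go
package main

import "fmt"

// A PushEvent describes work (e.g. a push notification) to be done later.
type PushEvent struct{ UserID, Text string }

// Queue is an in-memory stand-in for a Kafka topic.
type Queue struct{ events []PushEvent }

// Publish is all the web server does on the request path now: cheap and fast.
func (q *Queue) Publish(ev PushEvent) { q.events = append(q.events, ev) }

// DrainAll plays the consumer: it performs the heavy delivery work off the
// request path and returns what it delivered.
func (q *Queue) DrainAll() []string {
	var delivered []string
	for _, ev := range q.events {
		delivered = append(delivered, ev.UserID+": "+ev.Text)
	}
	q.events = nil
	return delivered
}

func main() {
	q := Queue{}
	q.Publish(PushEvent{UserID: "u1", Text: "homework graded"})
	fmt.Println(q.DrainAll()) // [u1: homework graded]
}
```

&lt;p&gt;With a real broker the consumer runs in a separate process, so a burst of notifications loads the workers, not the API servers.&lt;/p&gt;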
&lt;h3&gt;June–December 2021. Six-Month Contract CTO&lt;/h3&gt;
&lt;p&gt;While I was doing all this, the engineering team&apos;s main projects somehow became directly tied to me, and in May they offered me the CTO role. The previous CTO wanted to move to the service experience innovation team (an operations group), and since I needed cash anyway, I took the offer without much hesitation, thinking &lt;em&gt;six more months, why not?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;From June, while continuing to work on the legacy service, the CEO proposed throwing out the legacy operating service entirely and replacing it with a brand new one. Internally we called the project SA service development. Replacing a service from a Series-C company with over 100 people felt like a tall order, honestly. I initially proposed improving the existing legacy as much as possible, but after firm persuasion from the CEO, we committed to building the new SA service from scratch.&lt;/p&gt;
&lt;p&gt;Our stack was Node.js-only at the time, so the blueprint was straightforward: TypeScript + NestJS, MySQL, deployed on k8s. For the first three months and change, almost all of the time went into thinking through what was wrong with the legacy and how to design a system extensible enough to hold whatever content we&apos;d want to put in later. We prototyped fast.&lt;/p&gt;
&lt;p&gt;In parallel I was working on the CMS, the content editor. The legacy service offered only a bare-bones editor, and the content team had been storing things like multiple-choice option numbers and special characters in macros or notepads on the side. They&apos;d even elevated buggy behavior into what we treated as a workflow feature, as part of how they got things done. I called these people content ninjas.&lt;/p&gt;
&lt;p&gt;To solve the content production problem, I started planning and building a new CMS. This was probably the first time at IHFB that we actually listened to users, did the content production work ourselves, empathized with their problems, and kept the customer in the room. I remember testing every single keyboard shortcut, and running rough user tests with intermediate builds as we went. Content is the heart of an education business, but the perception was &lt;em&gt;just hire more people&lt;/em&gt;, so this team had been getting zero technical support. I had a strong urge to fix it for them.&lt;/p&gt;
&lt;p&gt;Anyway, during this period we were also hiring aggressively to build a real engineering team. The company had successfully closed Series C and was deploying that capital into all the scale-up moves: a record-setting TV ad campaign, large-scale hiring. By December 2021 the engineering team had grown from 3 in April to 13. And my contract was nearing its end.&lt;/p&gt;
&lt;h3&gt;2022. Toward the SA Service Launch&lt;/h3&gt;
&lt;p&gt;I&apos;d started it, so I wanted to finish it. We&apos;d come this far, so I wanted to see it through. I signed two more contracts in 2022 and started running toward the SA service launch. The company moved from Mapo to Yeouido Park One (Seoul districts). We hired aggressively across the board, not just our team. The actor Lee Byung-hun was on TV promoting the service, and marketing was firing on all cylinders.&lt;/p&gt;
&lt;p&gt;Everything that ran on the legacy had to keep running, and structurally the new service had to be better. The legacy service had years of features layered on top of one another, and somehow each of them was being used somewhere. Missing even the smallest one would break the service, so as the dev timeline kept stretching, we kept checking and re-checking that we hadn&apos;t missed anything from the legacy.&lt;/p&gt;
&lt;p&gt;The first cut of the service was done in August. We pre-launched to a low-impact, low-sensitivity user group (1st-year middle schoolers and 3rd-year high schoolers, who don&apos;t take in-school exams). Through September, October, November we kept it in pre-launch. In the last week of December all users were migrated to the new service. January through July, a full seven months of dev. August through December, all the non-dev work (operations team training, content migration) to actually put it into production.&lt;/p&gt;
&lt;p&gt;We kept hiring, and the team passed twenty people. Around summer, what people call &quot;Naver/Kakao/Coupang/Line/Baemin&quot;-tier team leads (the Korean big-tech equivalents) joined, and the team finally felt like a real team as we pushed through dev and launch. For the final content migration we were at the office until 4am with the team. I can still vividly remember the hours leading up to that night. The full release took over a year, and honestly there were moments I wondered if it would ever actually end.&lt;/p&gt;
&lt;p&gt;By that point we weren&apos;t just an engineering team anymore. We were the R&amp;amp;D Division, with our own organizational culture, moving as one team. Team leads had come in. The service had launched after all the back-and-forth. The wrap-up moment was finally close.&lt;/p&gt;
&lt;h3&gt;January–August 2023. Going Full-Time, Then Iteration&lt;/h3&gt;
&lt;p&gt;Ironically, when we finally launched at the end of 2022, the company offered me a full-time role and gave me a short window to think it over. We&apos;d launched, sure, but there was still a mountain of work: stabilizing the service, dealing with the issues that come up in real operations. After putting more than a year into the service, though, I wanted to stay and watch it find its feet and grow.&lt;/p&gt;
&lt;p&gt;So in this period I bought a small amount of stock and became a shareholder, and for the first time I started working at the company full-time. I changed my mindset too. Up until then I&apos;d always felt like an outsider, and I&apos;d been working from that posture. Now I was going to commit a little more sincerely. No more excuses. Do my best with what I can do. That was the resolution.&lt;/p&gt;
&lt;p&gt;The new build was good, but compared to before, both the architecture and the infrastructure had gotten more complex, and incident response and troubleshooting were genuinely hard. The three-team-lead structure that came together around then (backend, frontend, PM) coordinated the feature teams well, and despite all the issues, we were building out work processes in an almost ideal way. QA was set up around then too. By that point we were already over 25 people.&lt;/p&gt;
&lt;p&gt;We launched a more premium product on top of the existing one, which came with a lot of additional feature requirements, and the product itself started handling many more content types and packing in many more features.&lt;/p&gt;
&lt;h3&gt;September–December 2023. B2G? B2B?&lt;/h3&gt;
&lt;p&gt;Things started getting hard around then. This was probably when I entered a completely new phase. The PM team lead I had real chemistry with left to start her own business, and I took over as acting PM lead. From that point, I was doing more PM work than CTO work. The company was also expanding from B2C (which it had focused on) into B2B and B2G, building on the platform we&apos;d made, and I was barely keeping up with the business requirements.&lt;/p&gt;
&lt;p&gt;In September, then again in January 2024, the company planned and ran large external events (expos) it had never done before. To prep demos and features for those, this stretch was almost entirely B2B and B2G product dev rather than B2C. B2B/B2G is genuinely much harder than B2C. I couldn&apos;t tell what to base feature decisions on, what to actually build, and instead of customer voices, feature priorities kept getting shoved around by self-proclaimed experts: internal executives, outside expert groups, departments outside the engineering team.&lt;/p&gt;
&lt;p&gt;This was when things got truly hard. Not because the difficulty was high, but because B2B/B2G product-building isn&apos;t a domain I&apos;d built up over ten-plus years, so I honestly felt no joy or interest in it. Every feature implementation felt like homework. Having said in 2023 that I was going to commit fully, though, I kept moving forward.&lt;/p&gt;
&lt;h3&gt;January–August 2024. B2G&lt;/h3&gt;
&lt;p&gt;I never thought I&apos;d be making a textbook in this lifetime, but the policy review guidelines for the government textbook certification were handed down, and from the January expo to the August submission deadline I burned everything I had. By the end I was barely going home, working weekends and holidays. It wasn&apos;t only dev. There were training programs, demos, every kind of business requirement coming in along the way, on schedules tighter than what we&apos;d planned. I just kept rebalancing priorities and pulling resources together to get it done.&lt;/p&gt;
&lt;p&gt;I was running more like a PMO function or PM than a CTO, and the PMs working with me at the time recognized me as a PM. I learned the hard way that PM isn&apos;t a fit for me. Engineer ego aside, I gave everything to finishing the work. We took the existing service, kept only the skeleton, and built a new platform in a short window. The summer passed without me noticing the heat.&lt;/p&gt;
&lt;h3&gt;And Up to Now&lt;/h3&gt;
&lt;p&gt;From September 2024 to now, I&apos;ve covered enough in earlier writing. If you&apos;re curious, see below.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;/posts/2025-04-last-seven-months&quot;&gt;The Last Seven Months: A Stream of Thoughts (records from September 2024 to March 2025)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;To Sum Up&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Early days&lt;/strong&gt;: legacy platform refactoring and stabilization (3 → 13 people)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Phase 1&lt;/strong&gt;: launching the new SA platform (13 → 20 people)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Phase 2&lt;/strong&gt;: platform iteration (~25 people)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Phase 3&lt;/strong&gt;: building the B2G platform (~35 people)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If I had to pick the most fun stretch, it was from February 2023 (when the three-team-lead structure clicked into place) through June 2023, when we were just focused on iterating the service. Funny — writing it all out, it doesn&apos;t feel like much got done. And when I really chew over those years, no period was easy. The time when I was still writing code was honestly more fun, and people were closer.&lt;/p&gt;
&lt;p&gt;As someone else once said, those memories belong only to me. Everyone else has already forgotten, and only the recent memories stick with them. That long period when we were sometimes happy together and sometimes really tight as people, there&apos;s no way to find traces of it in each other anymore. That&apos;s sad. I&apos;m not saying we should be chained to the past. As someone wrapping up and leaving, I just want to be remembered well. What can you do? Coming this far was my decision. I could have stopped at any of those moments along the way, but I kept going. That was my choice. If I&apos;d wanted to walk out to applause, I shouldn&apos;t have come here in the first place. Anyway, I&apos;ve laid out what I did in rough chronological order, just as it came to me. Now let&apos;s go through it piece by piece.&lt;/p&gt;
&lt;h2&gt;Outcome (Org and Product)&lt;/h2&gt;
&lt;h3&gt;Data Team&lt;/h3&gt;
&lt;p&gt;The data team was properly built when the two people currently there joined in August 2022. We started from basic data engineering with Airflow, did a stats study group together (we even rewrote Andy Field&apos;s whole book in Python), built out a data warehouse, set up analytics pipelines, and worked together on the data products that became core to running the service.&lt;/p&gt;
&lt;p&gt;Both were juniors when they joined, so for the first year we ran weekly study sessions and did design work together. They both put in their personal time too, and partly because of that, they grew really fast. One of them finished grad school over the past two years on top of the busy work schedule. People joke that she&apos;s a robot, but as someone who&apos;s actually tried doing school and work in parallel, I have nothing but respect for her.&lt;/p&gt;
&lt;p&gt;I could have done more focused work on the data team, but unfortunately, from the second half of 2023 when I took over as acting design team lead, I ended up neglecting the team more than I should have. Coming from a data engineering role at my previous company, there&apos;s regret that I didn&apos;t execute more of the roadmap I&apos;d planned. The saving grace is that both of them are people who run themselves well, so at this wrap-up moment they&apos;re the team members I trust most and I&apos;m most grateful to. They&apos;re also (from my side) the easiest people to be around, and the team where I most wish I&apos;d looked after them more. They&apos;re working on the company&apos;s main initiative right now, the AI Agent project. They&apos;ve finally landed on the &lt;em&gt;something AI engineering&lt;/em&gt; they&apos;d been wanting, and I really hope they get the result.&lt;/p&gt;
&lt;h3&gt;Design Team&lt;/h3&gt;
&lt;p&gt;When I took over the design team, I thought it was a blessing in disguise at the time. The previous lead had left, and while we were hiring a new PM, I stepped in as acting lead. It pulled me back into the planning details. The one thing I regret is not making enough of the opportunity, because we hired a new PM not long after. The new PM was a junior, so I stayed on as acting lead, and the biggest mistake was demanding from this new PM most of what the previous PM lead had been doing. (PM stuff, I&apos;ll save for the PM section.)&lt;/p&gt;
&lt;p&gt;The reason an engineer-background lead could land any real hits in the design team was that the planning skills I&apos;d picked up from running my own business held it up. We were also pursuing product design (owning policy details all the way down, not just UX/UI), and that gave me room to play several roles. On the aesthetic and form side, my feedback was close to zero. The best I could do was feeling out the room with &lt;em&gt;I don&apos;t really like this color&lt;/em&gt;. I can&apos;t make it, but I can tell good-looking from ugly.&lt;/p&gt;
&lt;p&gt;What I liked about leading the design team was the time spent one-on-one with each person, going back and forth on product direction and feedback. Designers and developers are makers, and makers can&apos;t help over-investing in their own solution. Someone has to keep pulling them out of it from the outside, and when leading the project, I think I did that part well. Especially when many initiatives were running in parallel, the product was still one and the customers were still defined, so giving useful feedback wasn&apos;t hard, and the designers took it in well. When something didn&apos;t make sense, we kept trying to convince each other. You could say &lt;em&gt;that&apos;s just how you see it&lt;/em&gt;, but when requirements come down so top-down, my position as a middle manager forced me to keep asking &lt;em&gt;why&lt;/em&gt; and to keep building motivation just to set priorities. The persuasion may not have been enough for some people, but I can promise I never skipped the persuasion process.&lt;/p&gt;
&lt;p&gt;That said, recently (at the end), with so many products, so many squads, so many kinds of customers, the useful range of my feedback shrank to mostly implementation-scope. If I hadn&apos;t quit this year and had kept going, my plan from the start of the year was to delegate the design lead role and return to being CTO.&lt;/p&gt;
&lt;h3&gt;QA Team&lt;/h3&gt;
&lt;p&gt;We set up the initial QA staffing with the original PM lead, and built out the QA process together. The QA team, truth is, didn&apos;t grow much after the initial setup. This is the team I have the biggest regret about. While also leading the data team and the design team, I didn&apos;t have the experience or the capacity to give QA proper direction and lead it as a specialized function. Late in the period, I tried giving the QA team more authority and a sense of ownership to push them to do better, but unfortunately, during the time I was there, we put almost no time into proper test-case reviews or building QA automation. My experience there is limited, but the team has people who&apos;ve worked with strong QA orgs. I hope the team itself can push toward becoming a more specialized function that contributes more to the org going forward.&lt;/p&gt;
&lt;h3&gt;Engineering Team (FE/BE)&lt;/h3&gt;
&lt;p&gt;For the first year I was almost always with the engineering team, looking at code, writing code myself, spending most of my time on code review. I delegated FE to the FE lead who was there from the start, and concentrated mostly on backend. Legacy, layered, functional, DDD, clean: I came in with all these design agendas. The one thing I regret from that period is that no design methodology is 100%, and I set the criteria too vaguely for the gray areas. We just kept developing. A lot of that is still in production code. Even if my methodology becomes a new kind of legacy by today&apos;s standards, I should have set stricter criteria and done more rigorous code review on the ambiguous parts. I was lacking a lot back then.&lt;/p&gt;
&lt;p&gt;When the BE lead came in, I naturally delegated code ownership. For about half a year I spent more time pairing with a junior FE engineer once a week to untangle a heavily coupled component. Most of my actual coding was on the data engineering side. Since this was the company&apos;s core engineering team, I always followed up on key design and decisions and actively joined design reviews on new projects. From that point on, instead of code reviews, it was mostly design reviews and tech doc reviews. From this distance, I should have stayed on top of code reviews more consistently. I wasn&apos;t completely absent from the code, but it was nowhere near enough. Whether the implementation actually matched the design has to be verified through the code itself. Because I was so weighted toward design reviews early in projects, I kept finding out late that the agendas I&apos;d brought in originally and the actual code implementation had drifted apart in important ways.&lt;/p&gt;
&lt;p&gt;Even as I&apos;m wrapping up now, I keep asking myself &lt;em&gt;if I could go back to point X in time, what would I pick differently?&lt;/em&gt; From tech stack to design direction. The biggest one is that during the B2G phase, when there was a window to partially rebuild the platform, I didn&apos;t go far enough in breaking the existing structure. I compromised to hit a tight schedule, what amounted to honoring the old while building the new, and looking back from here, that was the wrong call. I really hope my successor and the engineers staying behind, plus the strong people coming in, can fix the structural problems at the root and turn it into software that&apos;s resilient to change, a service where you can focus on the core logic.&lt;/p&gt;
&lt;h3&gt;PM&lt;/h3&gt;
&lt;p&gt;Honestly, just regret. In 2024 I was a PM, not a CTO, and as a PM I failed badly. I didn&apos;t make use of the strengths and weaknesses of the PMs we&apos;d hired. I didn&apos;t delegate properly. We didn&apos;t communicate or share well. The team and people I spent the most time with. And yet I just couldn&apos;t pull it off. I owe the PMs I worked with an apology, and that failure left me unsettled for a while.&lt;/p&gt;
&lt;p&gt;There were a few causes. The biggest one is that I was the PM lead and I was the PM role model. But no matter how much I set down the dev work to be a PM, the ability to review and plan a product based on technical understanding and design is mine alone. That&apos;s something I do well. The big mistake was forcing that role on others early on. The thing is, the moment a PM hears a requirement, they have to step back and check their own thinking: &lt;em&gt;can I make this call or not?&lt;/em&gt; If they can, decide fast. If they can&apos;t, get the right stakeholders into a room fast so the work doesn&apos;t stall. The moment a PM thinks they can decide something they can&apos;t and burns time on it, the timing slips and everyone working on it ends up unhappy. And in an in-house product, the PM&apos;s job is to deliver the customer&apos;s voice faster, judge the essence of the problem, and work with the makers to land on customer satisfaction. The timing was bad too: the B2G product I worked on for a year was a project where you basically couldn&apos;t do any of that. The role was mostly &lt;em&gt;take a given task and shrink its scope to the bare minimum that still meets the conditions&lt;/em&gt;. Or write business proposal documents.&lt;/p&gt;
&lt;p&gt;Still, I think I could have done better. And in spite of everything, they should have been working in a better product environment. At least when it comes to PM, I&apos;m not A-tier, so if I&apos;d met these people in a B2C or B2B in-house product, I might have done better, might have delegated more. In this kind of environment, PMs honestly just get pushed around and bled dry.&lt;/p&gt;
&lt;p&gt;I never praised that team properly, only nagged them. I&apos;m sorry. I hope both of you find environments where you can work more like yourselves.&lt;/p&gt;
&lt;h3&gt;Org Structure Changes&lt;/h3&gt;
&lt;p&gt;We started with feature teams, and ended up (clumsily) with mission-based teams, a squad system. I think that&apos;s a good thing. I didn&apos;t drive the change proactively, though. Buffeted by changing external business needs, our single big division kept doing wildly different work in feature-team units sprint to sprint, and to get some continuity and focus people, we ended up in what I&apos;d call survival-mode squads. But there weren&apos;t enough PMs, the size of the businesses was wildly imbalanced, the contexts were different (different businesses, different customers), yet we kept the same platform strategy (same codebase, same branch). Every release blew up. To fix it, you&apos;d have to either drop a business, or if you can&apos;t, split codebases by business unit and make everything independent so the squads can move faster. On this one, my view differs from the executive team, so I won&apos;t comment further.&lt;/p&gt;
&lt;p&gt;If we&apos;d separated codebases per squad, put a senior engineer on each squad, had a dedicated PM per squad (especially for B2G, where stakeholders are so tangled, I think you need at least two PMs), and split the org structure clean along squad lines, could we have run a more agile system regardless of headcount? Saying &lt;em&gt;we&apos;d have done well if we had more resources we didn&apos;t have&lt;/em&gt; is a cheap thing to put in a retro. What I really wanted from the squad model was &lt;em&gt;back to the basics&lt;/em&gt;. The way I built culture and led nimbly with three engineers when I first joined, I wanted to recreate that in small squads. But honestly, looking at where things stand, I can&apos;t say that worked out. Was it because we&apos;d gotten too used to feature teams over the years? Or just because last year&apos;s B2G project drained everyone?&lt;/p&gt;
&lt;p&gt;Still, the one really important thing I should have done was more 1-on-1s with people, more often. When I scrambled to talk to people in January, the timing already felt too late. If anyone comes to me saying, &lt;em&gt;I&apos;m a manager and I don&apos;t know what to do&lt;/em&gt;, I&apos;ll tell them: do 1-on-1s diligently with your people. That&apos;s the best move.&lt;/p&gt;
&lt;p&gt;Also, the absence of a clear reporting structure (both up and down) was one of the biggest problems. Setting aside the squad structure, I should have set up direct, organized reporting from team leads, PMs, and seniors much earlier. In &lt;em&gt;startup&lt;/em&gt; organizations where everyone is moving on their own, I&apos;d treated &lt;em&gt;reporting&lt;/em&gt; as almost a forbidden word. But in a bigger company, reporting structures and rank structures matter, and so does delegating well to the people you can delegate to. Coming to that realization too late was another problem.&lt;/p&gt;
&lt;p&gt;Leadership goes downward, but it also goes upward. Just changing org structure doesn&apos;t make work happen. Org structure functions properly only when the upstream business structure and the downstream team culture both back it up.&lt;/p&gt;
&lt;h3&gt;Org Culture&lt;/h3&gt;
&lt;p&gt;The stretch while the org was growing toward its first 20 people was probably when my influence was strongest. Check-in meetings, random lunch-mate matchups, sprint retros: those mechanisms worked. I wrote the onboarding doc myself. The two things I value most in org culture (humor and curiosity about each other) were running through small mechanisms that actually worked.&lt;/p&gt;
&lt;p&gt;Then last year, with the B2G project, people lost motivation, the company stopped communicating, and it all collapsed. As I started letting the small details go one by one, even the mechanisms that had been working just fell apart at the end. I decided we couldn&apos;t keep going like this and brought in the company&apos;s EX (employee experience) team. Now that the company has to move forward again, this might be the moment to reset the team&apos;s identity and culture entirely. If the culture I built doesn&apos;t work anymore, I hope they peel it off without hesitation and set up something new.&lt;/p&gt;
&lt;h3&gt;People Ops&lt;/h3&gt;
&lt;p&gt;Because the company&apos;s HR and people ops were mostly absorbed in the education and teacher operations side, almost all org-level people ops landed on me. The thing that took the most energy: the comp negotiation season at the start of every year. When I was alone, I had to evaluate all team members myself. When team leads were in place, I delegated to them but did the final evaluation myself. We brought in peer review for people who&apos;d worked together, and used it more for reference than as an evaluation. A month on team evaluations alone, almost two months on comp negotiations with the executive team. For 2023 and 2024, the first quarter of the year was almost entirely poured into people ops. The executive team was reluctant to raise compensation much, and team members wanted bigger raises, so I had to do max-tension tug-of-war within the cap I was given. Negotiating someone else&apos;s salary, not your own, isn&apos;t an easy process. To make evaluations more transparent, I introduced 1-on-1s, quarterly retros, peer reviews, all of it.&lt;/p&gt;
&lt;p&gt;The flip side: with that much authority comes that much responsibility, and since every execution went through my hands, the moment I got busy or lost focus, I&apos;d drop something. There was no framework or system to back it up, and the org had no real HR function, so I spent a lot of time fighting alone.&lt;/p&gt;
&lt;p&gt;Looking back, when the team was under 10, I had enough capacity even without HR support to give frequent feedback and run growth-oriented conversations. In 1-on-1s we&apos;d set goals together for the next month or three months, look back at what got better and what was missing, and during that time at least, I gave honest feedback (positive or negative, no holding back). I made a point of preparing at least one piece of negative feedback per person, where &lt;em&gt;negative&lt;/em&gt; meant something they could turn into an action item.&lt;/p&gt;
&lt;p&gt;As the team got bigger, with still no system in place and less slack at the company, it got harder to keep evaluations and reviews consistent. Building a system was a stretch when the company still had so much growing to do, and without a system you can&apos;t promise anyone even a baseline of growth. Pay eventually fades into background noise once you&apos;ve been receiving it for a while. There&apos;s a limit to how far you can convince or commit someone with money. It&apos;s just one of many ways to express what you think they&apos;re worth. What matters is whether the leader pulls growth out of people with sincerity while still giving objective evaluations. And on that front, I&apos;d think I was doing fine, then think it was impossibly hard. It was a brutal stretch.&lt;/p&gt;
&lt;p&gt;Team members and the company both talk about money in comp conversations, and only money. The company says &lt;em&gt;we paid this much, so deliver this much&lt;/em&gt;. The team members say &lt;em&gt;if you want this much, pay this much&lt;/em&gt;. A team member who works at a 6 out of 10 doesn&apos;t suddenly become an 8 because you raise their comp by 10 million won. (Truth is, you&apos;d be lucky if they don&apos;t drop below 6.) The company shouldn&apos;t expect that, and the team member&apos;s promise of it isn&apos;t worth much. Comp negotiation is one lever for carrying value and growth forward. Trying to solve everything with it, without context, is just not knowing people.&lt;/p&gt;
&lt;p&gt;I always emphasized individual growth, but in the end, you can only talk about individual growth if you keep driving team growth and, beyond that, team wins. In a startup, team wins have to connect to the company&apos;s business goals. The team has to drive those goals and turn them into wins. In an org where you don&apos;t even know what winning is, no matter how much a leader talks about individual growth, it&apos;s hollow. Individual growth sits on top of team wins. It might be sacrificed for them. But it can&apos;t exist without them.&lt;/p&gt;
&lt;p&gt;The executive team wanted us to build the product they had in mind, and they ran the business and engineering as separate orgs. Every metric and every requirement&apos;s backstory came only through the executive team. Right or wrong, that&apos;s a structure where it&apos;s really hard to define what &lt;em&gt;team winning&lt;/em&gt; means. The hardest moment was when the B2G project finished and the team had been crunching for three or four months. We hit our final goal (passing the review), but that wasn&apos;t a win. Not users, not revenue. Just earning the right to sell. Even with no business outcome reached, the team had worked plenty hard, and wanting some additional reward was completely reasonable.&lt;/p&gt;
&lt;p&gt;Of course there was no extra reward. From that point on, the company started struggling, and even the things you&apos;d taken for granted started disappearing. We spent enormous amounts on B2G with no clear business goal definition and no return. Forget headcount cost and time, we did everything money could do. But the executive team didn&apos;t explain (and when I asked for an explanation, all I got was &lt;em&gt;we have to save money now&lt;/em&gt;). I had to manufacture a company message in the middle, talk about vision... but I wasn&apos;t the final decision-maker, so my words couldn&apos;t carry weight. I myself was deeply skeptical about doing business through a B2G product. It was because the company, in the end, didn&apos;t fully respect the CTO, whatever the intent or reason.&lt;/p&gt;
&lt;p&gt;The story of finding out, the day before the holidays, that the Chuseok and Lunar New Year gifts that always go out hadn&apos;t gone out (finding out as Division Head by asking the other directors), and then scrambling to buy the team&apos;s holiday gifts on my own dime: you couldn&apos;t make it up. I&apos;m not bragging about paying out of pocket. With the judgment I have now, I&apos;d never use my own money. But moments like that pushed my leadership in running a 30+ person division beyond its natural range, and probably drained a lot of my patience.&lt;/p&gt;
&lt;p&gt;Even so, thanks to all this, I got to experience how to run an org, manage it, hire, all of it. These aren&apos;t lessons I read in a book — they were carved into me the hard way, and they&apos;ll stay. I won&apos;t be running a big team for a while, but I&apos;m not giving up on thinking about it. Because I really do love working as a team.&lt;/p&gt;
&lt;h3&gt;Product&lt;/h3&gt;
&lt;p&gt;When we built product, I always planned around structure, design, solutions. Sometimes thinking that way helped (extensibility, structural thinking), but what I regret most is that I never saw how people would actually use what we built.&lt;/p&gt;
&lt;p&gt;A concrete example: the curriculum part of the current content system was designed on the assumption that a curriculum gets written &lt;em&gt;after&lt;/em&gt; it&apos;s complete. But in the actual service, most curricula are written and updated as they go, and that&apos;s exactly the case customers are paying for. In the first year I was so focused on catching up to the legacy and on my (self-proclaimed) structural completeness that, no matter how much &lt;em&gt;extensibility&lt;/em&gt; I thought about, we ended up with a structure that&apos;s painfully fragile to change. Because the product design itself was broken from the start, the more use cases and edge cases surfaced, the more dev cost grew exponentially.&lt;/p&gt;
&lt;p&gt;It&apos;s like building getters and setters in an OOP class, thinking &lt;em&gt;now we can generalize these everywhere&lt;/em&gt;, then watching feature requests pile up and the getters and setters get crusted with &lt;code&gt;if&lt;/code&gt; statements, and one wrong touch sends side effects into every other area. Unlike DDD or recent design trends, I held the foundational model design too rigidly, and that fixed structure became a kind of immutable law. After that, every change had to either avoid touching it or work through it, which inevitably meant ballooning dev cost.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;You were CTO, couldn&apos;t you have just done better?&lt;/em&gt; Ha, fair. That&apos;s why I wrote &lt;em&gt;I regret this&lt;/em&gt;. Early on I should have interviewed the operating teams much more deeply about exactly what they did and how they ran the service, even done the actual work myself, and understood it. But because I had to focus on the new platform that would replace the legacy, and because I was stuck in what I&apos;d convinced myself (that &lt;em&gt;the new platform is obviously better in features and design&lt;/em&gt;), I developed in isolation, without looking deeply at the use cases on the customer side. One reason I didn&apos;t quit after the 2022 launch was that I had a strong urge to make up for this, to develop from inside the customer use case.&lt;/p&gt;
&lt;p&gt;Honestly, more than the question of which dev stack to pick, this is the part I regret most all the way to the end. For a developer to write good code, they have to know the customer. If you write code based only on what the PM or the designer or someone else tells you, before long you&apos;re looking at your own code and blaming yourself for it. If it&apos;s just a code issue, you can refactor. But once the DB schema, the production data already sitting in it, and the relational table structures are in place, and the service is in motion chasing the next business requirement, turning all that around (migrating, redeveloping) is almost impossible. You&apos;re carrying a giant lump forever. The trap engineers with &lt;em&gt;tech, tech, tech&lt;/em&gt; ego need to avoid: our value only shines when it connects to delivering value for the customer. If I&apos;m getting paid and what I&apos;m building doesn&apos;t connect to customer value, I think honestly I&apos;m a developer on borrowed time.&lt;/p&gt;
&lt;p&gt;Which brings me back to the same point: the customer. In the end, we&apos;re building products people use. Don&apos;t forget that. Don&apos;t ever forget it.&lt;/p&gt;
&lt;h2&gt;Closing&lt;/h2&gt;
&lt;p&gt;What I built, in the end, was an org and a product. If I had to pick a single action item from the org side, it&apos;d be: do more 1-on-1s. The things I most regret, the things that stung most in the end, were the &lt;em&gt;if I had just done it back then, it might have been different&lt;/em&gt; kind.&lt;/p&gt;
&lt;p&gt;On the product side, like I just said: put the customer first. Forget structure, forget design. What actually matters? The customer has to come first. Don&apos;t get confused. I&apos;m absolutely not saying &lt;em&gt;just take what customers say at face value&lt;/em&gt;. The real skill is catching the requirements, needs, and opportunities hidden underneath. (And being the kind of developer who can carry that into code.)&lt;/p&gt;
&lt;p&gt;Like my other posts, this is a brief (?) look back at four years of work. I wrote (or rather, dumped onto the page) whatever came to mind, so it&apos;s all over the place. Really, a stream of thought transferred straight to the page. While writing it, part of me wanted to clear it all out for a fresh start, and another part of me wanted to bury the past here and not be tied to it. The memories will fade. I&apos;ll read this and think, &lt;em&gt;ah, right, that&apos;s how it was&lt;/em&gt;. There&apos;s plenty I&apos;m still unhappy about (the company, other things), but this is a public space, so I focused on my own record and on things I can actually act on. Those frustrations will fade too. Someday, well past now, I might write down some more honest version of these thoughts, if I still have the will and the memory.&lt;/p&gt;
&lt;p&gt;Most of my twenties I spent building my own business, so for a 13-year career, I&apos;ve never been at a company this long as a regular employee. And not just an employee, a middle manager with the title of CTO. I spent my twenties and into my thirties being called &lt;em&gt;CEO&lt;/em&gt;, and only lately have I gotten used to being called &lt;em&gt;Director&lt;/em&gt; (a middle-management rank below the &lt;em&gt;CEO&lt;/em&gt; title I&apos;d held until then). Time scares me a little.&lt;/p&gt;
&lt;p&gt;Watching the company grow rapidly and watching it struggle, I always ran a mental simulation: &lt;em&gt;if I were the operator, what call would I make here?&lt;/em&gt; Not &lt;em&gt;I&apos;m better than anyone&lt;/em&gt;, but &lt;em&gt;I have my own style, so what&apos;s the best version of my approach?&lt;/em&gt; And I tried to understand the operator&apos;s position from the inside while doing it. Going from 3 to 30 in R&amp;amp;D while the company went from 100 to 500 was a high-difficulty problem.&lt;/p&gt;
&lt;p&gt;I trash-talked my boss behind their back and ate good food on the company card. I learned what a paycheck means. I learned a lot of things I didn&apos;t know when I was only an operator. I became more of a realist. As a middle manager I tried to live up to the responsibility, but it was harder than I expected. To my team I held two opposing sentences in my chest, &lt;em&gt;I&apos;m in the same boat as you&lt;/em&gt; and &lt;em&gt;I&apos;ll take responsibility and decide&lt;/em&gt;, and there were moments I disliked everyone, above and below, and moments I felt nothing but gratitude for everyone. Right here at the very end, what looms largest is regret. But over this time I felt and learned and gained a lot.&lt;/p&gt;
&lt;p&gt;At some point work got so heavy and the stress got so bad (I can&apos;t remember what it was about) that I sobbed in my wife&apos;s arms (we weren&apos;t married yet). At some point I had insomnia (I&apos;m someone who falls asleep the moment I lie down, anywhere) and was swearing in my sleep, and there were nights I&apos;d wake up at 3am over work and stay up until morning. I was more stressed than when I ran my own business. I only let it show to people at the very end. For someone who likes to wear his heart on his sleeve, I held a poker face and made it this far. Think too deeply and everything ends up a regret.&lt;/p&gt;
&lt;p&gt;Now I&apos;ll bury these four intense years here, hold onto the good memories and the lessons, and start something new.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;What we call the beginning is often the end.
And to make an end is to make a beginning.
The end is where we start from.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;― T.S. Eliot&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;To everyone who stayed with me to the last line, thank you for reading all the way through these messy, scrambled four years. I&apos;ll rest for a while and come back with new stories.&lt;/p&gt;
</content:encoded><category>retrospective</category><category>project-retrospective</category><category>CTO</category><author>Tony Cho (https://flowkater.io)</author></item><item><title>The Last Seven Months: Scattered Thoughts (September 2024 to March 2025)</title><link>https://flowkater.io/en/posts/2025-04-last-seven-months/</link><guid isPermaLink="true">https://flowkater.io/en/posts/2025-04-last-seven-months/</guid><description>A seven-month retrospective on how a B2G (business-to-government) project and the early days of marriage wrecked my mental state, and how travel, therapy, and work adjustments slowly pulled me back. Notes on the emotional swings, daily-routine repairs, and the kind of conversations it takes to hold a relationship together.</description><pubDate>Tue, 01 Apr 2025 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;Opening&lt;/h2&gt;
&lt;p&gt;I haven&apos;t written a single retrospective since August. Between September and March, more than half a year slipped past with my body and mind both out of shape. No goals, no retros, and I can say with some confidence it was the worst stretch I&apos;ve had in years. The health I&apos;d built up over the summer fell apart, and my head wasn&apos;t in a great place either. Now that I&apos;ve finally decided to reset the problems that were chewing away at my life, I figured it was time to write something about what I&apos;ve been through. Calling it a retrospective feels like a stretch, and it doesn&apos;t live up to the monthly records I used to keep, but compared to an annual review the cadence is still short, so I&apos;ll take that as consolation and just set down whatever comes out, in no particular order.&lt;/p&gt;
&lt;h3&gt;Married Life&lt;/h3&gt;
&lt;p&gt;September was when the B2G project at work finally wrapped up for the season. Looking back, I worked through the summer without even noticing it was hot. At the time all I wanted was a short break to enjoy the relief of being done. Over Chuseok, my wife and I got what passed for a blessing from both sets of parents and flew off to Switzerland and France, hoping the trip would repair the long stretch I&apos;d burned through at the company. But by then we&apos;d already been newlyweds for the first half of 2024 (half a year during which work was pretty much my whole life), and each of us had been holding on in our own separate ways. That time wasn&apos;t something a single European trip could undo. Truthfully, that was the moment I should have come to my senses, but back then I was just full of resentment. I think the only thing running in my head was &lt;em&gt;I was just trying my best, and look where it got me.&lt;/em&gt; In the end, recovering took another six full months. The saving grace is that we&apos;re at least moving in the direction of recovery, and that&apos;s entirely thanks to my wife&apos;s resolve, her patience, and her effort. The process was brutal for both of us, of course.&lt;/p&gt;
&lt;h3&gt;Work&lt;/h3&gt;
&lt;p&gt;No matter how much meaning I tried to layer onto it, the B2G project was, to me, just outsourced work. The government had put out a pass-or-fail guideline, and that guideline was vague enough that the whole game became: interpret it however you can and squeeze out extra points. There was no customer in the room, no end user. Just the government trying to ram a half-baked policy through in a hurry, and the incumbents in the market angling to clear the bar. For us to play the game at all, we had no choice but to partner with those incumbents, and we finished the product through a tug-of-war with fuzzy guidelines and requirements from partners that made no sense. The real push started in May 2024, but we&apos;d been pouring resources into it since September 2023. As of April 2025, the team and I had spent nearly a year and a half on it.&lt;/p&gt;
&lt;p&gt;&quot;Startups shouldn&apos;t do outsourced work&quot;? I&apos;m not trying to chase that kind of idealism. Business is business. Startup or not, it doesn&apos;t matter. When your in-house product is going through a slump, turning to outsourced work is the natural move for a startup that&apos;s running out of cash. Years ago, at a dinner with Song Jae-kyung, then CEO of XL Games (I think he&apos;s since left), he told stories about the early days at Nexon building &lt;em&gt;The Kingdom of the Winds&lt;/em&gt;, and he mentioned that everyone on the team except him was doing homepage contracting on the side.&lt;/p&gt;
&lt;p&gt;Anyway, I&apos;m drifting. The point is that to move forward, sometimes you have to do work you don&apos;t want to do. That&apos;s what running a business is. But that&apos;s not what I&apos;m getting at either. If this had been straight contract work, we&apos;d have gotten a down payment when the project kicked off and a final payment when it wrapped, as fast as possible. If it had been in-house product work, we could&apos;ve given up short-term revenue and aimed for something bigger. But when a project is really just a handful of stakeholders&apos; requirements dressed up as something more, the right move for a startup with its own product is to knock it out fast, collect, and get out.&lt;/p&gt;
&lt;p&gt;We spent a year and a half on it. Because it was B2G, it was meaningfully harder than ordinary outsourcing, but we met every requirement and delivered a good result. No down payment. No final payment. Passing the B2G review just gave us the right to enter the market and try to sell the product. The thing is, the pass criteria didn&apos;t reflect what the market actually wanted. To actually sell, we&apos;d have a mountain of real development still ahead of us. And to make matters worse, the December 2024 South Korean martial-law declaration hit, and public sentiment around the policy, already shaky, got even worse. The sliver of return we&apos;d been hoping for in 2025 vanished entirely.&lt;/p&gt;
&lt;p&gt;The thing about startups is that you can&apos;t really run a Plan B. You stare at Plan A and throw yourself at it like a lunatic, and when the slim odds of success do arrive, you&apos;re the one who can move faster and bigger than anyone else. So it&apos;s very easy, after the fact, to talk about these kinds of outcomes with hindsight. In the moment, you can&apos;t know anything. When a crisis comes, you cut and cut and hold on until the next chance shows up. From a startup point of view, that call is completely natural.&lt;/p&gt;
&lt;p&gt;But the people going through it with you have a rough time. Decisions you believed were the best turn out wrong, and wrong decisions hit every part of an employee&apos;s life, financially and mentally and everything else. If you don&apos;t want to go through that process with the company, leaving quickly is the right answer. No one will blame you for it. No one has the standing to. That said, since these aren&apos;t collective decisions, the company has to clearly share how things got here and where things stand right now. That&apos;s what communication means.&lt;/p&gt;
&lt;p&gt;And honestly, from where management sits, that&apos;s about all you can do. You give the information, and then you give people the room to make their own calls. Stay or go. That&apos;s the best thing for everyone. The ones leaving get well-wishes for the road ahead, and the ones staying lock in together and try to turn the crisis into something. The company has to be able to ask for that.&lt;/p&gt;
&lt;p&gt;But the absence of communication strips all of that away. From both sides. The hardest spot of all is the middle manager with no information, no authority, and no answers for a team that&apos;s waiting for one. Powerless. Useless. You try to hold yourself together and do what you can, but eventually you break. What more could I have done? Honestly, I don&apos;t know.&lt;/p&gt;
&lt;p&gt;Back then, I hated it all: the teammates who didn&apos;t know what I was dealing with, and the executives too. The moment I caught wind of things said behind my back, in their own little circles, something in me bruised. There was nothing I could do about any of it, and maybe the comments weren&apos;t even aimed at me, but they all sounded like accusations. I was treating the company&apos;s problems and the company&apos;s responsibilities as if they were mine to absorb. I think that&apos;s what being a leader is, though. Whatever the situation, you stay on the hook until the end.&lt;/p&gt;
&lt;p&gt;But after long enough with no information and no authority, I hit the wall. Every business decision was being made upstream, we were getting dragged along, building only what was handed to us. The same pattern, again. There was no &quot;here&apos;s how we&apos;re going to improve things,&quot; no &quot;let&apos;s communicate better.&quot; Just the same old way, heading somewhere I couldn&apos;t see, forward for the sake of forward.&lt;/p&gt;
&lt;p&gt;At first I thought maybe it was just my department, that I wasn&apos;t doing a good enough job and that&apos;s why we were the isolated island. But when I looked around, every department was an isolated island. If anything, the information I had was coming in a little faster than elsewhere. The moment I realized this wasn&apos;t going to improve, and the moment I realized the people with me had stopped expecting anything from the company (and from me, by then), I decided to stop.&lt;/p&gt;
&lt;h3&gt;Month-by-Month Work Summary&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;September&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;Project submitted. At that point the company was saying we&apos;d refocus on the in-house product, so I started setting up an objective-based squad structure and cleared the B2G product off my plate, tired of it as I was.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;October&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;We were suddenly told a global-business project was starting, but I tuned it out and focused on the B2G product&apos;s post-review corrections. I think I just wanted it to be over.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;November&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;The B2G product passed review. Since scoring was per content item rather than platform-level, some items failed, but more than half passed.&lt;/li&gt;
&lt;li&gt;Around this time we reorganized quickly into an in-house product squad and began planning and building new tasks.&lt;/li&gt;
&lt;li&gt;The period wasn&apos;t easy, but because we were centered on in-house product work, it was at least fun.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;December&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;Back to the B2G product. We were told in November there&apos;d be a resubmission schedule for the items that failed, and there were new partners involved. Every time it&apos;s the same story: we&apos;re told there won&apos;t be any additional development, and every time, once you open it up, the spec dump is just as heavy as the previous half-year&apos;s. In the end, as the first round of in-house squad work wrapped, I had to pull the B2G product back in.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;January&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;The squads were nominally set up around the in-house product, but we were clearly going to have to do the B2G work no matter what, and that needed a plan. Running retros with the team leads, I realized how much we&apos;d been letting slide under the cover of &quot;we&apos;re busy.&quot; We reorganized the squads by business line (global, B2G, new in-house) and gave team members formal titles. I think I was trying to carry the thing on personal effort, whatever the company was doing.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;February&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;I thought the new squad structure would let us move sharply, but in the end none of it was really clicking. The underlying problem was the same. Three squads whose contexts barely overlapped, and I couldn&apos;t properly delegate any of them, so I ended up not really knowing any of them well enough.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;March&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;I sat with the question of how to spend the time ahead, what I was actually hoping to get out of this place, and once I weighed everything I realized the person who should have left was me. The right moment would have been the moment people here stopped expecting anything from me, and even that window had already closed. I&apos;d come this far on a shabby sense of duty, but once I accounted for every opportunity cost, leaving (even now) was the right call. At this point I&apos;m sure of it.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That&apos;s as far as I&apos;ll go on work. I&apos;ll put the full company retrospective in a separate post. A rough six months doesn&apos;t mean every memory from it was bad, not even close.&lt;/p&gt;
&lt;h3&gt;Personal (Health, Reading, Other)&lt;/h3&gt;
&lt;p&gt;With my head in a dark place, my routines fell apart, and the times I tried to burn off the bad mood with overly hard workouts, injury caught up with me. The injury made it harder to go to the gym, and I&apos;m still in recovery mode. When my mental state is off and I can&apos;t exercise, stress relief slides back into food, and body and mind both kept deteriorating for a while. It&apos;s only recently that I&apos;ve been seeing a doctor consistently and keeping up rehab.&lt;/p&gt;
&lt;p&gt;The symptom I&apos;m most worried about is reading. For six months I basically stopped. Even in the busiest stretches I&apos;d never put books down like that, but this time I just stopped. Without input, you burn out. I guess that&apos;s what was happening to me, quietly.&lt;/p&gt;
&lt;p&gt;As I mentioned, my wife and I had a rough stretch, and one of the things we tried was traveling. France and Switzerland, but also Cebu, Osaka, Kyoto. Whenever the schedule allowed, we tried to get as far away as possible and spend some good time together. There were hard days, since we were in the middle of fighting a lot, but in the end I think they were trips worth taking. They helped more than I expected. What I remember most isn&apos;t the famous tourist spots. It&apos;s the two of us walking down ordinary streets, talking about nothing in particular.&lt;/p&gt;
&lt;h2&gt;Closing&lt;/h2&gt;
&lt;p&gt;This isn&apos;t a retrospective, and it isn&apos;t really a record of that stretch of time either. Too scattered to be a retro. Too scattered to be a record.&lt;/p&gt;
&lt;p&gt;My difficulty might be nothing to someone else. I&apos;m not writing this to get public agreement that what I went through was hard. I&apos;m writing it because, in the middle of these tangled, unresolved thoughts, I somehow made the next choice (the kind of choice that probably needed a good bit of courage) and stumbled into something new by backing away. This is a record of how I was backing away.&lt;/p&gt;
&lt;p&gt;Life keeps handing you an easy choice and a hard choice and forcing you to pick one. That&apos;s the cruel part. And the answer has always been the harder one. I chose the harder path because it was the harder one. Whatever mess the company is in, staying would have been the easier call for me, and I&apos;d have regretted backing down from the harder one. So since I was stepping backward anyway, I figured I&apos;d step onto a new road instead. I&apos;m not trying to push the harder path on anyone else. Everyone has their own best answer.&lt;/p&gt;
&lt;p&gt;I&apos;ll leave the concrete plans for what comes next, and the four-year retro on my time at the company, for other posts.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;All that is gold does not glitter,&lt;br /&gt;
Not all those who wander are lost;&lt;br /&gt;
The old that is strong does not wither,&lt;br /&gt;
Deep roots are not reached by the frost.&lt;br /&gt;
From the ashes a fire shall be woken,&lt;br /&gt;
A light from the shadows shall spring;&lt;br /&gt;
Renewed shall be blade that was broken,&lt;br /&gt;
The crownless again shall be king.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;⁃ J.R.R. Tolkien, &lt;em&gt;The Lord of the Rings&lt;/em&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;ul&gt;
&lt;li&gt;Previous retrospective
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;/posts/2024-09-blood-sweat-melancholy&quot;&gt;Blood, Sweat, and Melancholy (July &amp;amp; August 2024)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Next retrospective
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;/posts/2025-04-ihfb-four-year-retro&quot;&gt;Four-Year Retrospective at IHFB&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
</content:encoded><category>retrospective</category><category>period-retro</category><author>Tony Cho (https://flowkater.io)</author></item></channel></rss>