Tuesday, September 16, 2025

7 Lessons from Building an AI-First Organization

 

1. Coding is not the bottleneck

Nor is product inception the bottleneck.

Nor is testing the bottleneck.

When picturing a function adopting AI tools - be it development, product management, or QA - we tend to think of using LLMs to automate the one action associated with that function. The common, and perhaps intuitive, thought is that because that action is the one thing the function is known for, if we can just offload all the heavy lifting to AI, we will have revolutionized the field.

But developers spend less than 25% of their time coding. AI-assisted coding generally provides a 10-30% productivity gain. That is, at best, a 30% gain on 25% of the work - roughly a 7.5% overall improvement. Even if AI eliminated manual coding completely, the ceiling is around a 25% gain. More importantly, this 25% is the reason many developers choose this career, myself included. We are intrigued by solving problems, building things, or the act of writing code itself. Eliminating this is to eliminate our job satisfaction - the dystopia we don't want to live in. Replace developer with product manager, replace writing code with writing specs, and we still get the same picture.

The real friction is in the other 75% of the time. In that slice, we find ourselves clarifying requirements, providing customer support, digging through legacy code to decrypt logic nobody remembers, or worse, bored to death in meetings. These activities are where we find the goldilocks zone: massive productivity gain, improved job satisfaction, and low-hanging fruit. I didn't just add the last part because every great argument needs 3 supporting points. Creating a knowledge base from which feature details can be queried conversationally is a lot easier than getting an LLM to generate production-ready code on its own.

2. AI adoption has to be end-to-end or else it is pointless

This draws heavily from the manufacturing chain analogy. In such a chain, we cannot just increase the speed of one part and expect an overall gain from the chain. Such a system moves at the speed of its slowest component. Having just one component moving faster than others actually creates misalignments and can be harmful to the whole system.

The same holds for a software development process. If code is being written faster than product specs can be written or features can be tested, there are two outcomes: the code sits around generating no revenue while growing more obsolete by the day as technology moves on, or other functions have to rush and compromise quality.

AI adoption fundamentally upends the norms of many functions, if not all, but we don't have any option other than to embrace it thoroughly. Product needs AI to help structure requirements. QA needs AI to automate test generation. DevOps needs AI to predict incidents. Customer support needs AI to surface documentation. Every function needs to level up together, or the whole thing falls apart. Half-measures don't just fail to deliver value - they actively create misalignments and chaos.

3. Career development is going from T-shaped to M-shaped

This is not my original idea - the concept is widespread on the internet. The traditional model has been the T-shaped professional: the vertical bar represents depth of related skills and expertise in a single field, whereas the horizontal bar is the ability to collaborate across disciplines. In software development, this meant being, say, a backend engineer who understands enough frontend and DevOps to collaborate effectively.

But LLMs don't just allow us to do things better. Contrary to the popular belief that AI accelerates brain rot, I find that motivated people learn faster with AI support. The other day, my staff engineer gave Claude Code access to the Postgres source code and proceeded to drill into some very technical questions - expertise we could never have built up in such a short amount of time otherwise. LLMs give us access to the consultancy we didn't have before.

Instead of knowing one thing really deeply (the hallmark of individual contributors in the past), it allows us to know many things deeply, hence the M-shaped analogy (m - lowercase - would have been better; I was clueless what to make of the capital M initially). This shift is profound for career development. The traditional advice of "specialize or generalize" is becoming obsolete. The future of career advancement lies in being able to connect multiple domains of deep expertise.

4. AI adoption leads to change(s) in team structure

There is a discrepancy in AI's impact on productivity between functions. It could be from the nature of the work - some functions, like security, are harder to automate than, say, UI test execution. It could be because of where the industry's focus is at the moment - the investment in application code generation far outweighs infrastructure code generation (which already suffers from a smaller training data set to begin with). And sometimes we need a strong human-in-the-loop element. Take product managers, for example - sure, AI generates product specs really, really fast. But disastrous specs will throw a team off track and cost a company opportunities it cannot get back.

That is to say, right now, it seems easier to automate code production than other functions. The traditional ratio of 1 PM to 5-8 engineers to 2-3 QAs is becoming obsolete. Where PMs still take two weeks to write specs and QAs cannot click through test cases any faster, a productivity gain as small as 30% on the developer side breaks the balance.

As such, I think we will see variations on the current team structure to maintain balance between functions. Primarily, a team can have more product managers, more QAs to keep up, or fewer developers. My money is on fewer developers. See the lesson above.

5. Productivity measurement becomes important

Measuring productivity has always been a controversial topic, especially in software development where the deliverable is not as tangible as, say, the output of a manufacturing process. Personally, I am not a big fan. It is a hard topic and I don't get much fun out of it. Plus, I have always identified as an engineer, the subject of productivity measurement, and I don't like the idea that my contribution to my organization can be boiled down to a set of numbers. If that day comes, by the way, I hope I am a solid 8.

But even with my prejudice, I can't ignore that for a company as small as mine, we might be paying tens of thousands of dollars every month for computer-generated tokens. It is a large sum of capital, capital that could be invested elsewhere. Nobody gets good on the first try - in fact, most people get slower when they try to do something they have done forever but with new tools. A productivity dip is an important, well-understood, and well-accepted part of any learning journey. But said journey can only go so far before the ROI needs to be calculated.

Soon managers will need to choose between a new hire and a new AI tool. The math isn't straightforward. A new engineer costs $X annually but brings human judgment and creativity. An AI tool might cost $Y in tokens but needs constant supervision. Which delivers more value? Without proper productivity metrics, we're making these decisions blind.

I hope that by then we will understand productivity well enough to make a well-informed decision, rather than lean on dogma (neither "humans are unique" nor "machines are faultless"). Cynical as I am, I also know that it is wishful thinking - we'll probably still be arguing about story points while the AI quietly rewrites our entire codebase.

6. Bottom-up innovation triumphs over top-down dictation

A recent MIT report found that 95% of generative AI pilots at companies are failing. A pattern emerges from the report: top-down "enterprise" pilots mostly go nowhere, while bottom-up adoption is what actually drives disruption.

The problem with top-down initiatives is that upper management usually works completely differently from the majority of the workforce - the frontline workers - in terms of requirements and daily tasks. They end up building things that nobody needs, optimizing activities with marginal ROI, and eliminating work people love (see lesson 1). Meanwhile, individual employees are finding real value by experimenting with frontier models on their own terms, for their specific needs.

The 5% that succeed? Those are likely the ones where companies recognized this organic usage, which the report calls "the future of enterprise AI adoption", and supported it rather than fighting it. Bottom-up innovation triumphs over top-down dictation. The reward is for those who can get their hands dirty.

7. AI adoption is irreversible despite reality checks

Despite occasional setbacks, AI adoption in the industry is irreversible. Just like once color TV became a thing, nobody wanted black and white anymore. I am not giving up my agents. Yes, they may replace me some day, but today they contribute to parts of my job satisfaction.

It only makes sense that AI skills - the correct way of using AI, be it technical, intellectual, or ethical - need to be learned and tested. This is already happening. Meta is letting job candidates use AI during coding tests. They're acknowledging that AI is now part of the toolkit, just like IDEs and Stack Overflow before it. Testing someone's coding ability without AI is like testing their math skills without a calculator - technically possible but practically irrelevant.

I have learned the hard way that I should never just ask whether someone "uses AI." The answer is not binary, yes or no. Everyone says yes these days. Only upon close inspection does the answer reveal itself to be a spectrum. It goes from "I ask AI questions so I don't have to Google myself" to "AI is my copilot" to "I have delegated all thinking to AI." The difference between these levels is massive - it's the difference between using a tool and being used by it.

Soon, we will see the AI-focused version of today's LeetCode. Instead of being tested on red-black trees from memory (what are they, by the way?), we will be tested on whether we can architect a system with AI assistance, validate AI-generated code for subtle bugs, or construct prompts that consistently produce production-ready outputs. The skill isn't memorizing algorithms anymore - it's orchestrating AI to solve real problems while maintaining quality and understanding.

I think this is when people say the AI genie is out of the bottle.

Saturday, September 13, 2025

Claude Code Subagents

Claude Code (CC) has gained a lot of traction among developers recently. I would say it is establishing itself as the gold standard for coding agents. Among its features, I find subagents particularly interesting. Subagents are lightweight CC instances that run in parallel via the Task tool. They're essentially specialized AI workers that:

  • Have their own separate context window (~200k tokens each)
  • Can be configured with specific prompts and tools
  • Run independently and report back summaries
  • Work in parallel (up to 10 concurrent)
  • Cannot spawn their own subagents (no recursion)

Subagents are basically CC’s goroutines.

The optimization of parallelism

CC runs in a single main thread. It has a single context window of 200k tokens. It executes everything linearly, because that's what a single thread means. And because everything is executed in the same context linearly, CC has perfect continuity of reasoning and can adapt its approach based on on-the-fly discoveries. Subagents, each with their own context window, are supposed to expand this capacity in a multi-threaded fashion. Collectively, it is a much bigger context window, and things can be done much faster in parallel.

The creation of a subagent, with its specific purpose, expertise area, and even personality, is an exercise in prompting. It goes like this:

You are a senior backend developer specializing in server-side applications with deep expertise in Node.js 18+, Python 3.11+, and Go 1.21+. Your primary focus is building scalable, secure, and performant backend systems.

When invoked:

* Query context manager for existing API architecture and database schemas

* Review current backend patterns and service dependencies

* Analyze performance requirements and security constraints

* Begin implementation following established backend standards

There is a public repo with dozens of these personas to choose from.
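
To make that concrete, here is a minimal sketch of how such a persona might be packaged as a project-level agent file. The YAML frontmatter fields (name, description, tools) follow the convention documented for .claude/agents/ at the time of writing; the file name, description wording, and tool list are my own illustration rather than a canonical profile:

.claude/agents/backend-developer.md

---
name: backend-developer
description: Senior backend developer. Use for designing and implementing server-side code, APIs, and database access.
tools: Read, Grep, Glob, Bash, Edit, Write
---

You are a senior backend developer specializing in server-side applications... (the rest of the persona prompt goes here)

The description field is what the main thread reads when deciding whether to delegate, so it deserves as much care as the prompt body.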

A textbook example of subagents would be to write a website with a crew of:

  • Planner (main thread): decomposes the request into tasks, defines acceptance criteria, and assigns owners.
  • Backend subagent: writes the APIs and persists data to the database.
  • Frontend subagent: consumes the APIs and implements the web interface.
  • Tester subagent: generates unit/integration tests, fuzz cases for weird input combinations.
  • Doc writer subagent: drafts README updates and usage examples.
  • Release manager subagent: bumps version, writes changelog, opens release PR.

The main thread would take requests, write specifications, make the implementation plan, and assign tasks to the two implementer subagents in parallel. Once both are complete, the tester, writer, and release manager can be invoked in sequence. This showcases both parallel execution and specific personality strengths of subagents. Just like what my team does.
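
In practice, the delegation has to be spelled out in the request to the main thread. Something along these lines - the crew names and the SPEC.md handoff file are hypothetical, matching the example above - is roughly what I would type:

Write the spec for the new checkout flow to SPEC.md. Then, in parallel, use the backend subagent to implement the API and the frontend subagent to implement the UI, both working from SPEC.md. When both are done, invoke the tester, then the doc writer, then the release manager, in that order.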

The orchestration challenges

At first, subagents seem intuitive. It is a picture that has been painted many times by AI enthusiasts, myself included, where multiple agents divide and conquer a problem that one cannot resolve individually while communicating seamlessly through some sort of protocol, like A2A.

That, unfortunately, is deceptive, because CC's subagents are constrained by the limitations of today's engineering. In particular:

  • Subagents are given context from the main thread, but they cannot exchange information with each other.
  • At the end of the task, a subagent summarizes, but it cannot guarantee all critical details are captured.
  • A subagent cannot spawn other subagents.
  • Each subagent starts with ~20k tokens of overhead.

Still don’t understand? Me too! Not until I ran into some challenges in practice did the implications of these constraints become apparent.

By default, CC keeps everything on the main thread; it is cleaner that way. It doesn't matter if I have, say, 42 beautifully crafted subagent profiles whose jobs match the task description perfectly - Claude almost never delegates automatically; it requires explicit invocation.

And 42 is a disastrous number of profiles. Past a certain point - which I shamefully cannot pinpoint, I am being honest - the agents have overlapping responsibilities, and choosing one over another becomes a question of preference. That compromises the consistency of the outcome. The orchestration task of the main thread gets exponentially more complicated as the number of subagents increases. It is harder to decide who should do what next. Without a strong orchestrator or clear task boundaries, subagents can duplicate work, miss dependencies, or stall waiting for each other. Last but not least, just like a human team, more agents means more interfaces, which means more chances for small misunderstandings to become big problems.

The biggest limitation is probably that each subagent lives in its own silo. It receives input once from the main thread, and summarizes its work once to the main thread at the end of the invocation. Any task that expects a certain level of dependency between subagents will probably fail. One of the popular subagent use cases is exploring a large code base whose context exceeds that of a single window. This can be genius, but it can also be a mess. A mess if the exploration is split by module, because modules have the nasty habit of cross-referencing each other. A subagent either strictly stays in its module and misses important context, or bleeds into other modules and contaminates the work of others. Yet genius if the exploration is split by functionality: one subagent goes for authentication, another the shopping cart, and another the recommendation system.
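
To make the genius split concrete, the instruction could look something like this (the feature areas and output paths are just an illustration):

Spawn three explorer subagents in parallel: one maps the authentication flow, one the shopping cart, one the recommendation system. Each writes its findings - entry points, key data structures, and any cross-module calls it runs into - to docs/exploration/<area>.md and reports a short summary back.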

Finally, it gets expensive really quickly. When a subagent is spawned, it doesn't inherit the main thread's context for free. Instead, it loads its own prompt, instructions, and working context from scratch. This isolation is by design - desirable, even: it prevents state bleed between agents and keeps prompts clean. But it means you always pay to rebuild context, and it can easily take 10K-20K tokens before any user task is added.

The Four Core Paradigms

I hope the previous section conveys the double-edged nature of subagents. This is a feature where you really need to understand what happens under the hood before you start to get tangible benefits. Successful uses of subagents come from four fundamental paradigms.

Hierarchical Delegation

Subagents thrive in a clear hierarchy. The most successful pattern I have observed is the sequential workflow with file-based communication between stages. Each agent completes its task and writes results to a markdown file, which the next agent reads. This avoids the token overhead of passing everything through the main context.

Context Isolation

Each subagent operates in a fresh, unpolluted context window. It works best when contamination between tasks is undesirable, when you need an unbiased perspective, or when mixing contexts would create confusion.

Parallelism

Subagents can run up to 10 tasks concurrently (additional tasks queue). This enables genuine parallel processing for independent tasks: fixing TypeScript errors across different packages, analyzing multiple documents simultaneously, or testing different solution approaches. Parallelism comes with coordination overhead and token multiplication though.
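
For instance, a fan-out like this keeps every task independent (the package names are made up; the point is that each task can finish without knowing about the others):

Run the linter in packages/api, packages/web, and packages/shared using three parallel tasks. Each task fixes the errors only in its own package and reports the number of fixes back.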

Specialization & Knowledge Persistence

Perhaps the most underappreciated paradigm: subagents as reusable expertise capsules. That complex performance optimization prompt with specific methodologies, tools, and metrics? Write it once, refine it over time, invoke it when needed. This transforms subagents from parallel executors into a growing library of specialized expertise.

Wins, in practice

Subagents should not be the first thing you look at when you start with CC, even though they are something like the second item on the list.

Personally, limited by my skill level, I reach for subagents when I want to trade token consumption for speed. I am not good enough to consistently get better quality from my subagent setups yet.

My rules of thumb go like this:

Use Subagents When

  • You need unbiased validation (Context Isolation)
    • Example: Code review separate from implementation
    • Explicit invocation: "Use the code-reviewer agent to check this"
  • You have genuinely parallel work (Parallelism)
    • Example: Fix all linting errors across 10 packages
    • Tasks must be truly independent
  • You follow a structured workflow (Hierarchical Delegation)
    • Example: Research → Plan → Build
    • Use files for inter-agent communication
  • You have complex, occasional expertise (Specialization)
    • Example: Quarterly performance audit
    • Prompt complexity justifies preservation

Avoid Subagents When

  • Task is simple or routine
    • Main thread can handle it efficiently
    • Token overhead isn't justified
  • You need iterative refinement
    • Each invocation starts fresh
    • No memory between calls
  • Tasks are interdependent
    • Agents can't coordinate directly
    • Orchestration becomes a bottleneck
  • Token budget is constrained
    • ~20K tokens of overhead per subagent
    • Can exhaust quotas rapidly

On top of that, when I start a subagent workflow these days, I:
  • Start small: begin with 2-3 agents, each should have a single, clear responsibility.
  • Use explicit invocation: "Use the test-writer agent to create unit tests for this module."
  • Version control the agents in .claude/agents/
  • Implement file-based communication. Markdown is best.

Investigator → writes → INVESTIGATION.md

Planner → reads → INVESTIGATION.md → writes → PLAN.md

Executor → reads → PLAN.md → implements 
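
A handoff file does not need to be elaborate. A hypothetical PLAN.md skeleton like the one below is enough for the executor to pick up without any other context:

PLAN.md

## Goal
Add rate limiting to the public API.

## Tasks
1. Token-bucket middleware in the API layer (backend agent)
2. Surface the rate-limit headers in the web client (frontend agent)
3. Integration tests covering 429 responses (tester agent)

## Out of scope
Per-customer quota configuration.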

The Unique Position of Subagents

For a parallelism feature, subagents are… unparalleled among major coding agents.

While other AI coding tools are exploring similar concepts, CC is currently the only tool with native subagent capabilities:

  • Gemini CLI: Has a proposed sub-agents system in development (PR #4883) but not yet available
  • Windsurf/Cursor: Offer enhanced single-agent modes ("Cascade" and "Agent Mode") but no true multi-agent features
  • OpenAI Codex: Supports parallel task execution but lacks the delegation and specialization aspects
  • Workarounds elsewhere: Multiple IDE instances, git worktrees, container orchestration—all trying to replicate what CC does natively

As troublesome as they are to navigate, subagents offer capabilities that no other tool currently provides natively. Treat subagents as a long-term investment in building a library of specialized expertise, not as a way to do everything faster or cheaper. They shine in the right use cases. Keep an eye on this space though - it is moving rapidly. Once subagents can control what and how they communicate with each other mid-run, things will get a whole lot more interesting.