Monday, July 7, 2025

May was an exciting month for Tech. There was Google I/O, where Google Glass tried to make a comeback. There was Microsoft Build, which, to be honest, I watched for the first time in a while. I am keeping an eye on NLWeb. I almost missed LangChain Interrupt. But my favorite one is Code With Claude, Anthropic’s first developer conference:

  • Claude Opus 4 and Claude Sonnet 4 were released.
  • Claude Code became available in VS Code (and its forks) and JetBrains IDEs.

Essentially all IDEs out there now have access to the same state-of-the-art model and coding agent. So what does this mean for us developers?

Let’s examine the code generation landscape.

Levels of use case complexity

The field of automatic code generation is exploding. Take Cursor, for example: its features include Tab, Ctrl+K, Chat, and Agent. They all generate code in some shape or form. They also serve vastly different use cases, to the extent that it is really awkward to use one feature for another’s use case. The abundance of variation means “developers who use AI are more productive” makes as much sense as announcing “mathematicians with calculators are better than those without.”

Aki Ranin created a framework to categorize the sophistication of the AI agents we’ll be interacting with.

An agent starts with a low level of autonomy. It plays a reactive role, responding to human requests. It gradually becomes more active, responds to system events, and might or might not require a human supervisor. The number of actions it can take and the creativity level of its solution are still rather limited. Finally, the agent takes on a human-like role and within its boundary can handle a task end-to-end.

Mapping that to the features of Cursor and other AI-assisted IDEs, I categorize AI support for developers into 4 levels.


Level 1: Foundational Code Assistance: Characterized by real-time, localized suggestions with minimal immediate context. The interaction is primarily reactive: the developer types, the AI suggests, and the developer accepts, rejects, or ignores the suggestion. Autonomy is low, relying heavily on pattern matching.

Level 2: Contextual Code Composition and Understanding: AI tools at this level utilize broader file or local project context and engage in more interactive exchanges. They can generate larger code blocks, such as functions or classes, and perform basic code understanding tasks. Developers typically provide prompts, comments, or select code for AI action.

Level 3: Advanced Co-Development and Workflow Automation: These AI systems exhibit deep codebase awareness, potentially including multi-file understanding. They can automate more complex tasks within the Software Development Life Cycle, assisting in intricate decision-making. The developer delegates specific, bounded tasks to the AI.

Level 4: Sophisticated AI Coding Agents and Autonomous Systems: This level represents high AI autonomy, including the ability to plan and execute multi-step tasks towards end-to-end completion. These systems can interact with external tools and environments, requiring minimal oversight for defined goals. The developer defines high-level objectives or complex tasks, which the AI agent then plans and executes, with the developer primarily reviewing and intervening as necessary.

Impact on Developer Productivity

Measuring developer productivity is a notoriously thorny topic. Take all the metrics below with a heavy grain of salt.

The bright side

Level 1: Foundational Code Assistance

Most developers actually like this level: it handles the boring stuff without getting in the way. Many find these tools "extremely useful" for such scenarios, appreciating the reduction in manual typing.

GitHub, in its own paper "Measuring the impact of GitHub Copilot", boasts a 55% faster task completion rate when using "predictive text". GitHub obviously has an incentive to be a bit liberal about how this was measured. It's like asking a barber if you need a haircut. Independent studies suggest the real number is more 'your mileage may vary' than 'rocket ship to productivity paradise.' Research from ZoomInfo and Eleks puts the number in practice closer to 10-15%.

Level 2: Contextual Code Composition and Understanding

At this level, the AI assistant not only generates larger code blocks than at Level 1, it also serves as a utility for learning and code comprehension.

As AI generates larger and more complex code blocks, the perception is also more mixed. Typical complaints are inconsistencies in output quality and almost-correct code that ultimately takes more time to modify than writing it anew. It's the uncanny valley of code generation: close enough to look right, wrong enough to ruin your afternoon. Code comprehension, fortunately, enjoys more universally positive feedback, being valuable for grasping unfamiliar code segments or new programming concepts.

On one hand, IBM's internal testing of Watsonx Code Assistant projected substantial time savings: 90% on code explanation tasks and a 38% time reduction in code generation and testing activities. On the other hand, a study focusing on Copilot found that only 28.7% of its suggestions for resolving coding issues were entirely correct, with 51.2% being somewhat correct and 20.1% being erroneous. This is what we continue to observe as the level of autonomy increases. Welcome to the future. It's complicated.

Level 3: Advanced Co-Development and Workflow Automation

This level of AI assistance is characterized by a multi-step thinking process and multi-file context. At this point, AI starts feeling less like a tool and more like that overachieving colleague who reorganizes the entire codebase while you're at lunch. Helpful? Yes. Slightly terrifying? Also yes.

Though we continue to see the correlation between high autonomy and high failure rate as we do in Level 2, Level 3 is where we start to see a new class of AI-first projects. These are projects planned specifically to incorporate AI capabilities into their development life cycle. For example: API convention enforcement in CI/CD, automated test generation, and feature customization within distinct bounded contexts. There is a clear appreciation for the automation of time-consuming tasks.
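As a concrete illustration of API convention enforcement in CI/CD, here is a minimal sketch of a deterministic check that could run as a pipeline step; the kebab-case rule and the sample paths are illustrative assumptions, and in practice a Level 3 agent might generate or extend such checks from a team's style guide:

```python
import re

# Illustrative convention: URL path segments must be kebab-case.
KEBAB = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")

def check_paths(paths):
    """Return a list of convention violations for the given API paths."""
    violations = []
    for path in paths:
        for segment in path.strip("/").split("/"):
            if segment.startswith("{"):  # path parameter like {id}, skip
                continue
            if not KEBAB.match(segment):
                violations.append(f"{path}: segment '{segment}' is not kebab-case")
    return violations

# Hypothetical paths pulled from an API spec during CI.
errors = check_paths(["/user-accounts/{id}", "/Orders", "/billing/invoice_items"])
for e in errors:
    print(e)
```

A CI job would fail the build when the list is non-empty, turning a style-guide argument into an automated gate.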

Given the wide range of AI-assisted workflows, concrete productivity gains are harder to find in research. However, the adoption of Level 3 AI capabilities is undoubtedly growing. The 2024 Stack Overflow Developer Survey revealed that developers anticipate AI tools will become increasingly integrated into processes like documenting code (81% of respondents) and testing code (80%). GitHub's announcement of Copilot Code Review highlighted that over 1 million developers had already used the feature during its public preview phase. Level 3 resonates well with the "shift left" paradigm in software development.

Level 4: Sophisticated AI Coding Agents and Autonomous Systems

Level 4 is the holy grail of autonomous AI agents. Claude Code is a CLI tool: you chat with it, but you don’t code with it. Devin goes a step further: you chat with Devin through Slack.

There is an overall excitement about the future of software development with the arrival of these autonomous agents. However, Level 4 agents are still in their early days. Independent testing by researchers at Answer.AI painted a more sobering picture: Devin reportedly completed only 3 out of 20 assigned real-world tasks.

Some other Level 4 agents demonstrate impressive capabilities on benchmarks like SWE-Bench. Claude Opus 4 got 72.5%, which is probably higher than mine. Yet their application to real-world complex software development tasks reveals a significant "last mile" problem. Outside of the controlled environments, these agents often struggle with ambiguity, unforeseen edge cases, and tasks that require deep, nuanced human-like reasoning or interaction with poorly documented, unstable, or unpredictable external systems.

The following table provides a comparative overview of the four AI usage levels, summarizing key characteristics, developer perceptions, productivity impacts, common challenges, and adoption insights.

The other side

I wouldn’t necessarily refer to this section as “the dark side”. I don’t think the AI future is apocalyptic. However, there is plenty of evidence that the integration of AI into the fabric of software engineering is not rosy. You probably have heard about Klarna.

Running a team of developers, I see there is another challenge in utilizing AI for productivity gain: pushback from developers.

Poor code quality

GitClear tore apart GitHub’s own “55% faster” study and traced large spikes in bugs, rewrites, and copy-pasted blocks associated with automatic code generation. The study projected that "code churn", the percentage of code discarded less than two weeks after being written, would double, suggesting that AI-generated code often requires substantial revisions.

For my team, we can only use around 40-50% of the generated code. Though of course better prompts and context play a part, we see that some code blocks are sub-optimal or almost correct. Sub-optimal code is still functional, but unlikely to make it past code review. Almost-correct code is far worse. Rewriting a piece of almost-correct code takes longer than writing it from scratch. If it sneaks past code review, it becomes a production bug.

While AI is writing code at a superhuman speed, it is mounting tech debt just as fast.

Over-reliance and potential deskilling

Beyond code quality, reliance on AI outputs can diminish cognitive engagement, erode essential problem-solving skills, and weaken the deep understanding of core coding principles. As one developer put it, "writing code yourself remains important because writing code is not just writing code: it's an organic process, which familiarizes you with the language, which gives you the philosophy of the language". Failure to develop this intimate knowledge of the languages and system designs will hinder career development.

Furthermore, many developers reported that they shifted their focus from creative problem-solving to merely verifying AI outputs. The thing is, many got into programming not because of the software business but because they found the act of programming a fulfilling experience. They don’t just focus on shipping the code, they also enjoy the journey of getting there. Downgrading that experience to babysitting an AI agent deprives them of job satisfaction.

“Vibe coding” was coined in February 2025. I don’t think there has been enough time for people to lose their programming skills, but I can confirm that the underlying fear of becoming "passive participants" and eventually redundant in the coding process prevents some from fully embracing the advantages of AI.

Context limitations

This last challenge is not technosocial; it is purely technical. Today's AI coding assistants, while powerful, face significant context limitations that deter full adoption in software development. Their finite context windows mean they're constantly forgetting what you told them five minutes ago. It's like trying to build a house with a contractor who has amnesia: every morning you have to re-explain why the bathroom shouldn't be in the kitchen. These systems struggle with large codebases or numerous interconnected modules, and they get confused as the number of tools and MCPs increases. Progress.

Look, managing all this context is a pain in the ass. You're constantly refining prompts and working around these inherent technological drawbacks. Consequently, some prefer to await more mature AI solutions that can seamlessly handle large-scale context and maintain conversational memory, rather than investing significant effort in mastering the current generation's limitations.

The Pragmatic Path Forward

Great things may come to those who wait, but only the things left by those who hustle.

LLM is the closest we have ever gotten to AGI. It is coming not like a wave but a tsunami. It will sweep away people trying to go against the current of the age. And for that, we must not wait. Skills and experiences come with being in the field, getting to know all the moving parts, and being ready for the latest additions.

There are the usual sayings of “treating AI like a junior pair programmer” and “learning the art of writing clear, concise, and context-rich prompts”, which I think are definitely important. But they further emphasize the notion that the future of software development is a barren land, hyper-focused on productivity and deprived of creative joy. But what is work without the enjoyment?

Build your AI crew

Programming is changing, and with it, developers like me need to adapt. I got into programming because I like to build things, not get things built (there is a slight difference there, on which my existence depends). I like to get into the zone for complex problem-solving, (criticizing) system architecture, and figuring out what new technologies mean for the business. But I also have to admit that after 15 years, I dread building the next login page, HTML email template, or coding standard. The Goldilocks zone for me has always been to figure out a way to keep the work I like for myself and push everything else away, to other humans and machines alike.

Delegating to junior developers means defining well-scoped tickets, giving them hands-on experience (plus the chance to fail), and growing their technical skills. It takes multiple AI agents to be remotely comparable to a single junior developer. Their context is also limited, not enough to handle an end-to-end task. Delegating to AI agents, therefore, means tinkering with a multi-agent setup where each concrete task, such as analysing requirements, defining relevant unit tests, or implementing a piece of business logic, is distributed to different agents communicating via some intermediate medium like a document, a spreadsheet, or the code base itself. In a demo at Code With Claude 2025, one showcased workflow was to have AI implement a UI from a mock (mock.png), give it Puppeteer to take screenshots, and have it iterate until pixel-perfect. Once I got the gist of it, it turned out to be less delegating and more system design: building a platform to automate the boring work away.
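The screenshot-and-iterate loop from that demo can be sketched as a simple convergence loop; `generate_ui`, `screenshot`, and `pixel_diff` below are hypothetical stand-ins for the agent, Puppeteer, and an image-diff tool, and the toy "agent" that halves the mismatch each round exists only to make the sketch runnable:

```python
def iterate_until_pixel_perfect(mock, generate_ui, screenshot, pixel_diff,
                                threshold=0.01, max_rounds=10):
    """Regenerate the UI until its screenshot matches the mock closely enough."""
    feedback = None
    for round_no in range(1, max_rounds + 1):
        ui = generate_ui(mock, feedback)   # agent writes or revises the UI code
        shot = screenshot(ui)              # e.g. a Puppeteer rendering
        diff = pixel_diff(mock, shot)      # fraction of mismatched pixels
        if diff <= threshold:
            return ui, round_no
        feedback = f"{diff:.0%} of pixels differ; adjust and retry"
    return ui, max_rounds

# Toy stand-ins: the fake "agent" halves the mismatch on every feedback round.
state = {"diff": 0.32}
def fake_generate(mock, feedback):
    if feedback is not None:
        state["diff"] /= 2
    return f"ui@{state['diff']:.4f}"
def fake_screenshot(ui):
    return ui
def fake_diff(mock, shot):
    return state["diff"]

ui, rounds = iterate_until_pixel_perfect("mock.png", fake_generate,
                                         fake_screenshot, fake_diff)
print(rounds)
```

The point of the sketch is the shape of the system: the human designs the loop and the success criterion, and the agent does the grunt work inside it.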

AI-first projects

I find that sometimes we give AI the most gnarly bits of the system to work on, bits that we don’t even want to look at ourselves, get disappointed at the mess, and declare AI is not ready yet. The last part is correct, they are junior, the eldest is around 3 in human years. Think about what most 3-year-olds do: they put things in their mouth, draw on walls, and occasionally produce something brilliant by pure accident. Sound familiar?

Greenfield projects, where the code base is still small and AI-friendly design principles (such as modularity, clear APIs, good documentation) are easy to enforce, play to the strengths of AI and negate some of its most significant drawbacks.

Some projects take it a step further. From their inception, certain parts of the software were destined to be written by AI.

The human code orchestrates the service's overall behavior and is kept segregated from the AI code. Think of the strategy or decorator design pattern on steroids.

The AI code is heavily modularized, has a lower standard of code quality, and can be rewritten (or rather, regenerated) at a moment's notice.

Requirements and tests are emphasized because they are the primary deliverable, not the expendable code. Their creation, in turn, can also be AI-assisted as part of the AI crew concept.
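The human/AI code split described above might look like this minimal sketch; the `EmailRenderer` interface and the receipt example are illustrative assumptions, not a prescribed design:

```python
from typing import Protocol

class EmailRenderer(Protocol):
    """The stable contract between human-owned and AI-generated code."""
    def render(self, order: dict) -> str: ...

# --- Human code: orchestrates overall behavior, reviewed strictly ---
def send_receipt(order: dict, renderer: EmailRenderer) -> str:
    if not order.get("items"):
        raise ValueError("empty order")
    html = renderer.render(order)       # delegate to the swappable AI module
    assert "<html" in html.lower()      # contract check on the AI's output
    return html

# --- AI code: heavily modularized, regenerable at a moment's notice ---
class GeneratedReceiptRenderer:
    def render(self, order: dict) -> str:
        rows = "".join(f"<li>{i['name']}: ${i['price']}</li>"
                       for i in order["items"])
        return f"<html><body><ul>{rows}</ul></body></html>"

html = send_receipt({"items": [{"name": "Widget", "price": 9}]},
                    GeneratedReceiptRenderer())
```

The strategy-pattern shape is what matters: the human side owns the interface and the contract checks, so the AI side can be thrown away and regenerated without touching the orchestration.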

We do a fair share of custom HTML email templates for our customers. Crafting HTML and CSS is not the most thought-provoking activity, and the constant back-and-forth checks to support pixel-perfect design across browsers and email clients are nerve-wracking. Needless to say, turning that module into an AI-first project was enthusiastically supported.

Brownfield projects

Yet not every project can be an AI-first project. Despite constantly adding new services and breaking down old ones, I reckon many of the critical code blocks reside in a handful of projects that have been around since the beginning of time and are lovingly referred to as legacy code.

These projects are challenging even to the best human developers. A single feature might span dozens of files, each with its own decade-old conventions. Every developer who touched it left their mark. Now it's a Frankenstein’s monster of coding styles. And AI output is a function of its input.

We can adopt an incremental approach ("Strangler Fig" Pattern):
  • Before code modification, give the system’s documentation an overhaul. AI can generate initial drafts of documentation and summaries. The effort can be complemented with interviewing long-tenured developers to capture "tribal knowledge" - AI transcription is particularly helpful. Finish with manual validation and refinement. A happy side effect is that this logic documentation is also fundamental in building an AI data analysis experience.
  • If the code base is a behemoth, the context window will be a problem, especially with my favorite Claude. Divide the “memory” into tiers. The most basic level is the immediate prompt and the ongoing conversation, which forms the memory of the task at hand. Second is the project memory - claude.md, .cursor/rules, and whatnot - this hosts key decisions, patterns, and learned codebase knowledge. This will be added to the context window of every conversation, so be really selective about what is put in there. The last layer is an overview of the project, or sometimes the entire code base because of intersystem dependencies. This is the cutting edge of automatic code generation at the moment; companies are looking into RAG and GraphRAG solutions to bridge this gap.
  • Once the ground knowledge is there, avoid a "big bang" rewrite. Instead, use AI to help understand and build interfaces and unit tests around specific modules of the legacy system. Gradually replace or refactor these modules, with AI assisting in understanding the old logic and integrating new components.
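The memory tiers above can be sketched as a context-assembly step that packs the highest-priority tier first; the tier names and the rough 4-characters-per-token estimate are assumptions for illustration, not how Claude actually accounts for tokens:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: roughly 4 characters per token in English text.
    return len(text) // 4

def build_context(task_prompt: str, project_memory: str,
                  codebase_overview: str, budget_tokens: int = 8000) -> str:
    """Assemble the three memory tiers into one prompt under a token budget."""
    tiers = [
        ("## Task", task_prompt),                      # always included
        ("## Project memory (claude.md)", project_memory),
        ("## Codebase overview", codebase_overview),   # dropped first if over budget
    ]
    parts, used = [], 0
    for heading, text in tiers:
        cost = estimate_tokens(heading) + estimate_tokens(text)
        if used + cost > budget_tokens:
            break                   # lower tiers are dropped, not truncated
        parts.append(f"{heading}\n{text}")
        used += cost
    return "\n\n".join(parts)

ctx = build_context("Fix the invoice rounding bug",
                    "Money is stored as integer cents; never use floats.",
                    "services/: billing, auth, mailer ...")
```

RAG and GraphRAG approaches effectively replace the last, static tier with retrieval, pulling in only the slices of the codebase relevant to the current task.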

Conclusion

From 10-15% productivity gains in Level 1 to the promises and pitfalls of Level 4 autonomous agents, one thing is clear: AI is reshaping software development at every level. The path forward isn't about choosing between human creativity and AI efficiency. It's about finding the right blend for your context. There is a whole spectrum to choose from, starting with delegating boilerplate to AI while keeping complex problem-solving for yourself, to architecting AI-first systems, to teaching AI to understand your decade-old codebase. In the end, the developers who thrive will be those who build better systems, be it a code production engine or the features themselves. It has always been that way.