May was an exciting month for tech. There was Google I/O, where Google Glass tried to make a comeback. There was Microsoft Build, which, to be honest, I watched for the first time in a while. I am keeping an eye on NLWeb. I almost missed LangChain Interrupt. But my favorite was Code with Claude, Anthropic’s first developer conference:
- Claude Opus 4 and Sonnet 4 were released.
- Claude Code became available in VS Code (and its forks) and JetBrains IDEs.
Essentially all IDEs out there now have access to the same state-of-the-art model and coding agent. So what does this mean for us developers?
Let’s examine the code generation landscape.
Levels of use case complexity
The field of automatic code generation is exploding. Take Cursor, for example: its features include Tab, Ctrl+K, Chat, and Agent. They all generate code in some shape or form, yet they serve vastly different use cases, to the point that it is genuinely awkward to use one feature for another’s job. With this much variation, the claim “developers who use AI are more productive” makes about as much sense as announcing “mathematicians with calculators are better than those without.”
Aki Ranin created a framework for categorizing the sophistication of the AI agents we’ll be interacting with.
An agent starts at a low level of autonomy. It plays a reactive role, responding to human requests. It gradually becomes more active, responding to system events with or without a human supervisor, though the number of actions it can take and the creativity of its solutions are still rather limited. Finally, the agent takes on a human-like role and, within its boundaries, can handle a task end-to-end.
Mapping that onto the features of Cursor and other AI-assisted IDEs, I categorize AI support for developers into four levels.
Level 1: Foundational Code Assistance: Characterized by real-time, localized suggestions with minimal immediate context. The interaction is primarily reactive: the developer types, the AI suggests, and the developer accepts, rejects, or ignores the suggestion. Autonomy is low, relying heavily on pattern matching.
Level 2: Contextual Code Composition and Understanding: AI tools at this level utilize broader file or local project context and engage in more interactive exchanges. They can generate larger code blocks, such as functions or classes, and perform basic code understanding tasks. Developers typically provide prompts, comments, or select code for AI action.
Level 3: Advanced Co-Development and Workflow Automation: These AI systems exhibit deep codebase awareness, potentially including multi-file understanding. They can automate more complex tasks within the Software Development Life Cycle, assisting in intricate decision-making. The developer delegates specific, bounded tasks to the AI.
Level 4: Sophisticated AI Coding Agents and Autonomous Systems: This level represents high AI autonomy, including the ability to plan and execute multi-step tasks towards end-to-end completion. These systems can interact with external tools and environments, requiring minimal oversight for defined goals. The developer defines high-level objectives or complex tasks, which the AI agent then plans and executes, with the developer primarily reviewing and intervening as necessary.
Impact on Developer Productivity
Measuring developer productivity is a notoriously thorny topic. Take all the metrics below with a heavy grain of salt.
The bright side
Level 1: Foundational Code Assistance
Most developers actually like this level: it handles the boring stuff without getting in the way. Many find these tools "extremely useful" for such scenarios, appreciating the reduction in manual typing.
GitHub, in its own paper "Measuring the impact of GitHub Copilot", boasts a 55% faster task completion rate when using "predictive text". GitHub obviously has an incentive to be a bit liberal in how this was measured. It's like asking a barber if you need a haircut. Independent studies suggest the real number is more 'your mileage may vary' than 'rocket ship to productivity paradise': research from ZoomInfo and Eleks puts the figure in practice closer to 10-15%.
Level 2: Contextual Code Composition and Understanding
At this level, the AI assistance not only generates code (in bigger chunks than Level 1), it also serves as a utility for learning and code comprehension.
As AI generates larger and more complex code blocks, the perception becomes more mixed. Typical complaints are inconsistent output quality and almost-correct code that ultimately takes more time to fix than writing it from scratch. It's the uncanny valley of code generation: close enough to look right, wrong enough to ruin your afternoon. Code comprehension, fortunately, enjoys more universally positive feedback, being valuable for grasping unfamiliar code segments or new programming concepts.
On one hand, IBM's internal testing of Watsonx Code Assistant projected substantial time savings: 90% on code explanation tasks and a 38% reduction in code generation and testing activities. On the other hand, a study focusing on Copilot found that only 28.7% of its suggestions for resolving coding issues were entirely correct, with 51.2% somewhat correct and 20.1% erroneous. This pattern of impressive gains shadowed by unreliable output is what we continue to observe as the level of autonomy increases. Welcome to the future. It's complicated.
Level 3: Advanced Co-Development and Workflow Automation
This level of AI assistance is characterized by a multi-step thinking process and multi-file context. At this point, AI starts feeling less like a tool and more like that overachieving colleague who reorganizes the entire codebase while you're at lunch. Helpful? Yes. Slightly terrifying? Also yes.
Though the correlation between higher autonomy and higher failure rates persists from Level 2, Level 3 is where we start to see a new class of AI-first projects: projects planned specifically to incorporate AI capabilities into their development life cycle. Examples include API convention enforcement in CI/CD, automated test generation, and feature customization within distinct bounded contexts. There is a clear appreciation for the automation of time-consuming tasks.
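To make the idea concrete, here is a minimal sketch of what an automated test-generation step could look like. It assumes the anthropic Python SDK, an ANTHROPIC_API_KEY in the environment, and a hypothetical app/billing.py module; the model name and prompts are illustrative rather than a recommendation.

```python
"""Minimal sketch of a Level 3 workflow: draft pytest tests for a module,
then run them as a CI step. Everything here is illustrative, not a blueprint."""
import pathlib
import subprocess
import sys

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment


def draft_tests(source_path: str) -> str:
    """Ask the model for a pytest suite covering the given source file."""
    source = pathlib.Path(source_path).read_text()
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # adjust to whichever model you use
        max_tokens=2000,
        system="You write concise pytest suites. Output only Python code, no markdown fences.",
        messages=[{"role": "user", "content": f"Write pytest tests for this module:\n\n{source}"}],
    )
    return response.content[0].text


if __name__ == "__main__":
    tests = draft_tests("app/billing.py")  # hypothetical module under test
    pathlib.Path("tests/test_billing_generated.py").write_text(tests)
    # Generated tests still get a human review; CI only verifies that they pass.
    sys.exit(subprocess.run(["pytest", "tests/test_billing_generated.py"]).returncode)
```

The point is not the specific API calls but the shape of the workflow: a bounded, reviewable task delegated to the AI inside an existing pipeline.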
Given the wide range of AI-assisted workflows, concrete productivity gains are harder to find in research. However, the adoption of Level 3 AI capabilities is undoubtedly growing. The 2024 Stack Overflow Developer Survey revealed that developers anticipate AI tools will become increasingly integrated into processes like documenting code (81% of respondents) and testing code (80%). GitHub's announcement of Copilot Code Review highlighted that over 1 million developers had already used the feature during its public preview phase. Level 3 resonates well with the "shift left" paradigm in software development.
Level 4: Sophisticated AI Coding Agents and Autonomous Systems
Level 4 is the holy grail of autonomous AI agents. Claude Code is a CLI tool: you chat with it, but you don’t code with it. Devin goes a step further: you chat with Devin through Slack.
There is overall excitement about the future of software development with the arrival of these autonomous agents. However, Level 4 agents are still in their early days. Independent testing by researchers at Answer.AI painted a more sobering picture: Devin reportedly completed only 3 out of 20 assigned real-world tasks.
Other Level 4 agents demonstrate impressive capabilities on benchmarks like SWE-bench; Claude Opus 4 scored 72.5%, which is probably higher than mine. Yet their application to complex, real-world software development tasks reveals a significant "last mile" problem. Outside of controlled environments, these agents often struggle with ambiguity, unforeseen edge cases, and tasks that require deep, nuanced, human-like reasoning or interaction with poorly documented, unstable, or unpredictable external systems.
The following table provides a comparative overview of the four AI usage levels, summarizing key characteristics, developer perceptions, productivity impacts, common challenges, and adoption insights.
The other side
Poor code quality
Over-reliance and potential deskilling
Context limitations
The Pragmatic Path Forward
Great things may come to those who wait, but only the things left by those who hustle.
Build your AI crew
AI-first projects
Brownfield projects
- Before modifying code, give the system’s documentation an overhaul. AI can generate initial drafts of documentation and summaries. Complement the effort by interviewing long-tenured developers to capture "tribal knowledge" (AI transcription is particularly helpful here), then finish with manual validation and refinement. A happy side effect: this documentation of the business logic is also fundamental to building an AI data analysis experience.
- If the code base is a behemoth, the context window will be a problem, especially with my favorite, Claude. Divide the “memory” into tiers. The most basic tier is the immediate prompt and the ongoing conversation, which form the memory of the task at hand. Second is the project memory (claude.md, .cursor/rules, and whatnot), which hosts key decisions, patterns, and learned codebase knowledge; it gets added to the context window of every conversation, so be really selective about what goes in there. The last tier is an overview of the project, and sometimes of the entire code base because of intersystem dependencies. This is the cutting edge of automatic code generation at the moment: companies are looking into RAG and GraphRAG solutions to bridge the gap. A minimal sketch of this tiered setup follows this list.
- Once the ground knowledge is there, avoid a "big bang" rewrite. Instead, use AI to help understand and build interfaces and unit tests around specific modules of the legacy system. Gradually replace or refactor these modules, with AI assisting in understanding the old logic and integrating new components.
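As promised, here is a rough sketch of how the three memory tiers could be stitched into one context window. The file names (CLAUDE.md, docs/architecture-summary.md) and helper functions are hypothetical, and tools like Claude Code handle this assembly for you; the sketch only shows where RAG or GraphRAG would slot in.

```python
"""Illustrative sketch of the three memory tiers described above.
File names and helpers are hypothetical, not any tool's real internals."""
from pathlib import Path


def project_memory() -> str:
    """Tier 2: curated project knowledge (claude.md, .cursor/rules, ...).
    It is prepended to every conversation, so keep it short and selective."""
    memory_file = Path("CLAUDE.md")
    return memory_file.read_text() if memory_file.is_file() else ""


def codebase_overview(question: str) -> str:
    """Tier 3: an overview of the wider codebase. In a real setup this is
    where RAG or GraphRAG would retrieve only the slices relevant to the
    question; a static summary file stands in for that here."""
    return Path("docs/architecture-summary.md").read_text()  # hypothetical file


def build_context(task_prompt: str, conversation: list[str]) -> str:
    """Assemble the tiers into a single context window."""
    return "\n\n---\n\n".join([
        codebase_overview(task_prompt),   # Tier 3: project / codebase overview
        project_memory(),                 # Tier 2: project memory
        "\n".join(conversation),          # Tier 1: the ongoing conversation
        task_prompt,                      # Tier 1: the immediate request
    ])
```

The design choice worth noting is the asymmetry: the lower tiers are cheap and always present, while the top tier is expensive and should be retrieved selectively rather than pasted in wholesale.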