Friday, February 20, 2026

A batch of code

Late April, sun hot enough to split skulls, Saigon still dusty, the first rains of the season nowhere in sight. Only from the metro station to the office elevator did Phát finally escape the air thick with exhaust fumes and... armpit sweat. Taking one breath to refresh his lungs, Phát skimmed Slack. Not bad. He hadn't read through every status line, but overall green outnumbered red; the details could wait for the morning coffee.

Phát, 28 years old
Software Engineer

Phát didn't bother with the green reports. Fresh green, lush green, XanhSM green. Green means the code is done and the independent checks passed. That code ships to production early in the afternoon. Red is Phát's work this morning. Fire red, gambling red, red light. Red means there is a mismatch between the AI's reasoning and the independent verification results. That code also ships, straight to reincarnation. Only two of the eight features built last night need a second look. Just a few months ago, while still on probation, Phát's daily reports were the color of blood. Millions of tokens turned to nothing. A whole day's work down the drain.

This morning it was frontend code gone wrong again. No surprise. In each batch, Phát works on both backend and frontend - AI runs best when handed a complete feature. It is also the best setup for the independent verification systems; some problems only surface when two supposedly-correct halves are stitched together. Backend used to be royalty - the crown jewel, pampered and coddled. Now that only holds for the core of systems that have already stabilized. While a product is still finding its itch, the battle is fought on the frontend. Logic-heavy backend code suits the way AI reasons, and if it writes things a little differently while the results stay correct, nobody really cares. The frontend is the storefront: one thing off and it is an instant eyesore, and among the infinite ways buttons, text, and images can combine, even the latest AI models often just... give up. We tend to mistake AI getting one thing right for it having real understanding, and only remember that an LLM is, at heart, a lottery machine at moments like these.

The first bug... didn't belong to this group. A different affair. First pass at translation, so the site works in both English and Vietnamese. AI translates beautifully - it is a Language Model after all, replacing the freelancers on Upwork outright. But translating a sentence and translating a whole web system are two different things. All last night, it burned through several context windows and still left gaps here, extras there, and raw strings like notifications.task_assigned_to_client staring back. Sweeping the entire codebase - UI, error codes, automated email sequences - methodically seemed like a fit for AI, but turned out too much to chew even for the standard memory, which has already doubled since the start of this year. Phát deleted the old code - with AI, building fresh is often easier than fixing, which is why they call it reincarnation being easier than attaining enlightenment - then hand-wrote a new plan, fenced off each region to translate, and instructed the AI to split the work into chunks so it wouldn't overflow its memory. Done.

Moving to bug number two, Phát frowned and thought: this bug is so easy, how is it still wrong?! A new data field was added to the UI but never saved to the database. Dead simple, a quick fix too. But Phát's AI team has four "lads", drinking tokens the way a Humvee drinks gasoline in Iran, and rookie mistakes like this just aren't acceptable. Phát's AI team burns far more tokens than a standalone prompt, but in return they plan on their own, write the code, then cross-check each other's work. A system like that, in theory, can run all day without human supervision. The goal isn't speed, fast or slow. The goal is for AI to swim on its own - as much as possible. Once AI swims on its own, it becomes a machine that runs day and night, runs multi-threaded, runs with no regard for Peta (People for the Ethical Treatment of Agents); speed is a natural consequence. Instead of jumping in to fix it directly, Phát had to track down why the AI lads let this bug slip through. Phát's habit is fixing code; now he has to fix the AI that fixes the code. It feels like eating phở without the fresh herbs, but the lead keeps insisting.

Noon. Phát thought he had found the hole in the AI's logic that made it stop coding before verification finished - the mother of bug number two. The thought of walking two intersections through the concrete furnace of Saigon killed his appetite. Lucky for him, the company has a canteen. The way there passes the server room. The cooling fans ran smooth, just a low hum. Probably because they are still new; who knows, in a while they might rattle no differently than a bank of air conditioners - same thing anyway. This server room is an experiment. A code-birthing machine running day and night, burning tokens like Snoop Dogg burns weed - continuously and without stopping. The smarter the model, the more tokens it burns. Nobody in this industry has figured out how to make AI smarter while spending fewer tokens. Run only the smartest model, and even engineers' salaries might not keep up. Open-source models trail proprietary ones by six months to a year, but cost only hardware and electricity. Can a cheap model, run many more times, produce results as good as an expensive one? This is the experiment that has dragged RAM prices through the roof. After lunch, on his way for a caffeine shot, Phát saw the DevOps guys hunched over, hauling machines out - to add more RAM, word has it. A reminder to check whether the company liquidates old hardware was duly noted.

Phát's team of five met after the lunch break. Everyone reported last night's results; for the first time, a batch hit 100%. Green, dreaming a green dream 🎶. Phát talked about bug number two and its mother (the bug's, not Phát's). Someone else noted it was time to update the AGENTS.md files across a dozen-plus repos after more than two weeks. The Tech Lead compared another team's first benchmark results between the proprietary model and the open-source one. QA pointed out a few issues limiting the AI's autonomy whenever the Task Assignment repo was involved - time for a cleanup. The whole group huddled to re-educate the AI, stuffing fifty more lines of prompt into the template set. The AI will have to verify more, each task will probably run 10-20 minutes slower, but if the success rate goes up 10%, that is a small price.

In the second half, the crew divvied up work for tonight's batch. New features, updates to old ones, bug fixes, and assorted odds and ends. Unlike humans, who prefer work that flows from one related thing to the next, AI couldn't care less. It may have a small context window, but it pays zero context-switching cost. Instead, everyone spent the time debating whether the AI had enough context, and how to vet the output once it was done. Two tasks surfaced that the Tech Lead thought AI was not ready to swim through alone. Phát got one of the two. He didn't mind; he was even glad. Coming back to code after a whole day of writing prompts is a scene change that freshens the mind. It is also a chance to judge whether the system the AI built stands on a solid foundation or a shaky one - collapsing at a touch. If it turned out to be the latter, the weeks ahead would be rough, but better to know early than not at all. More importantly, Phát didn't want to hand the AI work that he himself didn't know how to do.

The meeting wrapped past 3 PM. Every day has a meeting like this. Phát doesn't know what the meeting is called; since he joined the company, everyone has been meeting like this, steadily. It is the retrospective, the sharing session, the weekly kickoff (iteration planning meeting), and so on, all rolled into one. These meetings used to be separate, held weekly or monthly. But in an era when a new feature gets copied within days, and a batch of all-red code is a day that cannot be reclaimed - much like the days he had to sit through his mandatory TTHCM classes - daily is the cadence required to keep the code machine running efficiently.

After a break spent shooting the breeze with a few project managers still rambling about how to define the post-Agile world, Phát came back down to earth. He needed to prepare for the night shift. Phát took on nine tasks. The details of each - which goes first, which goes later, which blocks which - were all written down in one long scroll of a document. Together with the spec and the overall architecture, the AI needs enough context to skip the guesswork and focus only on code. The first eight tasks, Phát read as carefully as a bank loan contract, revising any part that could cause confusion. Working with AI these days is no different from working with hardware: if the mold is off, the product can't come out right. When his head grew foggy in the pile of specs, Phát turned to task number nine - the one the AI was perhaps not ready to swim through alone. No longer a supervisor, he "sat down" to walk the AI through it hand in hand. Phát wrote a few sample passages of code first. The AI wrote the next one. Phát reviewed, then wrote the part after. The two ping-ponged back and forth for a while until the code was mostly in shape. It became the model essay for the AI to continue the remaining parts tonight.

In a blink it was 5 o'clock. A long day; outside, the sun had softened too. If Phát were outdoors, he would probably say the river breeze was blowing in cool, but he had been in a glass cage with the AC blasting since morning. The whole day of work was preparation for this moment. Phát hit Enter, and one by one the prompt, the specs, the design, and a grab bag of other documents were uploaded for the AI. From his seat, Phát heard the fans in the server room shift up a gear, whirring away. Not just Phát - others were starting to issue orders to their AI teams too. Running on empty, Phát split his screen: on one side, the terminal where the AI was running Optimus 6.2; on the other, the announcement article for the just-unveiled Optimus 7. What the AI can't do today, it might pull off tomorrow. The first task was already coded and verified. Still green.

At 5:59, Phát headed home. The server room kept running, steady. The AI team moved on from one task to the next. Tomorrow morning, Phát will receive the results of this batch. Green or red is now the AI's business; for our young man Phát, life is only just beginning. Outside, the breeze really was blowing in cool.

Saigon, 22.04.2028

---

This piece is a fictional slice of a 2028 technology future. Any resemblance to real people or organizations is coincidental.

English version is available at my Substack.

Tuesday, February 17, 2026

The future is weird

I am not in the business of future prediction, so this is not one of those the-singularity-is-near kind of things. The future written here might not be very far off either - a few months, or a couple of years max, down the road. I guess what I am trying to say is that this might not be a very good prediction. That is a fair warning from yours truly.

If you are still here, then strap in!


The future is waterfall

I am quite invested in vibe coding. I have had the whole workflow figured out - until the next LLM release screws it up, but that is a story for another time. From requirements to technical design, I basically end up with fat Jira tickets. They are longer than anything I write for myself, not that I am a prolific Jira user to begin with. The idea is that, between the ticket and the codebase, the LLM has all the context it needs to finish the work on its own.
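To make "fat" concrete, here is a minimal sketch of the shape such a ticket might take - the schema and field names are my own invention for illustration, not a real Jira structure:

```python
from dataclasses import dataclass, field

@dataclass
class FatTicket:
    """A hypothetical 'fat' ticket: everything the LLM needs, in one place."""
    title: str
    requirements: str                 # user-facing behavior, in plain prose
    technical_design: str             # data model, API surface, chosen approach
    affected_files: list[str] = field(default_factory=list)       # where to look
    acceptance_criteria: list[str] = field(default_factory=list)  # what "done" means

    def to_prompt(self) -> str:
        """Flatten the ticket into a single context block for the agent."""
        criteria = "\n".join(f"- {c}" for c in self.acceptance_criteria)
        return (
            f"# {self.title}\n\n"
            f"## Requirements\n{self.requirements}\n\n"
            f"## Technical design\n{self.technical_design}\n\n"
            f"## Files to inspect\n{', '.join(self.affected_files)}\n\n"
            f"## Acceptance criteria\n{criteria}"
        )
```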

The one thing this workflow handles horribly is requirement changes. If the change is non-trivial, I would have to update a whole bunch of artifacts. Yes, I can accommodate the changes. But I have to do it. How one change affects the project is not something I can hand over to LLM in a single prompt yet. So it is just annoying.

I also notice that the current generation of LLMs isn’t exactly built for interactive work like a true pair programmer. I have had many successes revising implementation plans with an LLM, but far fewer stories revising the code. Between revising code through rounds of interaction and scrapping the plan to give the LLM a fresh start, the former often feels like a sunk cost fallacy. The LLM either gets it right the first time or it doesn’t.

I find vibe coding rewards big planning, so… my workflow has increasingly become more Waterfall. I was taught that Waterfall was bad and that I should feel bad. I think Waterfall got its bad rap from its slow feedback loop. But LLMs are much, much faster at producing code. If I chunk the scope of work well, I usually have stuff to show in a couple of days. Agile isn’t exactly holding up well either. I like to think RAPID will be the next methodology ;)

The future calls for multi-agent orchestration

I don’t usually vibe code with just one LLM instance. Staring at the screen while the agent does its thinking and chunking was mesmerizing the first time I figured out the secret sauce. After that, it is just meh. I tend to have 2-3 instances running in parallel instead. Tokens!

That is about as much parallelism as my brain can handle though. Beyond that, it is torturous. A stronger human might say that is weak, to which I respond: you are absolutely right! But as long as one has to manhandle the instances, there is a physical limit to how this is gonna scale. And while I get to shut down my laptop at 6 PM, who on God’s green earth gives these LLM instances the right to rest?! There is no Peta for bots yet!
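The parallelism itself is the boring part. A rough sketch, assuming a hypothetical `agent` CLI as a stand-in for whichever coding agent you run - one subprocess per ticket, a small worker pool:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

TICKETS = ["PROJ-101", "PROJ-102", "PROJ-103"]  # one fat ticket per instance

def run_agent(ticket: str) -> tuple[str, int]:
    # 'agent' is a placeholder command, not a real CLI - swap in your own tool.
    proc = subprocess.run(
        ["agent", "run", "--ticket", ticket],
        capture_output=True, text=True, timeout=3600,
    )
    return ticket, proc.returncode

# 2-3 workers: past that, the human reviewing the output is the bottleneck.
with ThreadPoolExecutor(max_workers=3) as pool:
    for ticket, code in pool.map(run_agent, TICKETS):
        print(ticket, "green" if code == 0 else "red")
```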

2026 started with a Gas Town-shaped bang. Steve Yegge is known for his rants. His writing can’t be described as well-structured thoughts, but it’s entertaining. What else can one ask for?

I don’t buy the messiness of Gas Town, I hope the future is cleaner and has better token efficiency. But I love the analogy that Gas Town to LLM is like k8s to Docker.

In my sweet dream, months from now, I will spend my afternoon planning. At 5 PM, I hit a big red button that says “Start” - it is very important for my well-being that the button is red and big. An engine starts to hum and produce code. I watch the metrics for an hour before calling it a day. The engine purrs through the night. The next morning, I come to the office and realize that the whole night of work, $2000 worth of tokens, has produced complete garbage. Marvelous!

The future rewards M-shaped people

I wrote about this before. And it is not the future, it is now. This point just goes really well with multi-agent orchestration. To operate such a (costly) engine, you cannot afford to say “I don’t know what it is producing, it passes the tests, so it is good enough for me”. I mean, technically you can, but I am afraid there are people better qualified for your job then, because they can judge and calibrate the outcome.

I wrote this back in 2025, and I think it still holds:
This is not my original idea - the concept is widespread on the internet. The traditional model has been the T-shaped professional: the vertical bar represents depth of related skills and expertise in a single field, whereas the horizontal bar is the ability to collaborate across disciplines. In software development, this meant being, say, a backend engineer who understands enough frontend and DevOps to collaborate effectively.

But LLM doesn’t just allow us to do things better. Contrary to the popular belief that AI accelerates brain rot, I find that motivated people learn faster with AI support. The other day, my staff engineer gave Claude Code access to the Postgres source code and proceeded to drill down some very technical questions that otherwise would be impossible for us to have that expertise in a short amount of time. LLM gives us access to the consultancy we didn’t have before.

Instead of knowing one thing really deeply (the hallmark of individual contributors in the past), it allows us to know many things deeply, hence the M-shaped analogy (m - lowercase - would have been better; I was clueless about what to make of the capital M initially). This shift is profound for career development. The traditional advice of “specialize or generalize” is becoming obsolete. The future of career advancement lies in being able to connect multiple domains of deep expertise.

 

The future cares more about latency than elegance

I used to agonize over code. Variable names. Function boundaries. The perfect abstraction. I was taught that good code is elegant code, and elegant code is maintainable code. The logic was sound: you write code once, you read it a hundred times, so optimize for reading.

But what if you write code once and never read it?

Vibe coding can be quite wasteful. I am encouraged not to bother with the perfect abstraction, just to focus on the problem at hand. A script to migrate some data. A one-off analysis. A prototype to test an idea. The LLM generates it. I run it. I delete it. If I need it again, I regenerate it. The regeneration takes two minutes. The careful crafting would have taken two hours. It is a future of quantity over quality. Many people I know would hate this. I hate it too, having read Zen and the Art of Motorcycle Maintenance in my formative years. But I am afraid this is happening regardless of my preference.

This changes what “good” means. Good code used to mean elegant code. Now good code means fast-to-produce code. Latency is the new elegance. The question isn’t “is this beautiful?” but “how quickly can I get something that works?” 


The future doesn’t look kindly on some best practices

This realization sent me down a rabbit hole. How many best practices are solutions to problems that AI simply doesn’t have?

Take “keep functions short.” I was taught 20 lines max. The reasoning: humans have limited working memory. We can’t hold a 500-line function in our heads. So we break things into digestible chunks. But Claude processes massive functions without breaking a sweat. If anything, too many tiny abstractions make it harder for Claude to follow the flow. The function length rule was never about the code. It was about the human brain.

Comments and static documentation that drift from reality within a week? They might still be useful in specific situations, like when the reader doesn’t have access to the source code. But when people do have access? An LLM can just read the actual code and tell me what it does now, not what someone wrote it did six months ago.

Story point estimation? When Claude can prototype something in twenty minutes, “how many points is this?” becomes “let me just try it and see.” The estimation ritual was about managing uncertainty over weeks of human effort. The uncertainty shrinks when the effort shrinks.

Not all practices are heading for the graveyard though. Some survive, but for different reasons. DRY used to exist because humans forget to update all instances when logic changes. Copy-paste five times, change it in four places, spend three hours debugging the fifth. Classic. AI doesn’t have this problem. It can regenerate boilerplate on demand without breaking a sweat.

But DRY still matters. Less code means more context fits in the LLM’s context window. Every duplicated block is tokens that could have been spent helping Claude understand the rest of your codebase. The practice survived, but the why behind it shifted completely.

I have been coding professionally for 15 years now. I tend to see myself as an old guard of (some) old values. I groan at kids using a Python shell to query instead of raw SQL. I feel like, in the coming years, the dogmatic-versus-pragmatic tension will give me some serious cognitive dissonance.


The future cares about audit, not review

If humans aren’t writing the code, and humans aren’t reading the code, how do we know the code is any good? The answer used to be code review. But when an LLM generates a 500-line PR in ten minutes, and the 3 previous PRs were reasonably good… my attention drifts.

I find myself caring more about things I can verify without reading every line. The obvious stuff: does it have tests? Do the tests actually test behavior, not just chase coverage? Can I trace requirements to implementation without understanding the implementation itself?

But increasingly I care about the generation process, not just the output. What was the plan before the code was generated? What context did the agent have access to? Enough context? Too little? Too much noise? Did it have the right files in its window, or was it hallucinating an API that doesn’t exist? Did it follow the technical design, or did it improvise?

These questions feel strange. I’m not reviewing the code. I’m reviewing the conditions under which the code was born. It’s like judging a student’s exam by checking whether they had the right textbook open, not by grading their answers.

The future looks like layers of automated verification with humans doing spot checks. AI writes code. Different AI reviews code. Static analysis runs. Humans audit the audit - sampling PRs to calibrate confidence, checking that the verification system itself is trustworthy.
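Here is a minimal sketch of what "audit the audit" could look like - sample a fraction of merged PRs for human inspection, and adjust the rate based on how often humans agree with the automated verdicts. The thresholds and names are made up for illustration:

```python
import random

def pick_prs_for_audit(pr_ids: list[int], rate: float, seed: int = 0) -> list[int]:
    """Sample a fraction of merged PRs for human spot-checking."""
    rng = random.Random(seed)  # fixed seed so the weekly sample is reproducible
    k = max(1, round(len(pr_ids) * rate))
    return sorted(rng.sample(pr_ids, k))

def next_rate(rate: float, human_agreed: list[bool]) -> float:
    """Calibrate: audit more when humans disagree with the automated verdicts."""
    agreement = sum(human_agreed) / len(human_agreed)
    if agreement < 0.9:               # the verification system looks shaky
        return min(1.0, rate * 2)
    return max(0.05, rate * 0.8)      # it has earned some trust, sample less

print(pick_prs_for_audit(list(range(500, 560)), rate=0.1))  # 6 of 60 PRs this week
print(next_rate(0.1, [True, True, False, True]))            # 75% agreement -> 0.2
```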

Artisanal code review is declining. Not because the wisdom doesn’t matter, but because it is not always required.

---

TL;DR: The future is fat Jira tickets and multi-agent orchestration humming through the night. It needs M-shaped people who can man the beast - with enough depth across domains to know when the machine is producing gold versus garbage.

Thursday, December 25, 2025

2025 was a blast

Although it may not feel like it now, a few years from now 2025 will likely appear as a pivotal moment in my life. I will remember this year with much fondness, and with an awareness that after this, things won't go back to the way they were.

10 years at Parcel Perform

I crossed this mark in October. 

Yep, I have been around for 10 years. A lot has changed in that timespan. I was 25 then, thinking this would only be a short gig - startup projects come and go like spring flowers. The social circles around me changed - I owe to my friends, old and new, the experience of an amazing burst of youth, excitement, and an unfounded sense of invincibility. I used to spend half of my week at the office - no, literally, I slept there. Since then, I have moved offices a few times, and the latest edition, let's say, isn't exactly habitable.

Throughout all of that, my employment at Parcel Perform has remained constant. That isn't necessarily a good thing. At the end of last year, I genuinely thought my work at Parcel Perform was done. Like, we had a blueprint for the ever-growing traffic. Business had recovered its footing. And somehow, I felt like I was going in a loop.

Then AI came and flipped everything upside down. Software, as I know it, is changing. And with that, everything is new and exciting again. I don't know how long this will last. I would give it a couple of years to see it through. To see if the system, the people, and the machines that I have grown to know so well would hold up and thrive in this fundamental market shift. Regardless of the result, I will come out with good stories. And that's what I have been after in my career.

Sabbatical leave

I was on the verge of boredom when I entered the sabbatical. The idea was to give myself as much time as possible to recover from what felt like a never-ending endurance race to keep things afloat since COVID. Well, "as much time as possible" came down to 5 months, after some negotiation. That would get me back in time to prepare for BFCM. Sounded good.

I had a few things I wanted to do during this break. I'm gonna read more. I'm gonna pick up my writing again. And I'm gonna teach myself some robotics. Generic new-year-resolution bullshit, I know. I have only done this twice. The world found that disagreeable. In the first 2 weeks of my break, there were Google IO, Microsoft Build, LangChain Interrupt, and my personal fav, Code with Claude.

It was a bad time to lie low. So I didn't. I spent an awful amount of time on the internet soaking up all the news, tutorials, and whatnot that came my way. I stayed on Slack, a critical mistake of mine. I was never really off work.

Don't get me wrong. I enjoyed learning and did all the above voluntarily. But it didn't feel like a break. More like a research semester at school. The mind was always working on the next puzzle, just that time flew differently. I wrote quite steadily during this time though. 


The pivot point for me came when I internalized that successful AI adoption was not simply a thing I did once and was done with. Good AI output relied on AI-friendly input. One couldn't spend their day translating input for AI; it was just counterproductive. The change had to come from upstream. And upstream's upstream. Before long, the end-to-end process needed to change. And my company at the time didn't have this level of thoroughness.

I put on my writing hat, penned an Engineering AI Transformation doc, and planned my early comeback. I was only 2.5 months into my supposedly 5-month break. God damn it. Did I say I picked the worst possible time to be on break?

Dance in an AI whirlwind 

End-to-end realization and an AI transformation plan were just the beginning. Convincing 100+ people spanning 3 different continents wasn't a walk in the park, especially when I was equipped with no more than 3 months' worth of YouTube and hopeless optimism.

We started testing new hires for AI usage skills around the time my break began. By the time I was back, the test we had spent weeks creating had become too easy for the latest models. Ouch. That pretty much summed up my experience in this race. Every step we gained was only temporary. Any trick I learned was either superstition or served me for weeks or months before being standardized by a release from one of those frontier model labs - leveling the playing field. I was only better than the next guy in those fleeting moments.

The amount of attention going to AI and the progress coming out of that is breathtaking. The world is still working on this fundamental transition.
  • We need to learn how to incorporate and compensate people who move from the T-shaped to the M-shaped model
  • We can vibe code many things, but should we? Build vs buy decisions can make or break a business. It all comes down to opportunity cost. How do you even measure opportunity cost, though?
  • Agile is dead, way before AI nailed its coffin. What's now, creative chaos (i.e. fancy name for a mess)?
  • What can be done to ensure people going through this storm are energized, enthusiastic, and curious? Because it's not gonna be over soon.

I am getting married

By the time you read this blog, perhaps I will have had at least one of my three banquets. Yeah, you read that right. Three. There is one on the bride's side. Another on the groom's side - which happens to be my side, in case you are uncertain. We call those our parents' weddings. Really, the wedding invites were written from the parents' POV; guests are basically invited to their son's and daughter's wedding ;)

Then there is a teeny tiny destination wedding. That is our wedding. The wedding. It has been the one constant item on my mind for a whole year. I am beyond thrilled to see it through. Vy has been telling everyone I am a guest at my own wedding, because she had to do all the heavy lifting. She is absolutely right! I can offer little help, because I think the first image of a wedding bouquet Pinterest shows me looks very nice, and I don't have the willpower to go through to the very end of Pinterest's database to confirm that.

Oddly enough though, I find all these traditions enjoyable. In a world spinning around AI 24/7, it is nice to see something never changes. Like love. Or family. I am a lucky guy.



Tuesday, September 16, 2025

7 Lessons from Building an AI-First Organization

 

1. Coding is not the bottleneck

Nor is product inception the bottleneck.

Nor is testing the bottleneck.

When picturing a function adopting AI tools, we tend to think of using LLMs to automate the one action associated with that function - be it a developer, product manager, or QA. The common, and perhaps intuitive, thought is that because the action is the one thing the function is known for, if we can just offload all the heavy lifting to AI, we would have achieved the goal of revolutionizing the field.

But developers spend less than 25% of their time coding. AI-assisted coding generally provides a 10-30% productivity gain. That is, at best, a 30% gain on 25% of the work. Even if AI eliminated manual coding completely, the overall gain would amount to around 25%. More importantly, this 25% is the reason many developers choose this career, myself included. We are intrigued by solving problems, building things, or the act of writing code itself. To eliminate this is to eliminate our job satisfaction - the dystopia we don't want to live in. Replace developer with product manager, replace writing code with writing specs, and we get the same picture.
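The arithmetic behind that claim is essentially Amdahl's law: if coding is a fraction p = 0.25 of a developer's time and AI speeds up only that slice by a factor s, the overall gain is capped no matter how big s gets:

```latex
\text{speedup}(s) = \frac{1}{(1 - p) + p/s}, \qquad p = 0.25
% s = 1.3 (coding 30% faster):   1/(0.75 + 0.25/1.3) \approx 1.06, a ~6% overall gain
% s \to \infty (coding is free): 1/0.75 \approx 1.33, and the time saved caps at p = 25%
```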

The real friction is in the other 75% of the time. In this slice, we find ourselves clarifying requirements, providing customer support, digging through legacy code to decrypt logic nobody remembers, or, worst of all, bored to death in meetings. It is in these activities that we find the goldilocks zone: massive productivity gain, improved job satisfaction, and low-hanging fruit. I didn't just add the last part because every great argument needs 3 supporting points. Creating a knowledge base from which feature details can be queried conversationally is a lot easier than getting an LLM to generate production-ready code on its own.
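As a toy illustration of that last point, a conversational knowledge base can start as nothing fancier than retrieval stapled to a prompt. The sketch below uses naive keyword overlap as a stand-in for a real embedding search - every name and document here is made up:

```python
def retrieve(question: str, docs: dict[str, str], top_k: int = 2) -> list[str]:
    """Rank docs by crude keyword overlap with the question."""
    words = set(question.lower().split())
    return sorted(
        docs,
        key=lambda name: len(words & set(docs[name].lower().split())),
        reverse=True,
    )[:top_k]

docs = {
    "billing-spec": "Invoices are generated nightly; retries happen after failed charges.",
    "auth-spec": "Sessions expire after 30 days; refresh tokens rotate on each use.",
}
hits = retrieve("when are invoices generated", docs)
context = "\n".join(docs[name] for name in hits)
prompt = f"Answer using only this context:\n{context}\n\nQ: when are invoices generated?"
# `prompt` then goes to whatever LLM you use - the retrieval is the easy part.
print(hits)  # ['billing-spec', 'auth-spec']
```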

2. AI adoption has to be end-to-end or else it is pointless

This draws heavily from the manufacturing chain analogy. In such a chain, we cannot just increase the speed of one part and expect an overall gain from the chain. Such a system moves at the speed of its slowest component. Having just one component moving faster than others actually creates misalignments and can be harmful to the whole system.

Same in a software development process. If code is written faster than product specs can be drafted or features can be tested, there are 2 likely outcomes: the code sits around generating no revenue while growing obsolete by the day as technology moves on, or other functions have to rush and compromise quality.

AI adoption fundamentally rocks the norm of many functions, if not all, but we don't have any option other than to embrace it thoroughly. Product needs AI to help structure requirements. QA needs AI to automate test generation. DevOps needs AI to predict incidents. Customer support needs AI to surface documentation. Every function needs to level up together, or the whole thing falls apart. Half-measures don't just fail to deliver value - they actively create misalignments and chaos.

3. Career development is going from T-shaped to M-shaped

This is not my original idea - the concept is widespread on the internet. The traditional model has been the T-shaped professional: the vertical bar represents depth of related skills and expertise in a single field, whereas the horizontal bar is the ability to collaborate across disciplines. In software development, this meant being, say, a backend engineer who understands enough frontend and DevOps to collaborate effectively.

But LLM doesn't just allow us to do things better. Contrary to the popular belief that AI accelerates brain rot, I find that motivated people learn faster with AI support. The other day, my staff engineer gave Claude Code access to the Postgres source code and proceeded to drill down some very technical questions that otherwise would be impossible for us to have that expertise in a short amount of time. LLM gives us access to the consultancy we didn't have before.

Instead of knowing one thing really deeply (the hallmark of individual contributors in the past), it allows us to know many things deeply, hence the M-shaped analogy (m - lowercase - would have been better; I was clueless about what to make of the capital M initially). This shift is profound for career development. The traditional advice of "specialize or generalize" is becoming obsolete. The future of career advancement lies in being able to connect multiple domains of deep expertise.

4. AI adoption leads to change(s) in team structure

There is a discrepancy in AI's impact on productivity between functions. It could be from the nature of work - some functions, like security, are harder to automate than UI test execution. It could be because at that moment, it is where the focus of the industry is, like the investment in application code generation far outweighing infrastructure code generation (which already suffers from a smaller training data set to begin with). And sometimes, we need a strong human-in-the-loop element. Take product managers for example - sure, AI generates product specs really really fast. But disastrous specs will throw a team off its track and cost a company opportunities it cannot get back.

That is to say, right now, it seems easier to automate code production than other functions. The traditional ratio of 1 PM to 5-8 engineers to 2-3 QAs is becoming obsolete. When PMs still take two weeks to write specs and QAs cannot click through test cases any faster, a productivity gain as small as 30% from developers breaks the balance.

As such, I think we would see some variations from the current team structure to maintain balance between functions. Primarily, a team can have more product managers, more QA to keep up, or fewer developers. My money is on fewer developers. See the lesson above.

5. Productivity measurement becomes important

Measuring productivity has always been a controversial topic, especially in software development where the delivery is not as tangible as, say, a manufacturing process. Personally, I am not a big fan. It is a hard topic and I don't get much fun out of it. Plus, I have always identified myself as an engineer, the subject of productivity measurement, and I don't like the idea that my contribution to my organization can be boiled down to a set of numbers. If that day comes, by the way, I hope I am a solid 8.

But even with my prejudice, I can't neglect that for a company as small as mine, we might be paying tens of thousands of dollars every month for computer-generated tokens. It is a large sum of capital, capital that could be invested elsewhere. Nobody gets good on the first try - in fact, most people get slower when they try to do something they have done forever, but with new tools. The productivity dip is an important, well-understood, and well-accepted part of any learning journey. But said journey can only go so far before the ROI needs to be calculated.

Soon managers will need to choose between a new hire and a new AI tool. The math isn't straightforward. A new engineer costs $X annually but brings human judgment and creativity. An AI tool might cost $Y in tokens but needs constant supervision. Which delivers more value? Without proper productivity metrics, we're making these decisions blind.
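A sketch of how naive that comparison looks when you actually write it down - every number below is a placeholder, and the hard-to-price parts (judgment, creativity, supervision quality) are exactly what the formula leaves out:

```python
def annual_ai_cost(tokens_per_month: float, supervision_hrs_per_week: float,
                   engineer_hourly: float) -> float:
    """Token bill plus the human time spent babysitting the agents, per year."""
    return tokens_per_month * 12 + supervision_hrs_per_week * 52 * engineer_hourly

hire = 90_000.0  # hypothetical fully-loaded annual cost of a new engineer
ai = annual_ai_cost(tokens_per_month=4_000, supervision_hrs_per_week=10,
                    engineer_hourly=45)
print(f"hire: ${hire:,.0f} / year, ai: ${ai:,.0f} / year")  # ai comes to $71,400
# What the formula can't price: judgment, creativity, and who fixes the 2 AM incident.
```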

I hope by then, we have known about productivity enough to make a well-informed decision, not some dogmatic principles (neither human is unique, nor machine is faultless). Cynical as I am, I also know that it is wishful thinking - we'll probably still be arguing about story points while the AI quietly rewrites our entire codebase.

6. Bottom-up innovation triumphs over top-down dictation

A recent MIT report found that 95% of generative AI pilots at companies are failing. A pattern emerges from the report: top-down "enterprise" pilots mostly go nowhere, while bottom-up adoption is what actually drives disruption.

The problem with top-down initiatives is that upper management usually works completely differently from the majority of the workforce - the frontline workers - in terms of requirements and daily tasks. They end up building things that nobody needs, optimizing activities with marginal ROI, and eliminating work people love (see lesson 1). Meanwhile, individual employees are finding real value by experimenting with frontier models on their own terms, for their specific needs.

The 5% that succeed? Those are likely the ones where companies recognized this organic usage, which the report calls "the future of enterprise AI adoption", and supported it rather than fighting it. Bottom-up innovation triumphs over top-down dictation. The reward is for those who can get their hands dirty.

7. AI adoption is irreversible despite reality checks

Despite occasional setbacks, AI adoption in the industry is irreversible. Just like once color TV was a thing, nobody wanted black and white anymore. I am not giving up my agents. Yes, they will replace me some day, but today they contribute to a good part of my job satisfaction.

It only makes sense that AI skills - the correct way of using AI be it technical, intellectual, or ethical - need to be learned and tested. This is already happening. Meta is letting job candidates use AI during coding tests. They're acknowledging that AI is now part of the toolkit, just like IDEs and Stack Overflow before it. Testing someone's coding ability without AI is like testing their math skills without a calculator - technically possible but practically irrelevant.

I have learned the hard way that I should never just ask whether someone "uses AI." The answer is not binary, yes or no. Everyone says yes these days. Only upon close inspection does the answer reveal itself to be a spectrum. It goes from "I ask AI questions so I don't have to Google myself" to "AI is my copilot" to "I have delegated all thinking to AI." The difference between these levels is massive - it's the difference between using a tool and being used by it.

Soon, we will see the AI-focused version of today's LeetCode. Instead of testing a red-black tree from memory (what is it by the way?), we will be tested on whether we can architect a system with AI assistance, validate AI-generated code for subtle bugs, or construct prompts that consistently produce production-ready outputs. The skill isn't memorizing algorithms anymore - it's orchestrating AI to solve real problems while maintaining quality and understanding.

I think this is when people say the AI genie is out of the bottle.