Saturday, June 18, 2022

Prioritizing development decisions


Great startup stories tend to share the same mold: a great founder dreamed of an equally great vision and followed it fearlessly till the world was conquered. History is written by the victors, of course, but it is relatively common for people to have a rough idea of what they want to build before starting the work. That’s the easy part.

But constructing a concrete step-by-step plan to deliver not even the vision but a mere release is hard work. A good plan needs to take advantage of both business and development expertise without letting one overpower the other. If the business makes all the calls, the development time might be painfully long and the product might crash when traffic starts to peak. If development decides, we might end up with a technical wet dream that solves a non-existent problem. That’s where the planning game comes in.

I can’t decide if a game or a dance is a better metaphor for depicting the collaborative nature of planning. Business and development each possess knowledge unavailable to the other, and neither can produce the entire plan alone. The work can only be done by combining the strengths of both sides. In economics, a “game” refers to a situation where players take their own actions but the payoff depends on the actions of all players. Game Theory suddenly sounds less conspiratorially adventurous, doesn’t it? Dance, on the other hand, isn’t used as much in research literature, so I ended up siding with the economists. That’s a sidetrack.

In an Agile team, a planning game looks like this:

  1. The Product Owner decides the scope of the plan. Based on the purpose of the project, the Product Owner prepares a set of use cases and explains why they are valuable problems to solve and why they should be done first.
  2. The whole team breaks each use case down into stories. The idea is usually that anything requiring the team to do something other than normal company overhead needs a story.
  3. The developers “size” the stories. They estimate the time each would take or its complexity. They then group stories that are too small, split ones that are too big, and decide what to do with stories they can’t estimate.
  4. The Product Owner prioritizes the stories. Some stories won’t be worth adding, either because they are unimportant or because they are too far in the future.
There are many things that can be said about the planning game, from why Work In Progress should be minimized and why releases should be small and frequent, to the best answers for “why does it cost so much?”. Those are stories for another time.

In this piece, I want to discuss a specific friction in step 4 of the game, where one story is prioritized over another. The stories laid out in step 2 do not necessarily carry the same value for different team members. Some are pretty straightforward: implement feature X to earn Y money, the contract is signed. Others are trickier, such as implementing plug-and-play UI components so that future web pages are built faster. The second category usually comes from the development team, which is one layer away from the users and so perceives value differently from the Product Owner. That is the breeding ground for misalignment.

Product Owners want to release a solid, usable product. They also have to balance that with the desire to save money and meet market windows. As a result, they sometimes ask developers to skip important technical work. They do so because they aren’t aware of the nuances of development trade-offs in the same way the developers are.

Some developers note down all the development options like a shopping list, “outsource” the choice to Product Owners, and then writhe in agony at the wrong decisions. If such a strategy didn’t work for the guys at the Pentagon, it wouldn’t work anywhere. Just as Product Owners are the most qualified to decide the product direction, developers are the most qualified to make decisions on development issues. Don’t delegate the decision; take the matter into your own hands. If a development decision isn’t optional then it shouldn’t be prioritized either. Just do it.

Instead of:
Our notifications are crucial for informing customers about the health of their business. To make the data pipeline behave transactionally, we have several options. Please let me know how we should prioritize them.

    • Experiment with Flink’s TwoPhaseCommit. This is new to us so it would take time and be hard to estimate.
    • Get Sentry to cover all the projects. This is a passive measure as we only wait for exceptions to show up.
    • Add a check at the end of the pipeline to make sure no duplicated notifications are generated. The check will have to handle its own state.
    • Move the final stage of the pipeline to Django. It is a web framework that supports transactional requests by default and we are familiar with it.

Try this:
Our notifications are crucial for informing customers about the health of their business. The data pipeline is long and consists of multiple nodes, each of which needs to successfully finish its work to produce a notification. To achieve this notion of exactly-once delivery, we need the pipeline to behave transactionally and every exception to surface swiftly. That is done via Flink’s TwoPhaseCommit and Sentry integration. The work will be done at the beginning of the project as it is easier to handle when the code base is still small. TwoPhaseCommit in particular is new to us so we will have a couple of spike stories to understand the technology.


When there is a business choice to be made, don’t ask Product Owners to choose between technical options. Instead, interpret the technology and describe the options in terms of business impact. To continue our notification example, before any notification is sent, there is a need to make sure the data we have is the latest. The conversation can go like this:

We are thinking about adding another Kafka queue to request the latest data. We then need to join the request flow with the future trigger using some sort of sliding window, and will also need to think about out of bound data. Our other option is to set not one but two future triggers so that one can request data and the other handles notifications. Which would you prefer?

Try this instead:

We have two choices for ensuring a notification always works on the latest data. We can use a deterministic approach or an empirical approach. The deterministic approach would add a new data request flow right before the notification is sent. The notification is processed after the data request flow so we are always sure the latest data is used. But because data processing and future notifications are technically asynchronous and independent of each other, it would require several more stories for us to join them together. The empirical approach won’t take any extra work. We observe that a data request usually takes less than 5 minutes, so we can set two future triggers instead of one, 10 minutes apart from each other. The first one requests the data, the second sends the notification. But the margin of error is larger because sometimes there can be a delay in the data request. Which would you prefer?
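
For the engineers reading along, the empirical option can be sketched in a few lines. This is only an illustration under stated assumptions, not our actual pipeline: the 10-minute gap comes from the example above, while the function names `schedule_triggers` and `data_is_fresh` are hypothetical and exist only for this sketch.

```python
from datetime import datetime, timedelta
from typing import Tuple

# Assumption taken from the example above: the two triggers sit 10 minutes apart.
TRIGGER_GAP = timedelta(minutes=10)


def schedule_triggers(notify_at: datetime) -> Tuple[datetime, datetime]:
    """Return (data_request_time, notification_time) for one notification."""
    return notify_at - TRIGGER_GAP, notify_at


def data_is_fresh(request_at: datetime, request_duration: timedelta,
                  notify_at: datetime) -> bool:
    """True if the data request finished before the notification trigger fires."""
    return request_at + request_duration <= notify_at


# Hypothetical notification scheduled for 09:00.
notify_at = datetime(2022, 6, 18, 9, 0)
request_at, send_at = schedule_triggers(notify_at)

# A typical 5-minute request lands in time; a delayed 12-minute one does not.
print(data_is_fresh(request_at, timedelta(minutes=5), send_at))   # True
print(data_is_fresh(request_at, timedelta(minutes=12), send_at))  # False
```

The second call is the "larger margin of error" in the conversation: nothing in this setup waits for a slow data request, which is exactly the trade-off the Product Owner is being asked to weigh.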


And finally, no software engineering discussion would be complete without a talk about code refactoring. In the context of the planning game, it is mostly about justifying the refactoring effort. While it is tempting to do a “spring cleaning” hoping to refactor the whole thing back into shape, the sad truth is that halting the development of working software for refactoring is hardly justifiable. Refactoring effort deals with risk (the old code can implode at any time) and potential (the new code is easier to work on). Those values are intangible compared to the usual subjects of a business decision (new features lead to new customers, who lead to greater revenue).

What do we do? Follow the Boy Scout rule: “always leave the campground cleaner than you found it.” Whenever you need to implement a new feature or fix a bug, see if that part of the code needs improvement. Refactoring shouldn’t be a separate phase; it is part of everyday development. Once you nurture this culture of quality, there is nothing to justify.

None of the above suggests that the easiest way to avoid friction is to keep the business side in the dark while waving the engineer's magic wand. Communication remains the key to any successful project. There is more to a project's success than just business decisions, and working out a way to be a (constructive) part of the conversation is more powerful than a baseless delegation.

Monday, January 3, 2022

What makes an internship meaningful?

I got 2 kick-starts in my career: a part-time job in 2009 and an overseas internship in 2010. I dropped a production database on my first day at the former, and was so ill-prepared for the latter that I didn’t look up the place’s climate and local language till I was already on the plane. Both were privileges. They catapulted me into worlds I couldn’t fathom and shaped my career in the following decade. In turn, as I have built startups, I have also been building internship programs on and off. My number one concern is to replicate the magic I got to experience for the next generation. I am writing this for those of you who are seeking an internship opportunity.

What makes an internship meaningful? I ask this a lot, and the most common answer I receive is “a new experience outside of school”. That’s right. Much of what we do in life is to propel ourselves into new experiences. As a species, we are a bit of an adrenaline junkie (and serotonin, caffeine, and cocaine). But that doesn’t really answer the question.

A new experience is undoubtedly crucial for an internship, but if it is solely what makes an internship meaningful, wouldn’t picking up something entirely opposite to your education offer the most exposure? I am a software engineer by training, but the course that awed me the most in college was a general elective, macroeconomics. It expanded my horizons, but I wouldn’t have considered an internship in a central bank; it would have ventured too far away from what I wanted to do. An internship should have a certain level of relevance to your long-term career.

An internship is only as meaningful as the mentorship it can provide. The transition to the workforce is the transition from theoretical information into practical knowledge. The rigid structure of the curriculum makes it hard to grasp some key concepts of software engineering. At the end of my degree, I was still not sure if my code was clean (there was no need to reopen last semester’s assignments), if my architecture was flexible to changes (no assignment lasted for more than a semester), or if my software could be released in small, iterative cycles (all assignments were fixed time-boxes). It took years for me to get comfortable with these concepts. It was frustrating to navigate the uncertainty. I wish someone had been there to tell me what I did wrong. In the language of the Dunning-Kruger effect, my descent from the peak of Mt Stupid to the valley of Despair was rough.


A good mentor can ease the journey. Mentorship provides the kind of trusted guidance a personal friend can give, with the bonus that the friend is also your manager. A mentor helps you transition from a student to a professional.

Lastly, an internship implies getting out of the sandbox. At school, you learn in isolation, in a designed system, a less glamorous version of The Matrix. You spend your days solving imaginary problems and gaining imaginary successes, none of which reliably predicts your success at work; otherwise everyone would be boasting about their GPAs now. And if you slip here and there, other than some awkward group assignments, there is hardly any difference in the grand scheme of things.

At work, however, there are consequences to your actions. There might be work whose input relies on your delivery. There might be deadlines that are not arbitrary choices dictated by a course’s curriculum but coordination milestones that align other efforts. And there might be actual end-users of whatever you are delivering. An internship that can’t offer this sense of ownership is nothing but yet another simulation box.

Compared to the previous two elements, giving ownership to someone who has zero experience is considerably more complicated. Many internship programs are decided by management or HR, for reasons like these:

  • Management wants to establish a partnership with a university and an internship is a low-effort investment.
  • Next year's recruitment might get harder if none happens this year. Out of sight, out of mind.
  • Doing the dean's office a favor. True story.

While there is nothing wrong with these reasons, too often they are used as excuses to force the internship upon the team adopting you. A forced internship can go wrong in a couple of ways. 

(1) You can be treated as a junior developer and go through the same onboarding experience. The thing is, internships are typically short, and at the end you go back to school to continue your studies, usually right when the onboarding has just finished. The internship is then an overwhelming experience for you and a waste of time for the team.

(2) You might not be ready to work on production code yet, so a common solution is to make up an internship project nobody needs. The general lack of coordination between those who want to host an internship and those who are in charge of it results in uncreative half-assed ideas, such as cloning a working piece of software, building an internal tool whose specifications are a few Slack messages, conducting an experiment on which the mentor has not made any progress recently, etc.

A better internship experience can be built with buy-in from the team. An internship fulfilling the needs of the team can be a constructive experience for everyone involved. It could be to give someone new to management a chance to run his own (intern) team before reaching for a more substantial responsibility. It could be an isolated project with actual stakeholders who have good incentives to invest in its success. And it could also be an experiment, a research topic, but goals, direction, and companionship should come with it. In short, seek to align the success of the team and the success of the internship.

Too many internship hours have been wasted on demo code no one reads and pet projects no one uses. I hope yours will be better.

Saturday, November 6, 2021

Everything seemed normal

The air was warm, humid, and dense. I was panting after a badminton loss. I had never been good at this, or any sport for that matter. But I liked flexing the muscles after months of home confinement. When I stepped out of the stadium, a rundown building surrounded by brick walls and metal sheets, the air was a lot more pleasant. The location of the court was nice, right next to a river. An adult was sleeping in a hammock in the shade hanging over the river bank. Half a dozen kids were swimming in the grayish water. I couldn't help but notice, on the other bank, a slum area where the river was, among other things, part of the toilet and sewage system. Right at that moment, it was easy to forget Saigon was a cosmopolitan city of ten million residents and the engine of Vietnam’s emerging economic prowess. Under that facade, Saigon was still pretty much a part of South Vietnam - a maze of rivers, canals, and delta farmlands - and the fate of the city and that of the land were nothing but one.

Everything seemed normal except that it shouldn’t. We had just struggled through the whole ordeal of a four-month lockdown. Covid was still reigning across the country. Vaccine coverage was around 30%. In rural areas, agricultural produce was left to spoil in the fields as it would cost more to ship it elsewhere. Waves of laborers retreated to their hometowns without the slightest idea of what their future looked like. Meanwhile, in the city, everything cost more and the labor shortage was severe. For the first time since the stats were made available in 2000, Vietnam recorded negative economic growth of 6.17%. Real estate, stock, crypto, and even gold markets were reaching record highs, not because of a belief in a V-shaped bounce-back, but because people were sheltering their assets from the coming inflation once the stimulus hit.

Everything seemed normal but the substance had changed. The life of the elderly was more fragile. Break-ups and divorces rose. Worst of all, young people, including kids, were robbed of 2 years in the most important period of their lives. Covid was unprecedented. It exposed many of us to our worst enemy, change. Some could brag how embracing change was part of their DNA, but let’s be real, few people turned their lives upside down for the kick of it. Changes forced people to face uncertainty, make decisions without a mental model, and live with the consequences. Take an example: kids and schools. Among people my age, the number one reason for self-imposed isolation at home was the worry that they could bring the virus home to the kids. The local government expressed the same concern: two months after lockdown measures were eased, kindergartens and schools had yet to open. For the whole of 2021, schools had been open for around 4 months. While infection risk and its impact on kids were uncertain, it was already a certainty that the lack of interaction with same-age peers damaged the well-being of kids. Which was more critical, a 0.1% chance of infection, or a 100% chance of development impact? Reports and studies were one thing, but when it came to our lives, the choice was personal, emotional, and far from statistical. Whatever the decision, the scars left on the current generation would be slow to fade.

A few days ago, Charlie Munger stated markets were even crazier than the dot-com bubble. He might be right. Statistically, he had been right more often than he was wrong. But the same statement could be found in 2006, and then in 2015. As the world got better connected, it had been inevitable that we dealt with bigger crises impacting more people. I believe that eventually covid and its crazy variants would be over, that we would treat covid no differently from its influenza cousins, and everything would seem normal again. But make no mistake, the world would never return to the trajectory it would have followed had covid not happened, and normalcy is nothing but a wet dream.

In the river, the kids, tired of breaststrokes, had changed their game. The older kids were hopping on a floating traffic marker on the water for big jumps, betting on who could make the biggest splash. The younger ones were speculating from the safety of their bright-orange life vests. A few years ago, there would have been neither markers nor life vests. Done mourning my loss, I stepped back to the court for the next game.




Sunday, September 26, 2021

The case of Project Manager


I have never used a clickbait thumbnail for my post before, not even when I nuked a production database. But admit it, the thumbnail got your interest this time. Let's see if I can deliver.

The role of a project manager (PM) is somewhat controversial in the software development community. Generalization is bad. There are bad PMs and there are bad engineers. But you don’t quite see the same question posed for product and engineering people, the other two pillars of software development. Google even tried to let go of all of its PMs and have all its engineers report to a single VP of Engineering. What makes PMs different?

Opinions about PMs are more subject to bias than others. A lot of PM work happens behind the scenes. When work is running smoothly and on schedule, every day is business as usual. At times, it feels thankless and unappreciated. It’s only when a project is plagued with issues that the PM starts getting attention - usually not in a good way. The strategic value of project management tends to be delivered before shit hits the fan. After that, primitive instincts kick in, engineers keep their heads down and work harder, the process is out of the window, and a PM seems to only stand in the way. A great PM could still shield the engineers, prioritize work so the worst fire is put out first, communicate the impact to stakeholders, and plan for the next step. But usually, a great PM wouldn’t let the shit hit the fan in the first place. In other words, PMs are judged after things have gone FUBAR, and everyone can afford to be smart in hindsight. The same hindsight you probably learned from your angry parents after those end-of-year conferences with school teachers. No mom, I wouldn’t have skipped school that day had I known there was a test.

Every good PM succeeds in his own way, but bad PMs all fail the same: they don’t do what they are supposed to do. What are they supposed to do? Sadly there is no universal truth. Granted, each company operates in its own way. At some places, SWEs deploy their own code, while in others, SREs reign supreme over the production system. Some product owners are supposed to help with GUI design (hopefully based on some standard design system) or help with market research. But by and large, the output of the members of a tech team is bound to tangible deliverables and the inter-company differences are within standard deviation.

The PM role? They are supposed to be responsible for the success of the freakin’ project. Now that’s wild; how that translates into concrete action is anyone’s guess. That might mean being the link between engineers and customers, making sure both sides get what they want when they want it. It could be about being a master of process, fluent in a range of management methodologies, and having an eye on constant improvement. Perhaps it involves understanding the web and mobile architecture, knowing modern technologies, and being quick on the uptake. Or I might as well have just described an Account Manager, Scrum Master, Technical Consultant, and part-time Avenger. A PM’s scope of work is ambiguous by design, as no two projects are alike and project management is the glue pulling things together, but unclear expectations are the breeding ground of disappointments.

Another thing that gives PMs a bad rap is the disdain some of them have for technology. Software development is not rocket science but it is not exactly a pure exercise of muscles either. That is to say, the field is somewhat technical. And just like any other technical field, it is full of useless jargon, lame inside jokes, and know-how that takes years to pick up. Non-technical PMs, people who have difficulty explaining how a website works to a 5-year-old, manage to navigate around this quagmire by materializing all units of work into tickets. And they proceed to treat the tickets as little black boxes. The meaning of the work is watered down into start and end dates, and a set of labels for convenient herding. This makes a complicated system simpler. Some navigate well enough that they don’t need to know what the finished work would look like. Work to them is what parcels are to carriers: you are supposed to ship it, not to know about the dead bodies inside. As I work for Parcel Perform, this sounds great!

There is a problem though: most people in software development are not in the business of writing code, they are in the business of building software products - hopefully, great ones. The difference is that one is an isolated piece of work with a predefined outcome, the other is a process of figuring out the intersection of market fit and technical prowess. Building non-trivial software requires plenty of arguments, negotiations, and decisions. Non-technical PMs can’t call out bullshit. They lack the skills and tools to affect the outcome, and all important decisions are handed to engineers and product owners. Without that decision-making capability, PMs turn into administrative assistants, busying themselves with monitoring project status, sending out updates, and keeping track of who does what. It is nice but it doesn't make or break a project.

Some PMs are shit umbrellas. Some are shit funnels. And the canopy of the said umbrella is technical knowledge. Not the same knowledge that is required for engineers to write code, but the knowledge to see through a project with clarity, know what is important and what is not, and call the shots when needed. It is not unusual for an engineer to pick up a book on Agile to better align his working habits with Scrum’s sprints. Yet the sight of a PM reading Google’s Site Reliability Engineering to come up with a better firefighting routine is as common as the sight of a Saola. Isn’t that having your cake and eating it too?

Are PMs useless then? The bad ones are. Bad PMs are a net negative for their team's morale and productivity. Don’t believe me? Try rewriting the software for the third time because your PM failed to strong-arm a customer into getting his shit together. But the whole thing about Google not needing PMs is as much an urban myth as it is truth. It happened in July 2001; in computer time that was a century ago. It wasn’t yet the mighty Google where every practice seems deliberate; there were around 130 engineers. The layoff didn’t stick. The engineers themselves opposed it. The whole thing lasted for less than a month and was pretty much Larry Page throwing a tantrum against Eric Schmidt’s adult supervision.

As long as software products are written by humans, project management will always be needed to coordinate a bunch of professionals toward common goals. People who think project management is a good way to get into the tech scene without the background need a reality check. PMs are responsible for the success of a project even though they have little control over it - how exactly is that easy? It takes a lot to be a good engineer, and it takes even more to be a good PM. The more people on board with this thought, the better it is for everyone in software development. Don't let a bad apple spoil the bunch, and the best way to do that is to work on being a good one.

Sunday, August 15, 2021

The engineer/manager spectrum

When I was younger, I was given many forms of the same advice: at some point, I would be better off doubling down on either being an engineer or a manager if I wanted to progress in my career. I was booted from my first job while pretending to be good at everything, so naturally, I thought this was a good idea.

Life looked at my choice with as much consideration as my parents did when, in kindergarten, I told them I wanted to join the infantry - which was none at all. I joined a startup, and subsequently another startup. I never got to choose. Working at an early-stage startup is like conducting an old locomotive on a half-done railway. You juggle between keeping the engine running, because just like your code it breaks a lot, and building the railway with whatever you find along the way, because you are selling a product before it is fully ready. The game is over when either stops.

Along that journey, I realized that career choices are not complicated, they are complex. Complication is when a system consists of multiple parts with intricate interactions. As overwhelming as the parts and interactions could be, they follow some universal laws that eventually make the system knowable. Complexity is when the parts and interactions are not fully knowable and the best we can do is some reasonable prediction. A car is complicated; traffic is complex. I think the people who gave me advice earlier understood this. But in an attempt to simplify a complex subject, half of the truth was lost.

I think when people talk about the engineer/manager choice, they probably mean that at the point you have to weigh your options as being an engineer or a manager, you have collected all the low-hanging fruits in your career. To proceed, you need a level of focus you haven’t experienced earlier. And as it is hard to focus on two hard things at the same time, you would be better off focusing on only one. Which is also true for my parents’ expectations of me.

While it is true that to perform at the peak level you cannot spread your effort all over the place, there is another half of the truth: the choice is neither binary nor permanent. You are probably overthinking if you view the move to management as a critical, life-changing decision. The move to management is not a promotion, it is a change in scope of work in which you drop existing responsibilities to pick up others. And if you hate the new responsibilities, you can always go back to the engineering track. Most companies understand this well enough and will be glad to provide some rotation options for their employees.

The higher you advance in your career, the stronger the element of leadership in your scope of work. People in the technical track get promoted for working on hard and impactful problems. You wouldn’t get to work on the hard problems unless the easier ones were effectively delegated to someone else. The most impactful problems tend to be ones that, once solved, provide sustainable advantages for the business. Such problems tend to involve people of different backgrounds and functions. Someone would need to rally them and align their interests to the shared goal. And yet delegation, communication, and goal alignment aren’t common practices among engineers. Although technical leadership isn’t exactly the same as people leadership, I think you will navigate better if you have an opportunity to learn about how people think, how larger projects are prioritized, and how organizations are run.

From my own experience, alternating between the two roles makes me better at both. In a hard project that challenges my technical prowess, I am more of an engineer. In a more relaxed project, my manager role is stronger so I can make sure things go to the right places. I am a better manager because I understand the morale friction caused by a poorly planned project, and I am a better engineer because I know the red flags of a poor project and when to raise the alarm.

The final caveat is that engineer and manager roles exist on a spectrum, and forcing your options to the extremes curbs your professional and personal development. Not only do most people find themselves somewhere in the middle, but they also move back and forth as their careers progress. People who can turn a technical advantage into a business advantage usually have a blend of engineer/manager in them. You can advance without spending time in management. But if you want to give management a try, you should. Do not let your identity be simply defined by a title.


Saturday, May 1, 2021

Lessons from a promotion

My tech team is organized into squads - cross-functional teams owning end-to-end feature development. The squad leadership is a joint collaboration of a product owner, a project manager, and a tech lead. As the business expands, more squads are needed and it falls to me to fill these new tech lead vacancies. To facilitate professional growth, internal promotion is favored over external hires. Along the way, I learned a few lessons.

It starts with a job description

The one thing that makes or breaks the promotion is the job description. A vertical promotion from junior to senior involves performing harder and grander tasks with tighter deadlines, in other words, being more proficient at what you have already been doing. A promotion to a leadership position is more “horizontal” in that sense. Pretty much like the time you left high school, I don’t think any amount of prior experience can truly prepare one for what comes next.

Our tech leads work with people across all disciplines to provide a cohesive technical vision for the squad, contribute to the product strategy, and coach their members. In that role, many activities are new to them. They will be working with people whose functions they haven’t fully comprehended, like a tech lead with an FE background working with DevOps on a deployment plan. They are asked for estimations while given far fewer details than they received pre-promotion. They are exposed to HR matters around the well-being of their crew, not all of which make everyone happy. And just sometimes they have the trauma of having their handcrafted solution taken out of context for an entirely different thing, with 3 days to deliver. Given the drastic change in scope of work (and the PTSD), it is understandable that post-promotion, some feel like a fish out of water. Unfortunately, if there is a structured approach to eliminate this sense of disorientation, I haven’t found it yet.

While it is tempting to propose a five-page-long job description listing every little detail one is supposed to perform, and hence solve the challenge once and for all, this managerial wet dream is nothing more than a motivational debt. Software development rewards people for their creative prowess and that in turn attracts great problem solvers to the craft. Spelling out exactly what one needs to do is the opposite of that. The job description should enable the person to picture the boundaries of her authority and the impact she has on the team without resorting to dictating the specific activities. Everyone will have different responses to “make tactical moves to ensure successful deliveries”, or “look after the career development of team members”, and that’s part of the growth. Take that, Tiger Mom!

Strength in diversity

In the previous year, the squad model had some initial successes. The first two squads jelled and performed well, relatively uneventfully. Structurally, both were mirror images of each other: BE-heavy, big data focus, led by old-timers. So when it came to the next new squad, there was a strong urge to copy the earlier success: same leadership profile, same structure, and same kind of work. That should be easy: the management knows what to do, the promoted people have existing role models to follow, and things probably fall into place like they did before. That was as close to a squad printer as I could think of.

In reality, my third squad was FE-heavy, had a strong interest in UI/UX topics, and had a product owner stationed away from the main body of the team. It couldn’t be any more different from the former two. I am glad that this happened.

A parthenogenetic offspring of a squad would have been an easy choice down a slippery slope.

I didn’t realize it at the time, but collectively the technical discussion had already leaned towards the server side of things more than it should. It is normal that individually each of us turns our face towards what we know and against what we don’t. But it gets dangerous when we all turn towards the same thing, because we become ignorant of our faults and prejudices. In OOP, that is known as closed for modification, and closed for extension too.

The identical leadership profile would also send the wrong kind of signal, that one has to be X and work in Y to get promoted. Everyone with a different profile probably feels unappreciated, like a 40-year-old on Snapchat, and takes their chance elsewhere.

With the birth of the third squad, I got to learn the importance of a design system, the vast untapped advancement of browser technologies, and the bias in BE-FE collaboration. All these are areas of improvement that wouldn’t have surfaced if we had gone down the easy path and promoted yet another BE engineer. It itches me to sound like a social justice warrior, but we did find new powers in diversity.

The support structure

No, the new squad was not released to the wild to fend for itself. That would have been bogus.

In fact, the support structure was the one area that received the most attention back in the squad formation. We defined the 3-prong structure where product, technology, and agenda support each other. The right people were handpicked for the backbone and the remaining vacancies received the highest recruitment priority. A meeting plan was laid out so everyone had multiple outlets to discuss their opinions.

The support structure was the least of my concerns, till something hit me in the face, something technical yet also... sociological.

The new squad got its people and work split from the two existing ones, like a cell division. Hence it found itself co-contributing to a number of code repos with the others. That led to some confusion about where its realm of existence started and ended. The team operated with a constant fear of stepping on someone else’s toes. The organizational structure had changed, and the change had not been reflected in the software interfaces. Wham! It was such a classic case of Conway’s law that I was awed to observe it first hand, yet hurt for not seeing it coming earlier. The law is one of my favorite engineering observations, right up there with Murphy’s.

The rectification that followed was relatively straightforward. We educated people about the boundaries of squads, brought in service contracts to strengthen the interfaces between them, and proceeded to split shared services into smaller ones where it made sense.

A personalized journey

Accompanying the tech leads on their way through the aforementioned obstacles was a rewarding experience, but easy it was not. There are many questions yet few definite answers: how much tech debt is too much, when an internal tool should be built. There is much variation in preference: some are more than happy to deal with abstraction where others are keen on a transparent view. And there is much uncertainty ahead that no plan can account for: how does one account for spending a month waiting for an engineer to onboard, just to have the guy quit a day before his start?

It is probably apparent now that I haven’t done enough of this to actually know what I am doing. But I am experienced enough in software development to deal with uncertainties. And I am invested in getting it to work for my team.

The Agile Manifesto has it that

Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan

They are good guidelines to drive our weekly catch-ups. We keep a very experimental approach to what we are doing, and maintain a close feedback loop. What works is replicated elsewhere; what doesn’t is studied. But most importantly, I always try to be a thought partner throughout this journey. Interactively growing a team where people are collaborative and open about their problems today is more important than having it down to a science with a rigid plan tomorrow.

If one day I have a toddler, I ain’t no need for books.

Onwards.


Sunday, December 13, 2020

2020 - A year unlike any others

This post is dedicated to Parcel Perform’s engineering journey to profit through the pandemic - also known affectionately as the longest Tet ever by the locals. Other teams fought hard too and deserve their own stories. Opinions are my own and do not express those of my employer.

Patient zero

It was a day in March that Covid stopped being a passing disease in some corners of the world like SARS, Zika, or Ebola and started to be a reality to me. Vietnam got its first case in January. Sequoia’s letter to its founders and CEOs was making its way around the Internet. And WHO declared Covid a pandemic. 

I am not going to lie, despite the unease of entering a pandemic period, part of me was looking forward to it, fondly. I was ten when the dot-com boom took place. Vietnam was some remote corner of the world, known only for the war. The Internet was a foreign word. Mom brought home rice, canned food, and fish sauce in a basket to prepare for Y2K. Everything I know about the dot-com bubble, its magic and devastation, and how Internet giants emerged from its rubble was told to me. More than just an economic downturn, it was imprinted on me that not just surviving, but thriving through a hard time is the ultimate test for me as an engineer, and for the system I have built. All were thoughts and stories until that day in March. The old farts could move aside with their bubble. My generation had Covid.

The management acted swiftly and decisively. Changes were introduced to protect individuals and the company's sustainability. The flexible WFH policy had always been there. Teams were split to come to the office on alternate days. Meetings were moved online. Monthly townhalls were broadcast online with details of a plan to break even by the end of the year. With everyone in lockdown, e-commerce activities - our main source of revenue - rose significantly. Vietnam also took aggressive measures to minimize the spread of the virus. The changes in life both inside and outside of work were new and exciting. For a while, Covid had seemed like a challenge we could face with a sense of hope.

It was then that the grim reality set in. As more customers worked online, our platform became the only way for them to stay on top of their logistics situation. The eyes were on us. The demand for service availability, something that previously had not been as good as it should have been, skyrocketed. In parallel, the virus spread at a speed that made the Mongolian hordes look like amateurs. People started to cite stories about the infamous Spanish Flu. Recovery estimates changed from months to years. The virus was not the only thing in the air; fear was too. The investment market contracted, dried up, and imploded on its former optimistic self. No one wanted to bet on the uncertainty of what eventually became the biggest threat to humanity in this century (so far). We had to make changes to keep the remaining runway as long as it could be and plan for an unattractive funding round that we did not even know could happen.

By April, we found ourselves fighting bigger challenges, with a smaller and less effective workforce. Basically a sadistic role-play of the US health care system.

Is less more?

In software development, there is a sense of elitism in doing more with less (people). WhatsApp was sold with 55 employees, serving billions of messages every single day. Instagram with 13 employees, including the two founders. Markus Frind built Plenty of Fish solo for 6 years. Imagine what else these people could have done if they had had Asian parents!

That wasn't exactly our story though. 

The break-even plan was broken down into monthly targets. We hit the first month's target, then fell into a dry spell for effectively the whole of Q2. The B2B sales cycle was notoriously long; we had strong leads that took months to close. The general anxiety about the future did not make anything better. With the plan off track, a hiring freeze was placed on us. Meanwhile, the unprecedented quarantine traffic surge and the sales effort sent more work our way. We typically won customers by going the extra mile to make custom features and integrations. The work was not hard, but irritatingly time-consuming as most normal customers were not fluent in the programming if-this-then-what riddle.

  • The time on custom work kept us from working on the core system.
  • The under-invested core could not handle the surge in traffic, so here and there engineers got pulled out to help fire-fighting.
  • The looming committed deadlines were petri dishes for easy implementations in which architectural simplicity was an afterthought. The code became brittle and less welcoming to extension, making custom work and fire-fighting even more time-consuming.

It was a vicious circle that could very quickly deplete the team morale and leave everyone burnt out.

We sought to find the balance between feature work and the core system. We designated 90% of the development time to be split between making new features, adding customizations, and paying tech-debt, and the remaining 10% to whatever the team thought would be a future issue if left unchecked. That went exactly as well as Donald Trump’s plan for the pandemic: utter chaos stemming from detachment from reality.

The reality was that we had more on our plate than we could chew. We had struggled to keep up with the delivery schedule before; we would not be able to with 10% less time. And though the time invested in tech-debt would help us in the long run, investment took time, time we could not afford. In the deadline frenzy, the 10% budget was a forbidden fruit, and development time was given to whatever made the biggest noise at the time. Sometimes it was the core system because nothing spoke louder than a system outage. But for the rest of the time, it was a customer-first policy. We needed that revenue stream to pull through the hard time. We were the homeless of software development trying to open a savings account.

Less is more does not come solely from the engineering side of things; it only thrives where it aligns with the whole business as a cohesive unit. WhatsApp, Instagram, and Plenty of Fish are consumer mobile apps that demand very little customization compared to the world of B2B that we are in. SAP has thousands of developers - not TechCrunch headline material - yet to deny the need for them is to deny the laws of physics.

There were some interesting leads that we kept hearing about, but otherwise, Q2 ended on an eerily uneventful note. Little did we know, this was the calm before the storm.

Growing pains

Then came Q3 in a way we could not have expected. The traffic that had not slowed down in previous quarters then gained even more momentum. Malls in the US and EU were shutting down and would likely remain so over the holiday season due to concerns about spreading the virus (it indeed happened). People turned their compulsive buying online. Good times if your business is centered around tracking and analyzing e-commerce shipments. There was only one little inconvenience. The data stream flooded through us with all the force of the mighty Mekong before people built all those dams over her.

Having invested in a horizontally scalable application layer, our journey to scalability was a walk in the park. Except that for trees, the park had carnivorous Ents; for pedestrians, Nazgûl; and for birds, fell beasts. The squirrels were cool, they probably just had rabies. The walk through Mordor taught me more about growing pains than all my teenage years combined.

For months we wrestled with performance issues, always shoved into our hands at the most inconvenient moment. The embodiment of Murphy’s law, solidified by postmortems piled higher and deeper, stemmed from a series of inadequacies:
  • Lack of imagination. As much as the growth was welcomed, we did not successfully foresee the full impact of such growth on the system.
  • Lack of experience and expertise. When incidents happened, we did not know what to do, at least not immediately, and what we did, we did not execute fluently. Various types of database locks, Kafka data corruption, and Flink zombie jobs all happened for the first time to all of us.
  • Lack of infrastructure investment. The list of tech-debt was ever-growing, and the development of internal tooling came too little and too late. 
There has always been a tug of war between building a solid system for imaginary traffic vs a house of cards with desirable features waiting to collapse under its own weight. Both compete for finite resources: time and effort. A seasoned entrepreneur would say the latter is preferable as it indicates we have found a problem worth solving and people are willing to pay for the painkiller. But such knowledge offers little comfort when you are staring at the screen at two in the morning, with the weight of the entire system on your chest, feeling the cold sweep through your limbs, subduing every other sense and leaving only numbness, getting angry with yourself and everyone and everything.

We had three system meltdowns in the three months of Q3. Each came in a magnitude that threatened the existence of Parcel Perform and made me question the decisions of my life. And when you do that three times in a row, you start to question your sanity too. But there has always been light at the end of the tunnel, no matter how long. With sheer effort and a lot of hours staring at the screen and self-doubt - because hey, we lacked everything else - we pulled through. I wrote this piece to remind myself the fight is only over when I give up.

Performance issues suck really hard. Going through them is painful. And I wish they were never a part of my life. But the growth which comes with the effort to overcome each and every performance challenge is undeniable. Every time we solve a problem, the system gets stronger and stands higher. We acquire knowledge we could not have fathomed before. Our process is taking steps to mature so we can fight together effectively and preserve the hard-earned knowledge, though we are a long way from the finish. The insurmountable mountains are less intimidating. Much of what we consider valuable in our world arises out of these kinds of challenges because the act of facing overwhelming odds produces greatness and beauty. Such is growth and pain at their worst and best combinations.

We ended Q3, bleeding from the mouth, but confident to start hiring again.

The aftermath

2020 was a bizarre experience. The world was forced to accept a new reality for better or worse. We were forced to mature. Decisions that sat in the back of our minds, ones we knew we would get to somehow, were moved to the front-row seat as the pandemic arrived.
  • We transformed the functional engineering team into cross-functional squads just days before the country entered lockdown.
  • We now have a CS team that collectively covers 24 hours of a day and a stronger presence in the EU while the common situation of the industry is to contract inwards.
  • Our SLA went from best-attempt to actual tangible values. A move that increased the excitement of the sales team tenfold, the exact amount by which it decreased that of my engineering team. A striking example of conservation of happiness.
  • We ended up overshooting the break-even plan 2x.
But on the other hand, there is no denying that wherever we look we see improvement needed. We are trying to keep stable a system that has grown 4x since the beginning of the year. New components like pg_bouncer and setups like splitting the Flink cluster into single jobs were irreplaceably crucial. Yet the biggest pain points have shared the same pattern: the application logic we implemented a year ago can no longer handle the traffic we have today. We haven’t fully escaped the hideous Sisyphean cycle. Patches of various degrees of permanence were introduced; still, usually before we could fully complete the implementation and internalize the knowledge, another thing would happen. The little tech debts we eagerly took on are coming back with compound interest.

And we are doing all this juggling in the aftermath of a 6-month hiring freeze. We had the same set of people working together from the beginning till the end of the pandemic (Vietnam calendar, we have been lucky). We worked intimately with all the services and developed an acute sense of troubleshooting. While that experience was pivotal in maintaining the system, our investment in the onboarding process, documentation, and internal tooling insidiously slackened. A person unfamiliar with the system would find what we have been working on in the last 6 months overwhelming. The arrival of new engineers after the hiring freeze was a wake-up call.

An exploding system and a high overhead of adding new members were a stressful combination. At times, we were probably just steps away from death by a thousand cuts. Despite all that, there is a mutual feeling that the worst Covid has to offer is in the past and things only get better from here on. We have a strong team of young people who every day punch above their weight class. The debts we have are solvable with the right amount of time. We have a product that fits the market and has heaps of space to develop. When you manage to grow so much despite the turbulent time of 2020, few things can frighten you. We are looking at 2021 with much anticipation.