Sunday, December 13, 2020

2020 - A year unlike any other

This post is dedicated to Parcel Perform’s engineering journey to profitability through the pandemic - affectionately known by the locals as the longest Tet ever. Other teams fought hard too and deserve their own stories. Opinions are my own and do not express those of my employer.

Patient zero

It was on a day in March that Covid stopped being a passing disease in some corner of the world, like SARS, Zika, or Ebola, and started to be a reality to me. Vietnam had got its first case in January. Sequoia’s letter to its founders and CEOs was making its way around the Internet. And the WHO declared Covid a pandemic.

I am not going to lie: despite the unease of entering a pandemic, part of me was looking forward to it, fondly. I was ten when the dot-com boom took place. Vietnam was some remote corner of the world, known only for the war. The Internet was a foreign word. Mom brought home rice, canned food, and fish sauce in a basket to prepare for Y2K. Everything I know about the dot-com bubble, its magic and devastation, and how Internet giants emerged from its rubble was told to me. More than just the story of an economic downturn, it imprinted on me that not just surviving, but thriving through a hard time is the ultimate test for me as an engineer, and for the system I have built. These were all just thoughts and stories until that day in March. The old farts could move aside with their bubble. My generation had Covid.

The management acted swiftly and decisively. Changes were introduced to protect individuals and the company's sustainability. The flexible WFH policy had always been there. Teams were split to come to the office on alternate days. Meetings were moved online. Monthly townhalls were broadcast online with details of a plan to break even by the end of the year. With everyone in lockdown, e-commerce activity - our main source of revenue - rose significantly. Vietnam also took aggressive measures to minimize the spread of the virus. The changes in life both inside and outside of work were new and exciting. For a while, Covid seemed like a challenge we could face with a sense of hope.

It was then that the grim reality set in. As more customers worked online, our platform became the only way for them to stay on top of their logistics situation. All eyes were on us. The demand for service availability, something that previously had not been as good as it could be, skyrocketed. In parallel, the virus spread at a speed that made the Mongol hordes look like amateurs. People started to cite stories about the infamous Spanish Flu. Recovery estimates changed from months to years. The virus was not the only thing in the air; fear was too. The investment market contracted, dried up, and imploded on its formerly optimistic self. No one wanted to bet on the uncertainty of what eventually became the biggest threat to humanity in this century (so far). We had to make changes to keep the remaining runway as long as it could be and plan for an unattractive funding round that we did not even know could happen.

By April, we found ourselves fighting bigger challenges, with a smaller and less effective workforce. Basically a sadistic role-play of the US health care system.

Is less more?

In software development, there is a sense of elitism about doing more with less (people). WhatsApp was sold with 55 employees, serving billions of messages every single day. Instagram with 13 employees, including the two founders. Markus Frind built Plenty of Fish solo for 6 years. Imagine what else these people could have done if they had had Asian parents!

That wasn't exactly our story though. 

The break-even plan was broken down into monthly targets. We hit the first month's target, then went into a dry spell for effectively the whole of Q2. The B2B sales cycle was notoriously long; we had strong leads that took months to realize. The general anxiety about the future did not make anything better. With the plan off track, a hiring freeze was placed on us. Meanwhile, the unprecedented quarantine traffic surge and the sales effort sent more work our way. We typically won customers by going the extra mile to build custom features and integrations. The work was not hard, but irritatingly time-consuming, as most normal customers were not fluent in the programming if-this-then-what riddle.

  • The time on custom work kept us from working on the core system.
  • The under-invested core could not handle the surge in traffic, so here and there engineers got pulled out to help fire-fighting.
  • The looming committed deadlines were petri dishes for quick implementations in which architectural simplicity was an afterthought. The code became brittle and less welcoming to extension, making custom work and fire-fighting even more time-consuming.

It was a vicious circle that could very quickly deplete team morale and leave everyone burnt out.

We sought a balance between feature work and the core system. We designated 90% of the development time to be split between making new features, adding customizations, and paying tech-debt, and the remaining 10% to whatever the team thought would become a future issue if left unchecked. That went exactly as well as Donald Trump’s plan for the pandemic: utter chaos stemming from detachment from reality.

The reality was that we had more on our plate than we could finish. We had struggled to keep up with the delivery schedule before; we would not be able to with 10% less time. And though the time invested in tech-debt would help us in the long run, investment took time, time we could not afford. In the deadline frenzy, the 10% budget was a forbidden fruit, and development time was given to whatever made the biggest noise at the time. Sometimes it was the core system, because nothing spoke louder than a system outage. But for the rest of the time, it was a customer-first policy. We needed that revenue stream to pull through the hard time. We were the homeless of software development trying to open a savings account.

"Less is more" does not come solely from the engineering side of things; it only thrives where it aligns with the whole business as a cohesive unit. WhatsApp, Instagram, and Plenty of Fish are consumer apps that demand very little customization compared to the world of B2B that we are in. SAP has thousands of developers - not TechCrunch headline material - yet to deny that it needs them is to deny the laws of physics.

There were some interesting leads that we kept hearing about, but otherwise, Q2 ended on an eerily uneventful note. Little did we know, this was the calm before the storm.

Growing pains

Then came Q3 in a way we could not have expected. The traffic that had not slowed down in the previous quarters gained even more momentum. Malls in the US and EU were shutting down and would likely remain so over the holiday season due to concerns about spreading the virus (it indeed happened). People turned their compulsive buying online. Good times if your business is centered around tracking and analyzing e-commerce shipments. There was only one little inconvenience. The data stream flooded through us with all the force of the mighty Mekong before people built all those dams over her.

Since we had invested in a horizontally scalable application layer, our journey to scalability was a walk in the park. Except that for trees, the park had carnivorous Ents; for pedestrians, Nazgûl; and for birds, fell beasts. The squirrels were cool, they probably just had rabies. The walk through Mordor taught me more about growing pains than all my teenage years combined.

For months we wrestled with performance issues, always shoved into our hands at the most inconvenient moment. The embodiment of Murphy’s law, solidified by postmortems piled higher and deeper, stemmed from a series of inadequacies:
  • Lack of imagination. As much as the growth was welcomed, we did not foresee the full impact of such growth on the system.
  • Lack of experience and expertise. When incidents happened, we did not know what to do, at least not immediately, and we did not execute fluently. Various types of database locks, Kafka data corruption, and Flink zombie jobs all happened for the first time to all of us.
  • Lack of infrastructure investment. The list of tech-debt was ever-growing, and the development of internal tooling came too little and too late.

There has always been a tug of war between building a solid system for imaginary traffic and a house of cards with desirable features waiting to collapse under its own weight. Both compete for the same finite resources: time and effort. A seasoned entrepreneur would say the latter is preferable, as it indicates we have found a problem worth solving and people are willing to pay for the painkiller. But such knowledge offered little comfort when you were staring at the screen at two in the morning, with the weight of the entire system on your chest, feeling the cold sweep through your limbs, subduing every other sense and leaving only numbness, getting angry with yourself and everyone and everything.

We had three system meltdowns in the three months of Q3. Each came at a magnitude that threatened the existence of Parcel Perform and made me question the decisions of my life. And when you do that three times in a row, you start to question your sanity too. But there has always been light at the end of the tunnel, no matter how long. With sheer effort and a lot of hours of staring at the screen and self-doubt - because hey, we lacked everything else - we pulled through. I wrote this piece to remind myself the fight is only over when I give up.

Performance issues suck really hard. Going through them is painful. And I wish they were never a part of my life. But the growth that comes with the effort to overcome each and every performance challenge is undeniable. Every time we solve a problem, the system gets stronger and stands taller. We acquire knowledge we could not have fathomed before. Our process is taking steps to mature so we can fight together effectively and preserve the hard-earned knowledge, though we are a long way from the finish. The insurmountable mountains are less intimidating. Much of what we consider valuable in our world arises out of these kinds of challenges because the act of facing overwhelming odds produces greatness and beauty. Such is growth and pain at their worst and best combinations.

We ended Q3, bleeding from the mouth, but confident to start hiring again.

The aftermath

2020 was a bizarre experience. The world was forced to accept a new reality for better or worse. We were forced to mature. Decisions that had sat in the back of our minds, ones we knew we would get to somehow, were put in the front row as the pandemic arrived.
  • We transformed the functional engineering team into cross-functional squads just days before the country entered lockdown.
  • We now have a CS team that collectively covers 24 hours a day and a stronger presence in the EU, while the industry at large is contracting.
  • Our SLA went from best-attempt to actual tangible values. A move that increased the excitement of the sales team tenfold, the exact amount by which it decreased that of my engineering team. A striking example of conservation of happiness.
  • We ended up overshooting the break-even plan by 2x.

But on the other hand, there is no denying that wherever we look we see improvement needed. We are trying to keep stable a system that has grown 4x since the beginning of the year. New components like PgBouncer and setups like splitting the Flink cluster into single jobs were crucial. Yet the biggest pain points have shared the same pattern: the application logic we implemented a year ago can no longer handle the traffic we have today. We haven’t fully escaped the hideous Sisyphean cycle. Patches of various degrees of permanence were introduced; still, usually before we could fully complete the implementation and internalize the knowledge, another thing would happen. The little tech debts we eagerly took on are coming back with compound interest.

And we are doing all this juggling in the aftermath of a 6-month hiring freeze. We have had the same set of people working together from the beginning till the end of the pandemic (Vietnam calendar, we have been lucky). We worked intimately with all the services and developed an acute sense of troubleshooting. While that experience was pivotal in maintaining the system, our investment in the onboarding process, documentation, and internal tooling insidiously slackened. A person unfamiliar with the system would find what we have been working on in the last 6 months overwhelming. The arrival of new engineers after the hiring freeze was a wake-up call.

An exploding system and the high overhead of adding new members were a stressful combination. At times, we were probably just steps away from death by a thousand cuts. Despite all that, there is a mutual feeling that the worst Covid has to offer is in the past and things only get better from here on. We have a strong team of young people who punch above their weight every day. The debts we have are solvable with the right amount of time. We have a product that fits the market and has heaps of space to develop. When you manage to grow so much despite the turbulent time of 2020, fewer things can frighten you. We are looking at 2021 with much anticipation.



Saturday, August 15, 2020

Don't give up

Nearly two in the morning. Outside the office, a couple is still deep in conversation. Even with the light spilling out of the 7-11, my eyes are so heavy that I can't make out their faces. A few days ago, several big customers started using the service, and the volume of data shot up. The system was like a one-story shack in the path of a force 8 storm, leaking in countless places. Hopefully this is the last night. The team has solved many of the problems. Sleep will return after days of anxiously watching over a newborn.

I have been working at a startup for a few years. Working here feels like driving a Soviet-era train on tracks that do not exist yet. The train lays its track as it runs, with whatever can be found nearby. Sometimes the building is fast, sometimes slow. The track goes up and it goes down. But the most important thing is that the train must keep running.

Short on people, late to start, and low on money, yet still doing better than the competitors: there is no simpler and more powerful way to declare "I am good". On those days, you walk on clouds and the world is yours alone. And then there are days like today, body tensed, head bowed, eyes unable to look past the desk. Work is a long chain of stupid mistakes.

When I was little, my parents used to say that once I grew up I would be able to do this and that. Ride a bike at the start of first grade. Run the March 26 camp in high school. Be independent at university. As if there were magic switches inside; reach the right age and the switch flips, and you understand giant systems, see through the ways of the world, and attain nirvana. In exactly that order.

The thing is, after two failed startups, no switch has flipped yet. Only the work has gotten harder. Sometimes I am scared, like a swimmer far from shore afraid of drowning, wanting only to turn back so that all this pressure disappears. No more calls in the middle of the night. No more long nights alone in front of the screen, hearing my heartbeat climb under my skin. No more tearing my hair out, helpless before the questions of why. But a startup comes with many constraints. There is no safety net. If I let go now, if I quit because it is too hard, there is no one behind me to do it either.

From the first line of code, when we struggled to handle 20k requests on a tiny virtual machine, to today, with a few terabytes going in and out every day, the system and the people around it have gone through puberty countless times, have even died and come back to life, and every time it was by not giving up that we found a way out.

There is no playbook that is always right for the problems of a complex system. What matters is being patient and not being too hard on yourself. Being able to see the chain of stupid mistakes is already the first step. Solve one problem, and the train runs for a day. Solve another, and it runs one more day. Then the third problem, the fourth, the fifth. Reach the end of the chain of stupidity, and you have reached the destination. Either that, or you fail and get a juicy blog post on the road to fighting ignorance. If you stay safe and happy with cute little projects, how will you ever withstand big waves and strong winds?

Perhaps that is the final switch, and it was flipped a long time ago.


Sleep is for the weak.

I am weak.


P.S.: After I drafted the idea for this blog post, my system lost Kafka - for the first time in almost 5 years. It took another four tense hours to resolve the problem. Proof that an IT system only exists between outages, and that there is no last outage.

Saturday, April 25, 2020

Building a postmortem culture

post·mor·tem

\ ˌpōs(t)-ˈmȯr-təm \ 
noun
1. Autopsy
    A postmortem showed that the man had been poisoned.
2. An analysis or discussion of an event after it is over
    The blameful postmortem culture shuts down the exploration of the problem because no one wants to be seen as stupid, even if it means ignoring the clear truth.

A postmortem is a written record of an incident, its impact, the actions taken to mitigate or resolve it, the root cause(s), and the follow-up actions to prevent the incident from recurring. The postmortem concept is well known in the technology industry.
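
To make the definition concrete, the parts of a postmortem listed above could be captured in a small record like the one below. This is only a minimal sketch in Python: the class name `Postmortem`, its field names, the `missing_sections` helper, and the example incident are all hypothetical and merely mirror the definition, not a template we actually used.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Postmortem:
    """Hypothetical record mirroring the parts of a postmortem named above."""
    incident: str                                            # what happened
    impact: str                                              # who or what was affected
    actions_taken: List[str] = field(default_factory=list)   # mitigation or resolution steps
    root_causes: List[str] = field(default_factory=list)     # one or more root causes
    follow_ups: List[str] = field(default_factory=list)      # actions to prevent recurrence

    def missing_sections(self) -> List[str]:
        """Return the names of the sections that are still empty."""
        return [name for name, value in vars(self).items() if not value]


# A made-up draft, written while the investigation is still ongoing.
draft = Postmortem(
    incident="Service returned errors for 20 minutes",
    impact="Customers could not use the service during that window",
    actions_taken=["Rolled back the latest deployment"],
)
print(draft.missing_sections())
```

A helper like `missing_sections` is one possible way to notice that a draft stops at the mitigation and never reaches the root causes or the follow-up actions.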

I picked up the concept of postmortem from my previous job at Silicon Straits Saigon. The idea that we could study an incident was there, but the guidelines and culture enforcement were weak. So though I was sold on the idea that postmortem was a powerful practice which, with proper enforcement, made a system more robust over time, I didn't exactly know how to start a culture around it. The most concrete guidelines I received were from Site Reliability Engineering. Wherever the Google practices seemed too extreme or impractical in my context, there was the Internet. The knowledge was powerful and enlightening, and I appreciated the journey of the last 6 months to transform it into operational HOWTOs.

From the very beginning, I was aware that a postmortem culture needed to be a joint effort of the entire organization for it to be effective. And I was never interested in being a secretary. But like many other initiatives that involve other people, you can't just make an announcement and expect things to happen, magically. I tried. A few times. So in the beginning it was just me recording the incidents in which I was part of either the solution or the problem. Most of the time both. And that gave me the time and experience I needed to calibrate the plan before it was presented to everyone.

Work to take blame out of the process

Blame, both the act of blaming and the fear of being blamed, is the enemy of a productive postmortem culture. If a culture of finger-pointing and shaming individuals or teams for doing the "wrong" thing prevails, people will not bring issues to light for fear of punishment, or will stop investigations prematurely as soon as a "culprit" is identified. This halts the development of preventive measures for the same situation in the future. The force to blame is formidable; we as human beings are wired for it. Dr. Brené Brown, in a TED talk, explained that blame exists as "a way to discharge pain and discomfort". It certainly doesn't help that whenever you want to trace back whose code caused your miserable wake-up at two in the morning for a system outage, the command is called `git blame`.

This is where being blameless gets popular in postmortem literature. And if there is anything subjective to an objective piece of work that is a postmortem, this is it. I find being completely blameless hard to implement. On the one hand, a postmortem is simply not a place to vent frustration. On the other hand, at times, it feels like tip-toeing around people, so worried about triggering their fragile souls that you miss out on chances to call out where and how services can be improved. This is where I come to an agreement with J. Paul Reed that it is important to acknowledge the human tendency to blame, allow a productive form of its expression, and yet constantly refocus to go beyond it.

Here are some examples. The examples might or might not involve me, in scenarios that might or might not be real.

Blameful:
Someone pushed bad code to production via the emergency pipeline. The tests in CICD could have caught this, but someone thought he knew better. Seriously, if you aren't sure what you are doing, you shouldn't act so recklessly. Rolling back in the middle of the night is a waste of time.
Action items:
  • Think before you edit someone's code.

Completely blameless:
Last night, unauthorized code was pushed to production. CICD was skipped because CICD takes 30min and it was a fire-fighting situation. The fix was not compatible with a recent refactor.
Action items:
  • Improve CICD speed

Blame-aware:
Last night, unauthorized code was pushed to production. CICD was skipped because CICD takes 30min and it was a fire-fighting situation. The fix was not compatible with a recent refactor.
Action items:
  • Improve CICD speed
  • Infrequent contributors should use the safety net of CICD
  • Issues in a service need escalating to the maintainer of the repo for code review
  • Rollback mechanism needs to be available to developers on pager duty.

I feel that in the examples above, without accepting that pushing code to an unfamiliar service in the middle of the night was a reckless action, we would miss the chance to put in preventive measures. But again it is subjective; perhaps my blame-aware version fits perfectly into someone else's definition of blameless. I hope you get the point.

Work on some guidelines

It is useful to be as specific as possible about when a postmortem is expected, who should write it, what should be written, and what the goals of the record are. Not only does it provide a level of consistency across your organization, but it also prevents the task of writing a postmortem from being seen as a whimsical assignment from some higher-level authority, with someone being picked on as a punishment for doing the "wrong" thing.

Some of my personal notes on the matter:
  • Different teams might have different sets of postmortem triggers. The more critical your function is, the more detailed the triggers should be.
  • People who caused the incident might or might not be the ones to write the postmortem. The choice should be based on the level of contribution the person has to offer, both in terms of context and knowledge, not on their previous actions.
  • Be patient. The people you are working with are professionals in software development, but the ability to write good software does not translate into the ability to write a good document. Quality of root cause analysis prevails over eloquence. Save the latter for your blog.
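
The first note, about per-team triggers, can be sketched as follows. Everything in this snippet is hypothetical: the team names, the trigger fields, and the thresholds are made up for illustration, and real triggers would come from each team's own agreement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triggers:
    """Hypothetical postmortem triggers for one team."""
    max_downtime_minutes: int         # downtime longer than this needs a postmortem
    on_data_loss: bool = True         # any data loss needs a postmortem
    on_manual_rollback: bool = False  # a manual rollback needs a postmortem


# Made-up values: the more critical team gets the stricter, more detailed triggers.
TRIGGERS = {
    "core-platform": Triggers(max_downtime_minutes=5, on_manual_rollback=True),
    "internal-tools": Triggers(max_downtime_minutes=60),
}


def needs_postmortem(team: str, downtime_minutes: int,
                     data_loss: bool = False,
                     manual_rollback: bool = False) -> bool:
    """Return True if the incident crosses any of the team's triggers."""
    t = TRIGGERS[team]
    return (
        downtime_minutes > t.max_downtime_minutes
        or (data_loss and t.on_data_loss)
        or (manual_rollback and t.on_manual_rollback)
    )
```

With these made-up values, ten minutes of downtime would call for a postmortem from the hypothetical core-platform team but not from internal-tools. Writing the triggers down, in whatever form, is what keeps the request for a postmortem from feeling arbitrary.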

Work on the impact to your audience

In the beginning, the incidents I was working on were about a database migration to Aurora (and a hasty fallback), so I assumed my audience would be my fellow developers. Possibly extending to project managers; project managers love knowing why you are stealing time from their team. And a reasonable consequence was to write the postmortems in markdown and store them in the same code repo as the affected services. There were a few issues with that.

Firstly, in a technology startup, the scope of a tech choice is always bigger than "just the tech team". In my case, Customer Success people need to know what the impacts on our customers were and are, Product people want to know if the choice comes with new possibilities, and Sales people want to sell those possibilities. As much as I love such integration between the developers and the rest of the company, the idea of granting universal access to code repos to view markdown postmortems terrifies my boss, and therefore subsequently me, obviously.

Secondly, as familiar as markdown is, it is not a very productive option if you want to include media in it. And we want charts of various system metrics during the time of the incident to be included in the postmortem.

Lastly, writing a postmortem is gradually becoming a collaborative effort, and a git repo, though it supports collaboration, does not do it in real time.

Considering all the options, we finally settled on a shared Google Drive in the company account. It is neither techie nor fancy. But it allows very flexible accessibility, tracks versions, natively supports embedded media, and lets multiple people collaborate in real time. We share our postmortems in a company-wide channel, and sometimes hold an additional presentation for particularly interesting ones.

Let it grow

When you have done your homework, built a foundation of trust and safety, laid out the guidelines and constantly improved them, and integrated the postmortems with your larger audience, it is probably time to take a step back and let the culture take a spin on its own. My company's postmortem culture won't be the same as Google's no matter how many Google books I read. And as long as it works for us, it doesn't matter.

With some gentle nudges, my colleagues are picking up postmortem on their own. We have seen contributions from Product Owners and Project Managers, besides the traditional developers and DevOps contributors. The findings are anticipated by a large audience across the company. 

And in the latest incident, which involved degraded performance in a few key features of our SaaS offering over the course of a week, we identified another use of a postmortem. A postmortem updated regularly with the latest incident reports, findings, and potential impacts in a near real-time fashion is a powerful communication tool across the organization. It both ensures the flow of information to the people who need it (CS to answer questions, PM to change project plans for urgent hotfixes, etc.) and allows developers to focus on their critical work without frequent interruptions.

As we grow and our system gets more sophisticated, hopefully, the constructive postmortem culture will turn out to be a solid building block.

Thursday, January 16, 2020

Run, Forrest, Run!

Tl;dr: I ran my first marathon, and whined about it. Move on.


4 years after finishing my first half marathon, I finally did my first full marathon, 42k of sweat and pain. 2019 was horrible for me, and through all the ups and downs, the marathon plan was one of the few things that kept me together. The cut-off time was 7 hours. I wanted to do a sub-5 (complete the run in under 5 hours) but ended up with a sub-6. I was squarely in the bottom quarter of my age group. So it wasn't all glory and stuff, but I am so glad I did it.

I must have started the training back in March or something, and didn't follow the training plan through and through, obviously. I got sick, which paused the plan by a week every time it happened. I got injuries that eventually put me out of action for a whole month. And when I was back, following the original training plan just gave me too much stress and guilt, which I certainly didn't need - my life was really low, so I forwent it and just ran whatever the fuck I wanted. That was probably 2 months ago.

The injuries were actually a blessing in disguise. They forced me to rethink my running form. I picked up a book on running (that is not Murakami's autobiography) and tried to avoid "common sense" misconceptions, most notably landing on your whole foot. I finished 42k without any injuries. Yay!

The day I got the bib, it came with a shock. I was put into the 30-39 age group. Technically, it is not my birthday yet. And despite all the talk, I was not mentally prepared for this. Ouch! Oh and I also got interviewed.

I had never run the full distance prior to the run and, in retrospect, that wasn't a great idea. I now believe that the body will be prepared for a few extra km on top of the maximum distance you have covered, but not by a long shot. And it makes sense: why would my body be ready for 100k if I have never run 50k? The longest I had done was 30k, and that explained why from 34k I got such bad cramps.

It was also the first run for which I got proper sleep the night before. And I wasn't hungry. I sure stuffed myself with loads of carbs, so full that on the night before the run, I thought it was stupid, I couldn't possibly run with such a stomach. But above all, shout out to the organizers, the route had more than sufficient water, electrolytes, and bananas.

One last thing, the Nike app has improved a lot between then and now. It is no longer off by 30% and comes with cooler features. Well done Nike.

---

If that hasn't bored you out of your skull yet, you might want to see how my run broke down. "How did you remember all of this?" - I knew I was gonna write one of these post-event, so it wasn't that hard. And I made up all the bits I didn't remember, including that I ran at all. Bwahaha.



Starting line: That's right, 42k is the first wave, the first-class citizen of a marathon. With all the volunteers standing around and looking, the limelight feels good. Wait, hang on. It's already 4. Why aren't we starting? Technical issue? Great, I am trying to get some work-life balance and here I am, with bugs.

0km: 10 minutes in, here we go guys!!! Let me just start my run on the Nike mobile app. Fuck fuck fuck. I dropped an energy bar while shoving the phone back into the running belt. Screw it, I am not fighting against a wave of runners for a stupid bar. What a start.

1km: An old man with a Vietnam flag on his back is making a crude joke that a bunch of fit men, leaving their horny wives and young children home, to run on the street at 4 in the morning must all have mental issues. It could have been a good joke, it could have. But why did you have to be so fucking disgusting in your choice of words old man? Urgg why are you even carrying our flag?

2km: Some are already making pit stops at the trees by the side of the road. Shit, looking at them gives me the urge too. Nah. If I sweat enough, the excess liquid will just be repurposed in time. Probably. The 42k 4:45 pacers are here, but they seem slow (1) and have loud music on. Better keep some distance.

3km: Here it is, the first major water station. Thanks to the Starting Line Incident, I am down to 4 bars now. I should have more bananas. Double portion, please! There are Waldo, Doraemon, and Ao Dai right in front of me. Cute, but I am not falling behind casual cosplayers. Onwards!

5km: We are joined by a group of 21k runners, they seem to have a shorter route. I no longer hear the music of the 4:45 pacers. I also don't want to have my pace messed up by 21k runners. Time to speed up a bit.

7km: Just gulped down the first energy bar. Entering the beast - Phu My bridge. Still have a vivid memory of how it wore me out in my first 21k. Some 21k runners keep passing me. Well, at least they aren't 42k.

9km: The easier quarter of the bridge was easy. Neat, there is a water station before the hardest quarter. Go in for a shower. Feel so good. Kimochi!!!

10km: Wow that's the highest point of the ascending half already? That was quick. I'm feeling great. The training works!

12km: "Coming through!" I didn't yell but it was certainly loud as I ran past a few runners. I'm sprinting! Not supposed to put stress on my feet no? But I am on a runner's high. Gotta take advantage of this slope then.

14km: Keeping up a good speed. Ketchup guy wait for me! Well, he is a 42k runner in costume which, for the lack of visual detail, only makes me think of a bottle of ketchup. I might be running too fast. There is no downhill gravity to play with. Slowing down.

15km: Crossroad. Am I turning or keeping straight? Oh, there is a volunteer, neat. I asked you twice for the direction and the best you can muster is "Huh?". You, sir, are truly an idiot.

16km: "That fires we don't put out, will bigger burn". And that's exactly why I am standing here right next to a tree, minding my own business. Here comes the same water station from the 3rd km. Banana! I am joined by a bunch of 21ks. This group is with the 2:20 pacers. Guess I'm not doing too badly myself (2). But they are loud. I am putting in some distance.

18km: This is proceeding nicely. I'm bored. Time for some music. After all, what is the point of having the pinnacle of technology in my belt? And lost a bloody energy bar.

20km: I am rejoined by the 2:20 pacers. This time the topic is the color of the underwear the pacers are wearing. I should now add that the pacers in this group are women. "You're wearing nothing!" Someone screams at the top of his lungs. Looks like he is having a really good time. No, he isn't carrying a Vietnam flag. I looked. I'd love to add some distance again, but I am getting slower.

21km: Canada International School eh? Funny. I'll be here again later this afternoon to watch a Saigon Heat game. This is a massive waste of energy. Doraemon is behind me. I'm not running behind a cosplayer, not a blue fat cat with comically short legs (and balls for hands). Just a bit faster. Entering the differentiator turn, this is the part of the route that 21ks don't join. This stretch of the route seems to last forever (3).

25km: The sun is already high. I can't possibly head to a tree this time, can I? (4) Bracing myself for a stinky toilet. Wow, it's actually clean. This is awesome. The toilet, not me pissing.

27km: Doraemon is behind me again, but I can't possibly run any faster than this. I tried. Ran ahead of him for tens of meters and I would fall back to a normal pace and he would pass. Not just Doraemon though. I am losing count of how many have passed me.

28km: Good morning milady, can you help me with some of that muscle spray please, on both legs? Wow, that was refreshing! Thank you very much.

29km: God damn, some of that spray got onto my crotch. My balls are freezing. I sure hope they don't fall off.

30km: Got tension on the thighs. I got this. I got this. I trained for this. The app announces I have 12km left. Took me a while to calculate that I have run for 30km. Math is super hard.

32km: I have never run this far in one go. From here on, it's uncharted territory. Squats, I need to do a few squats, it stretches my thighs a bit so they are functional again.

33km: The tensions have turned into cramps. Squat. Run for a couple of hundred meters. Cramp. Squat. Rinse and repeat. It hurts so much.

34km: Arrggg I fought, but I can't run anymore. My thighs got cramps. My ankles hurt from all this stomping. And the soles of my feet too, for pretty much the same reason. Worst of all, my brain seems to go blank, this is stupid, what am I even doing. I have to walk now.

35km: I have run a few short dashes, a couple of hundred meters each. One of the attempts locked my legs, almost landed my face on the asphalt. The cramps are still going strong. Someone just handed me a big chunk of ice. It freezes my hands. Dude, what am I supposed to do with this? My balls are gone and that's bad enough. Here, tree, your daily ration in a solid form.

36km: Here is the plan, I'm gonna run from one crossroad to the next, then walk till the next crossroad. I still get cramps whenever the 2 crossroads are a bit farther apart, but at least my mind has come back. The sunlight is roasting me. I miss you, sunshine.

41km: Fuck no! My legs gave up on me. Completely. I get cramps just from walking. No amount of squatting seems to help. I can hear the crowd from here, so fucking close.

42km: My legs are at the stage where any excessive movement would give me cramps. The last 200 meters, the finish line is finally here. Here goes nothing. The legs don't seem to be mine, I move them like two sticks. I run. I cross the line. I get a high five. A girl puts this medal around my neck. Heck, I can't even recall what she looks like. But she was wearing a Bà Ba, that was a nice touch. Under normal conditions, I would have appreciated the outfit, but right now I am having a strong urge to vomit my guts out.

---
(1) This is probably the first sign that I didn't manage my energy level well. Too cocky. But again, I aimed for sub-5, so...
(2) I conveniently forgot the fact that the 42k started 30 min in advance. But we also had a longer route from the beginning. All else being equal, I was running the first half in 2:10-2:20.
(3) It didn't. It lasted for 3.5km. Running on a familiar route made me feel like it was shorter.
(4) I talked to a friend about this. The pro-tip is to just pee on yourself. In a race, you probably consume enough water that your pee is transparent anyway. My shorts were white, so it didn't help much with the level of confidence. Best to do this at a water station where they usually put a big bucket of water for a quick shower.