Great startup stories tend to share the same mold: a great founder dreamed of an equally great vision and followed it fearlessly till the world was conquered. History is written by the victors, of course, but it is relatively common for people to have a rough idea of what they want to build before starting the work. That’s the easy part.
But constructing a concrete step-by-step plan to deliver not even the vision but a mere release is hard work. A good plan needs to take advantage of both business and development expertise without letting one overpowers the other. If the business makes all the calls, the development time might be painfully long and the product crashes when traffic starts to peak. If the development decides, we might have a technical wet dream of solving a non-existent problem. That’s where the planning game comes in.
I can’t decide if a game or a dance is a better metaphor for depicting the collaborative nature of planning. Business and development, each possesses knowledge unavailable to the other and is unable to produce the entire plan. The work can only be done by combining the strength of both sides. In economics, a “game” refers to a situation where players take their own actions but the payoff depends on the actions of all players. Game Theory suddenly sounds less conspiratorially adventurous, doesn’t it? Dance on the other hand isn’t used as much in research literature so I ended up siding with the economists. That’s a sidetrack.
In an Agile team, a planning game looks like this:
- The Product Owner decides the scope of the plan. Based on the purposes of the projects, the Product Owner prepares a set of use cases and explains why they are valuable problems to solve and why they should be done first.
- The whole team breaks each use case down into stories. The idea is usually that anything requiring the team to do something other than normal company overhead needs a story.
- The developers “size” the stories. They estimate the time each would take or its complexity. And then group stories that are too small, split ones too big, and decide what to do with stories they can’t estimate.
- The Product Owner prioritizes the stories. Some stories won’t be worth adding, either unimportant or too far in the future.
Our notifications is crucial at informing customers the health of their business. To make the data pipeline behave transactionally, we have several options. Please let me know how should we prioritize them.
- Experiment with Flink’s TwoPhaseCommit, this is new to us so it would take time and be hard to estimate.
- Get Sentry to cover all the projects, this is a passive measure as we passively wait for exceptions.
- Add a check at the end of the pipeline to make sure no duplicated notifications are generated, the check will have to handle its own state.
- Move the final stage of the pipeline to Django, it is a web framework that supports transactional requests by default and we are familiar with it.
Our notifications is crucial at informing customers the health of their business. The data pipeline is long and consists of multiple nodes, each needs to successfully finish its work to produce a notification. To achieve this notion of exactly-once delivery, we need the pipeline to behave transactionally and every exception to surface swiftly. That is done via Flink’s TwoPhaseCommit and Sentry integration. The work will be done at the beginning of the project as it is easier to handle when the code base is still small. TwoPhaseCommit in particular is new to us so we will have a couple of spike stories to understand the technology.
When there is a business choice to be made, don’t ask Product Owners to choose between technical options. Instead, interpret the technology and describe the options in terms of business impact. To continue our notification example, before any notification is sent, there is a need to make sure the data we have is the latest. The conversation can go like this:
We are thinking about adding another Kafka queue to request the latest data. We then need to join the request flow with the future trigger with some sort of sliding window, will also need to thinking about out of bound data. Our other option is to set not one but two future triggers so that one can request data and the other handles notifications. Which would you prefer?
Try this instead:
We have two choices for ensuring a notification always works on the latest data. We can use a deterministic approach or an empirical approach. The deterministic approach would add a new data request flow right before the notification is sent. The notification is processed after the data request flow so we always sure the latest data is used. But because technically data procession and future notifications are asynchronously independent from each other, it would require several more stories for us to join them together. The empirical approach won’t take any extra work. We observe that it usually takes less than 5 minutes for a data request, so we can set two future triggers instead of one, 10 minutes apart from each other. The first one request data, the second notification. But the margin of error is larger because sometime there can be delay in data request. Which would you prefer?