Saturday, April 13, 2013

Software Estimation - Techniques and process

Tl;dr

In software estimation, counting and computing are more reliable than judgement alone. Start your estimate by counting something that is closely related to project size, available early, consistent between projects and statistically meaningful. To keep omissions to a minimum, check the work against some form of Work Breakdown Structure. Best Case and Worst Case make a good start for judgement, but there is a hidden problem in adding them up. The Most Likely Case and the Expected Case formula were introduced to help with that. Estimates can also be made by comparing with similar past projects.
Not everything in a software project can be counted (easily). A family of techniques known as proxy-based estimation helps overcome this challenge. These techniques rely heavily on an organization's ability to process its raw historical data.
Software estimation can also take advantage of the wisdom of the crowd, provided the right environment is created to support it.

Count, Compute, Judge

People tend to mistake estimation for judgement (in other words, guessing). However, researchers have found that judgement alone is the least accurate form of estimation. Estimation experts are likely to do better than newcomers using judgement alone, but that result actually comes from the wide range of historical data, experience and painful stories the experts have accumulated over their careers. When the estimate was made in a field the experts had no experience in, no difference in performance was observed.

From a statistical point of view, counting and computing are more reliable. You should always count related things first, then compute what you can't count, and finalize the estimate with calibration data. Only use judgement as a last resort.

There are many things you can count in a software project. Per the cone of uncertainty, the later in the project, the finer the level of granularity you can count at.
In order to avoid being paralyzed by choices, there are a few rules of thumb we can use to decide what to count.
  • As size is the strongest influence in a software project, count something that closely reflects the project size
  • Something that can be counted early in the project is better than something we need to wait for
  • To benefit from historical data (like pro estimators), we had better count something consistent between projects
  • Count something that would result in a relatively large number (like 20 or more) so that we can take advantage of the law of large numbers. The law states that the errors on the high side and the errors on the low side cancel each other out to some degree.
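The last rule of thumb can be illustrated with a little arithmetic. The sketch below is only an illustration under simplifying assumptions that are not from the post: every counted item is the same size, and the errors are independent and unbiased. Under those assumptions the relative error of the total shrinks with the square root of the number of items. The function name and the 50% per-item error are hypothetical.

```python
import math


def relative_error_of_total(per_item_relative_error: float, item_count: int) -> float:
    """Relative standard error of a total made of equally sized items.

    Assumes independent, unbiased per-item errors (an idealization).
    """
    return per_item_relative_error / math.sqrt(item_count)


# Hypothetical case: each item estimate is off by about 50% on its own.
for item_count in (1, 5, 20, 100):
    error = relative_error_of_total(0.5, item_count)
    print(f"{item_count:>3} items -> total off by roughly {error:.0%}")
```

Note that the cancelling only applies to random errors. If every item is underestimated in the same direction, counting more items does not help.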

Decomposition by Work Breakdown Structure

In the last post, we listed omitted activities as one of the major sources of estimation error. Omitted activities consist of forgotten features and forgotten tasks. Forgotten features can be reduced by thorough requirements engineering and experience. In general, this requires much practice and retrospection to improve. Forgotten tasks, on the other hand, can be reduced dramatically by checking our work against an activity-based work breakdown structure.


Justify Probability Statement

Even though judgement is very subjective, we cannot avoid it. After all the counting and computing, judgement is still needed for the actual numbers. An educated guess is better than a blind guess, and here is how we can make one. We have learned that an estimate is a probability statement, so we should stop treating a single-point number as a reliable estimate. Best Case and Worst Case make a good start: the actual hours are likely to fall somewhere in between, which makes us more comfortable.

But there is a problem with adding up best and worst cases. Let's say that each of the individual Best Case estimates is 25% likely, meaning that you have only a 25% chance of doing as well or better than the estimate. The odds of delivering any individual task according to a Best Case estimate are not great: only 1 in 4 (25%). But the odds of delivering all the tasks are vanishingly small. To deliver both the first task and the second task on time, you have to beat 1 in 4 odds for the first task and 1 in 4 odds for the second task. Statistically, those odds are multiplied together, so the odds of completing both tasks on time are only 1 in 16. To complete all 10 tasks on time you have to multiply the 1/4s 10 times, which gives you odds of only about 1 in 1,000,000, or 0.000095%. The odds of 1 in 4 might not seem so bad at the individual task level, but the combined odds kill software schedules. The statistics of combining a set of Worst Case estimates work similarly. (McConnell, 2006)
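The multiplication in that example is easy to check. Here is a minimal sketch, assuming (as the example does) that the tasks are independent and that each one has the same 25% chance of meeting its Best Case estimate. The function name is made up for illustration.

```python
def odds_of_hitting_all(per_task_probability: float, task_count: int) -> float:
    """Probability that every task meets its estimate, assuming independence."""
    return per_task_probability ** task_count


for task_count in (1, 2, 10):
    probability = odds_of_hitting_all(0.25, task_count)
    print(f"{task_count:>2} task(s): about 1 in {round(1 / probability):,}")
```

For 10 tasks the exact figure is 1 in 1,048,576, which is where the "about 1 in 1,000,000" comes from.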

Because of that, we introduce the Most Likely Case in the hope that the sum will be closer to the actual result. Still, developers' "most likely" estimates tend to be optimistic. The Program Evaluation and Review Technique (PERT) lets us calculate an Expected Case from all three numbers. The formula is derived from statistical studies.
Expected Case = [Best Case + (4 x Most Likely Case) + Worst Case] / 6
Or, if the organization has a history of consistent optimism:
Expected Case = [Best Case + (3 x Most Likely Case) + (2 x Worst Case)] / 6
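Both formulas fit in one small function. This is just a sketch: the function name, the `optimistic_history` flag and the task numbers are made up for illustration, only the two formulas come from the text above.

```python
def expected_case(best: float, most_likely: float, worst: float,
                  optimistic_history: bool = False) -> float:
    """PERT-style Expected Case for a single task."""
    if optimistic_history:
        # Variant for organizations with a history of consistent optimism.
        return (best + 3 * most_likely + 2 * worst) / 6
    return (best + 4 * most_likely + worst) / 6


# Hypothetical tasks as (best, most likely, worst), in hours.
tasks = [(2, 4, 12), (1, 2, 3), (8, 10, 24)]
total = sum(expected_case(*task) for task in tasks)
print(f"Expected total: {total:.1f} hours")
```

Summing Expected Cases sidesteps the problem of adding up Best Cases or Worst Cases, because each term already sits between the extremes.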

Estimating by Analogy

The basic idea is to create new estimates by comparing the new project to a similar project in the past. Again, the old rule applies: count first, then compute, and use judgement last.

  • Break the similar previous project into pieces using requirements and WBS (Count)
  • Compare the size of the new project and the old one piece by piece (Judge)
  • Build up the estimate for the new project's size as a percentage of the old project's size (Compute)
  • Create an effort estimate based on the size of the new project compared to the size of the previous project (Compute)
  • Calibrate the result (Judge)
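The compute steps above can be sketched in a few lines. Everything here is hypothetical: the piece names, the sizes in lines of code, the judged ratios and the old project's effort are invented, and the sketch assumes effort scales linearly with size, which is only reasonable when the two projects are of similar size.

```python
# Count: the old project broken into pieces, sizes in lines of code (hypothetical).
old_sizes = {"database": 5000, "user interface": 14000, "reports": 9000}

# Judge: how big each piece of the new project is relative to the old one.
size_ratios = {"database": 1.4, "user interface": 0.8, "reports": 1.0}


def estimate_by_analogy(old_sizes, size_ratios, old_effort):
    """Return (new size, new/old size ratio, new effort), assuming linear scaling."""
    new_size = sum(old_sizes[piece] * size_ratios[piece] for piece in old_sizes)
    ratio = new_size / sum(old_sizes.values())
    return new_size, ratio, old_effort * ratio


# Compute: the old project took 30 staff-months (hypothetical).
new_size, ratio, new_effort = estimate_by_analogy(old_sizes, size_ratios, 30)
print(f"New size: {new_size:.0f} LOC ({ratio:.0%} of the old project)")
print(f"Effort before calibration: {new_effort:.1f} staff-months")
```

The final calibration step stays a judgement call and is deliberately left out of the code.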

There are areas where analogy doesn't work, such as business rules. Still:
"One contrast between the estimate created using analogy + decomposition, and un-decomposed approach is that in the latter, uncertainty in one area can spread to other areas." (McConnell, 2006)

Proxy-based Estimate

Not all activities in the software development process result in code, and not everything can be counted directly: for instance, how many test cases a feature needs, how many defects to expect, or how many pages of user documentation will be written. A family of estimation techniques known as proxy-based techniques helps overcome these challenges. First, find another metric that is correlated with what we ultimately want to estimate. Once the proxy is found, we estimate or count the number of proxy items and then use a calculation based on historical data to convert the proxy count into the estimate we really want. The basic idea behind this kind of technique is that developers are poor at estimating in absolute terms but quite good at estimating in relative terms. It is hard to tell whether a task takes 4 or 6 hours, but relatively easy to state that one task is twice as hard as another. By making relative comparisons with the past, we tell the future.

Where can we find the proxy? There are three main sources: industry average data, organization historical data and project specific data, in the order of increasing accuracy.
  • Productivity in different organizations within the same industry can vary by a factor of 10. If we use the average productivity for our industry, we won't be accounting for the possibility that our organization might be at the top end of the productivity range or at the bottom. (McConnell, 2006)
  • The majority of projects in an organization are often similar in size and developed under similar organizational influences. Estimates based on the organization's own data are therefore subject to less error.
  • Project-specific data is useful in the same way historical data is. Furthermore, using data from the project itself accounts for the influences that are unique to that specific project. The sooner on a project we can begin basing our estimates on data from the project itself, the sooner our estimates will become truly accurate.
A few popular approaches in this family are Story Points, Fuzzy Logic, T-shirt Sizing and Standard Components. Because we lack processed historical data, we do not use proxy-based estimation very often. Let's take a quick look anyway.

Story Points is very well known among Scrum teams. It measures the relative effort of a story in numeric units. Story Points only become useful after the first few iterations, when the team can count the number of story points it delivered and compute its velocity. You can easily find articles about Story Points on the Internet.
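As a rough sketch of the "count, then compute" part: velocity is taken here as the plain average of the points delivered per iteration, which is one common convention rather than the only one. The function names and all numbers are hypothetical.

```python
import math


def velocity(points_per_iteration):
    """Average story points delivered per completed iteration."""
    return sum(points_per_iteration) / len(points_per_iteration)


def iterations_remaining(backlog_points, points_per_iteration):
    """Iterations needed to finish the backlog at the observed velocity."""
    return math.ceil(backlog_points / velocity(points_per_iteration))


delivered = [18, 22, 20]  # points delivered in the first three iterations
print(f"Velocity: {velocity(delivered):.1f} points per iteration")
print(f"Iterations left for a 130-point backlog: {iterations_remaining(130, delivered)}")
```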

Fuzzy Logic is very similar to Story Points, but instead of numeric measurements, people classify size as Very Small, Small, Medium, Large, and Very Large. The favorite argument in "Fuzzy Logic vs Story Points" debates is that a numeric scale implies that you can perform numeric operations on the numbers: multiplication, addition, subtraction, and so on. But if that relationship isn't valid, that is, if a 12-point story doesn't require 4 times the effort of a 3-point story, then the number 12 isn't any more valid than the Large and Very Large sizes.

T-shirt Sizing is a derivative of Fuzzy Logic that brings business value to the table. Sales and marketing staff will say, "How can I know whether I want that feature if I don't know how much it costs?" And a good estimator will say, "I can't tell you what it will cost until we've done more detailed requirements work." It would appear that the two groups are at an impasse. By representing both business value and development cost, nontechnical stakeholders can make decisions based on net business value. (The numeric values are for illustration purposes.)

Net business value, by Business Value (rows) and Development Cost (columns):

Business Value    Extra Large    Large    Medium    Small
Extra Large             0           4        6        7
Large                  -4           0        2        3
Medium                 -6          -2        0        1
Small                  -7          -3       -1        0
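The matrix above is a lookup table, so ranking features by net business value takes only a few lines. The matrix values are the illustrative ones from the table. The feature names and their sizes are hypothetical.

```python
SIZES = ["Extra Large", "Large", "Medium", "Small"]

# Rows: business value. Columns: development cost, in the order of SIZES.
NET_VALUE = {
    "Extra Large": [0, 4, 6, 7],
    "Large": [-4, 0, 2, 3],
    "Medium": [-6, -2, 0, 1],
    "Small": [-7, -3, -1, 0],
}


def net_business_value(business_value: str, development_cost: str) -> int:
    """Look up the net business value for a pair of T-shirt sizes."""
    return NET_VALUE[business_value][SIZES.index(development_cost)]


# Hypothetical features as (name, business value, development cost).
features = [
    ("Export to PDF", "Medium", "Small"),
    ("Single sign-on", "Large", "Extra Large"),
    ("Saved searches", "Large", "Small"),
]
ranked = sorted(features, key=lambda f: net_business_value(f[1], f[2]), reverse=True)
for name, value, cost in ranked:
    print(f"{name}: {net_business_value(value, cost):+d}")
```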

Standard Components is the most straightforward one. If we have developed a number of programs that are architecturally similar to each other and have a certain amount of historical data, we can estimate the number of standard components in the new program and compute the size of the new program based on past sizes.
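A minimal sketch of that computation follows. The component types, the historical average sizes and the counts for the new program are all invented. In practice the averages would come from the organization's own past projects.

```python
# Historical average size per component type, in lines of code (hypothetical).
average_loc = {
    "dynamic web page": 480,
    "static web page": 60,
    "database table": 1100,
    "report": 300,
}

# Estimated number of each component in the new program (hypothetical).
component_counts = {
    "dynamic web page": 12,
    "static web page": 20,
    "database table": 10,
    "report": 8,
}


def estimate_size(component_counts, average_loc):
    """Size estimate: count of each component times its historical average size."""
    return sum(count * average_loc[kind] for kind, count in component_counts.items())


print(f"Estimated size: {estimate_size(component_counts, average_loc):,} LOC")
```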

When proxy-based estimation is not effective

When using proxy-based estimate, it's important to remember the Law of Large Numbers, that the rolled-up number has a validity that the underlying numbers do not have. If you don't have a big enough number of items to estimate, the statistics of this approach won't work properly, and you should look for another method.

Collect historical data

"The weather today won't always be the same as it was yesterday, but it's more likely to be like yesterday's weather than like anything else." (Beck and Fowler 2001)
The most important reason to use historical data from our own organization is that it improves estimation accuracy. Historical data takes organizational influences into account. Estimating these influences one by one is difficult and error-prone. Historical data adjusts for all of these influences even when identifying the specifics is hard. The data also helps us avoid subjectivity and unfounded optimism. There is an effect known as the Second Project Effect, where a lot of assumptions are made based on what was learned from the last project: "We know the business logic better this time", "There was a lot of turnover last time, we won't have it this time (?!)" or "We will do a better job at requirements management". With historical data, we make the simple assumption that the next project will go about the same as the last project did, because productivity is in fact an organizational attribute that cannot easily be varied from project to project (Putnam and Myers 1992).

Group Review

This is actually not a technique at all, but rather a set of rules of thumb for conducting an estimation review in a group. The goal of doing the review in a group is to obtain the wisdom of the crowd, so before looking into the set of rules, let's talk about this effect.

Wisdom of the crowd is the belief that aggregating information in groups results in decisions that are often better than any single member of the group could have made. Not all crowds (groups) are wise. Consider, for example, mobs or crazed investors in a stock market bubble. These key criteria separate wise crowds from irrational ones:

Criteria                Description
Diversity of opinion    Each person should have private information even if it's just an eccentric interpretation of the known facts.
Independence            People's opinions aren't determined by the opinions of those around them.
Decentralization        People are able to specialize and draw on local knowledge.
Aggregation             Some mechanism exists for turning private judgments into a collective decision.
http://en.wikipedia.org/wiki/The_Wisdom_of_Crowds

When the decision-making environment is not set up to accept the crowd, the benefits of individual judgments and private information are lost, and the crowd can only do as well as its smartest member rather than perform better. These extremes lead to that failure:

Extreme           Description
Homogeneity       Without diversity, a crowd lacks enough variance in approach, thought process and private information.
Centralization    The hierarchical management bureaucracy limits the advantage of the wisdom of low-level engineers.
Division          Information held by one subdivision is not accessible by another.
Imitation         Where choices are visible and made in sequence, an "information cascade" can form in which only the first few decision makers gain.
Emotionality      Emotional factors, such as a feeling of belonging, can lead to peer pressure, herd instinct, and in extreme cases collective hysteria.
http://en.wikipedia.org/wiki/The_Wisdom_of_Crowds

Based on that study, Steve McConnell suggested this set of rules for group reviews.
  • Have each team member estimate pieces of the project individually, and then meet to compare your estimates. Discuss differences in the estimates enough to understand the sources of the differences. Work until you reach consensus on the high and low ends of the estimation ranges.
  • Don't just average your estimates and accept that. You can compute the average, but you need to discuss the differences among individual results. Do not just take the calculated average automatically. Convergence among the estimates tells you that you probably have a good estimate. Spread tells you that there are probably factors you have overlooked and need to understand better.
  • Arrive at a consensus estimate that the whole group accepts. If you reach an impasse, you can't vote. You must discuss differences and obtain agreement from all group members.
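The second rule can be supported by a small helper that reports the spread next to the average, so the average is never looked at on its own. This is only a sketch: the names, the estimates and the 1.5x threshold for "too much spread" are assumptions, not something the rules prescribe.

```python
from statistics import mean


def review_summary(estimates, max_ratio=1.5):
    """Summarize individual estimates for one piece of the project.

    `max_ratio` is a hypothetical threshold: if the highest estimate exceeds
    the lowest by more than this factor, the spread is flagged for discussion.
    """
    low, high = min(estimates.values()), max(estimates.values())
    return {
        "low": low,
        "high": high,
        "average": mean(estimates.values()),
        "needs_discussion": high / low > max_ratio,
    }


# Hypothetical individual estimates in hours, made before the meeting.
print(review_summary({"An": 10, "Binh": 12, "Chi": 11}))
print(review_summary({"An": 10, "Binh": 30, "Chi": 12}))
```

The flag only tells the group where to talk. Reaching the consensus itself still has to happen in the discussion.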



Beck, Kent, and Martin Fowler, 2001. Planning Extreme Programming, Boston, MA: Addison-Wesley.
McConnell, Steve, 2006. "Calibration and Historical Data", in Software Estimation: Demystifying the Black Art, Redmond, WA: Microsoft Press.
McConnell, Steve, 2006. "Estimation by Analogy", in Software Estimation: Demystifying the Black Art, Redmond, WA: Microsoft Press.
McConnell, Steve, 2006. "Individual Expert Judgement", in Software Estimation: Demystifying the Black Art, Redmond, WA: Microsoft Press.
Putnam, Lawrence H., and Ware Myers, 1992. Measures for Excellence: Reliable Software On Time, Within Budget, Englewood Cliffs, NJ: Yourdon Press.

Monday, April 1, 2013

Interview at Vietnam's most successful internet company


There are tons of rumors about the top tech corporations in Vietnam, but what is it actually like inside? I got tired of the rumors, so I decided to take a look at that world myself: I applied to the most successful Internet company in Vietnam.



The recruitment information on the Internet was, in general, very smooth. From the website I was able to learn about open vacancies, benefits and perks, and the hiring process. However, the sheer size of a 2000-person corporation shows in signs of departments stepping on each other's feet. The company favors project teams over feature teams. Each team is then responsible for its own vacancies. This results in numerous job descriptions with the same title, "Senior Software Developer", and different, humanly incomprehensible IDs such as 12-WBM-1369 or 13-WTE-1504. Perhaps for insiders these two codes are as different as e-commerce and social game development, but what I perceived was confusion (and I didn't fully understand that there were different IDs until after the interview).

The application required a CV and a cover letter, which is pretty standard. Mine were written 3 years ago and had never actually been used before (I got my first job via, uhm, a word-of-mouth network). I quickly revised them and submitted the application around midnight. Late in the afternoon of the next day (Thursday), I had my interview scheduled and confirmed for 9 AM the following Monday. My phone happened to be out of battery, and I really appreciated that the HR girl patiently tried to reach me 4 times before I could pick up. Though it wasn't job-hopping season, I was still pleasantly surprised.



The night before the interview, I got unexpectedly excited. Given that I had interviewed close to a hundred applicants by that point, the excitement was hard to understand. In fact, I was so jumpy that I could barely sleep, and that upset my stomach the next morning.

Unable to enjoy my breakfast much, I got to the company 15 minutes early. When I arrived, the motorbike parking lot and the elevator were both crowded. I guess these people don't start their day at 10 like I do. I managed to find a place in the elevator. Standing in a box with strangers, not knowing what to say or what to do with your body, was really awkward. I never enjoy sharing an elevator with strangers, and I would have taken the stairs if the appointment hadn't been on the 13th floor. As the elevator went up and people got in and out, I could see the company offices occupying not one but several floors in this building.

The half of the 13th floor that I was in seemed to be a big meeting area. It was packed with glass-walled rooms named after major rivers of the world. Mekong was the first and Yangtze the last. The company logo and slogan, printed on transparent decals, were on every wall. I went to meet the receptionist and grabbed a chair next to a few coffee tables in the hall, waiting for my interviewers to come. Meanwhile, a big monitor was showing the latest K-Pop hits. Brochures and posters were scattered everywhere around the hall. They looked really professional. Yet it reminded me of Valve's employee handbook. Of all organizational artifacts, an employee manual served as such a compelling form of global PR for the shift from an industrial business model to a knowledge management/humanistic model. Brilliant awesomeness was still hiding.

When I was about to finish the third brochure, one of my interviewers, Ms. Tuyen, showed up and took me to a meeting room. It was a little room at the end of a lobby running across the hall. Walking down the dark lobby, I asked whether I would be interviewed in English. "Vietnamese", she replied. "So why was the email in English?" Many Vietnamese companies, in fact most of them, practice this half-baked communication style: English for writing and Vietnamese for everything else. People end up with some sort of Vietlish that I am allergic to ("Khang oi, can you help me", "Regards em nhe", etc.). Tuyen deflected my question elegantly, but I could tell that Vietnamese was the only language here. I wasn't surprised. In fact it reminded me of Summer, my former employee who couldn't blend into Cogini's English-speaking culture.




There were seats for 6 in the room. My first impression was the noise of the AC unit attached to the outer wall of the room. I don't think the noise was that bad, but when I am worried, every external signal seems to be amplified tenfold. The AC was no white noise generator, and somehow I started to like the waiting hall better. There were instructions for using VoIP for conferences in the room, but I didn't find any devices. The room had a good view over the main street and the flower boxes on the pavement, but I wasn't left alone in the room to explore the view. The interview started right afterward.

Tuyen was an HR staff member, so she needed someone else to test my technical skills. She started the interview by talking about the potential project that I would join if hired. "Not launch yet", "Similar to what is happening out there". I couldn't help but think of an e-commerce system, something this corporation hasn't succeeded at yet, despite its numerous attempts every year.

Before she finished her last sentence, two men entered the room. The older one looked quite casual; actually, his outfit looked a bit slipshod and his hair obviously needed some attention. The other guy had tanned skin and looked quite sporty. He was very quiet during the interview; in fact, he didn't ask me a single question.


I was asked to give a short talk about myself. Ha! Just like Cogini! And yet I managed to deliver a below-average introduction. Knowing the question and listening to countless answers doesn't make your own answer better. The interviewers showed concern when I expressed my interest in getting a master's degree and becoming a lecturer in the next couple of years. For a moment I saw my reflection from the other side of the table.


We moved on through some technical questions, from data structures to databases, web server engines and scaling techniques. The questions had nothing to do with my CV. I didn't mention these skills in my CV; technically, databases were the only part I knew thoroughly and had put into my CV (rule of thumb: only put what you really know into the CV). My technical interviewer was clearly asking about his own concerns, not checking the skills I possessed. I couldn't help but wonder how these people can detect a candidate who doesn't fit the position he is applying for (due to the confusing job descriptions) but is a true gem for another team within the same company.




The questions were randomly selected, I think, because he didn't have any notes with him and just paged through my CV. That gave me the impression that he hadn't read my CV beforehand. Despite the randomness, all questions and explanations focused on one single thing: scalability. For every question, he wanted to know whether I knew the underlying implementation and algorithm. Having its roots in online game distribution, the company has a vast number of loyal users, so focusing on scalability makes perfect sense to me. Though my interviewer had a tendency to go a bit too extreme, I believe he knew what he was talking about. Anyway, not being a big fan of reverse engineering, I must have passed on 3 questions related to things behind the curtain. However, after all the interviews I had conducted, I had developed a thick skin and defended myself against his questions quite well.


As prospective products and vision are important assets here, we weren't allowed to talk much about them. We went on to discuss the software development process and daily activities. My interviewer's point of view was that processes (he didn't state which) are helpful for outsourcing companies, as they indicate what the steps are and what needs to be done in each; for in-house projects, processes offer little value and seem to create too much bureaucratic overhead (?!). He then went on to explain why multitasking is normal. Their work comes from multiple sources: ongoing projects and support for live products. The code base also goes through constant rework because "This is the Internet, thing changes fast. The right user experience is unknown and experiments are needed", he said. As we continued to talk about technical work, it appeared to me that his team's biggest achievement was being capable of implementing their own versions of world-class libraries and frameworks such as SQLite or jQuery. Unable to restrain myself, I asked for his opinion on the open source community and on Q&A platforms like StackOverflow and Quora. The situation sounded just like the movie "300" to me, except that the Spartans were no better than the Persians.

The last couple of minutes of the interview were casual chitchat about the working environment and the job description. As far as I could understand, engineers are only given purely technical work. My search for a position with a balance of management and technical work was blocked by the bureaucracy of the 2000-person organization. There weren't many new facts in this last-minute talk, but there were enough for me to confirm that the most successful Internet company in Vietnam has a firm hold on its human resources. Though my time to work for a corporation hasn't come yet, I wish it to live long and prosper.