Wednesday, August 26, 2015

How is testing (probably) done at Facebook?

As a social networking platform, Facebook meets and sometimes exceeds the defining characteristics of a platform: it has achieved scale, it morphs its product (adopting a different shape at will, like the Transformers), it extends new features through exposed APIs, and it embraces users, customers, and third-party collaboration.
Since Facebook has, architecturally, reached a certain level of platform maturity, it is interesting to view it as a case study in software testing, because the approaches it has used to test have helped build quality into the platform and have made its engineering teams productive. Note that little information is publicly available on how Facebook does testing, so the material below is largely my inference from reading the relevant sources (listed below). It may or may not be accurate, as it hasn't been officially validated by anyone from Facebook, but do enjoy reading.
The data in the table below is drawn from the following two sources: (shown in black font in the Comments column of the table) and (shown in orange font in the Comments column of the table).

Each entry below gives a Category, Facebook's approach, and Comments (the comments are direct quotations from the cited articles).
Category: Independent Testing department
Facebook's approach: No
Comments: Facebook's approach to org design is different. Even though the company is heavily focused on mobile development, it doesn't have a separate mobile department. As the quoted source puts it:
We don’t have a “mobile department” since we found that hard to scale appropriately. Instead, teams working on features, such as photos or the News Feed, own that feature on every platform we support, from the mobile web, “traditional” desktop browsers, through to mobile platforms such as iOS and Android.
Category: Approach to Software releases
Facebook's approach: Agile, rapid
Comments: There are a number of different models for how to do software releases, no matter whether it’s on the web or an app to mobile, but they all play with three factors: time, features, and quality. Naturally, quality should always be pegged to “as excellent as possible,” so that leaves a choice between choosing to release when a suite of features is ready, or doing time-based releases.
The feature-based releases seem appealing on the surface, but prove problematic to deliver consistently. After all, when was the last time you saw every software project at a company meet its planned ship date with everything working as expected? So releases get held up as some features are finished before others, and sometimes features need to be bumped as priorities change.
All that means that we do time-based releases. Our release cycle has the app ready to ship every four weeks, though it might take longer than that to get into people’s hands because we also need to get into the app stores.
Category: Anatomy of Facebook's testing
Facebook's approach: Layered approach to testing, largely automated
Comments: The improvements we’re making now may be less obvious, but we have automated tests which track things like power consumption, memory and CPU usage, and how we use bandwidth, the goal being to improve (or, at least, hold steady) all of those metrics with each release.

The key to this speed is automation. There’s just no time to do a full manual run through of every feature before a release. A traditional QA department, following scripts to verify that everything worked as it should, would dwarf our development team. Instead, we’ve placed layers of automated tests to ensure that regressions are as infrequent as possible.

In order to enable fast release cycles, feedback loops need to be as tight as possible. There’s no space in this for QA to be kept at arm’s length or until the end of the process (which is madness: “Quality” isn’t something you can add as an afterthought).

Another facet of our testing matrix is site behavior testing. Michael Stockton and other engineers have put a lot of effort into making it possible to asynchronously test the site as users use it. We use WebDriver to run site behavior tests like being able to post a status update or like a post. These tests help us make sure that changes that affect "glue code", which is pretty hard to unit test, don't cause major issues on the site.

Engineers can also use a metrics gathering framework that measures the performance impact of their changes prior to committing their changes to the code base. This framework (which is crazy bad ass btw) allows an engineer to understand what effects their changes have in terms of request latency, memcache time, processor time, render time, etc.

We're still tuning the testing process in order to maximize engineer efficiency and minimize the time spent waiting for tests to run. Overall, the priorities are speed of testing, criticality (yes it's not a word meh meh meh) of what we test, and integrating testing into every place where test results might be affected or might guide decision making.
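The idea of checking a change's performance impact before it is committed, and of holding each metric steady or better, can be sketched as a simple comparison against a baseline. This is only an illustration under stated assumptions, not Facebook's framework: the function `find_regressions`, the 2% tolerance, and the metric names (borrowed loosely from the quote above) are all hypothetical, and every metric is assumed to be "lower is better".

```python
def find_regressions(baseline, candidate, tolerance=0.02):
    """Return metrics that got worse by more than `tolerance` (a fraction).

    Assumes every metric is "lower is better" (latency, CPU time, ...).
    `baseline` and `candidate` map metric name -> measured value.
    """
    regressions = {}
    for name, old_value in baseline.items():
        new_value = candidate.get(name)
        if new_value is None:
            continue  # metric not measured for the candidate; skip it
        if new_value > old_value * (1 + tolerance):
            regressions[name] = (old_value, new_value)
    return regressions


# Hypothetical measurements before and after a change.
baseline = {"request_latency_ms": 120.0, "memcache_time_ms": 15.0, "cpu_time_ms": 40.0}
candidate = {"request_latency_ms": 121.0, "memcache_time_ms": 18.0, "cpu_time_ms": 39.0}

for metric, (old, new) in find_regressions(baseline, candidate).items():
    print(f"{metric}: {old} -> {new}")
```

A tolerance is used here because repeated measurements are noisy; a stricter gate would simply set it to zero.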
Category: Maintenance of Automation
Facebook's approach: Disabling tests that are not needed
Comments: One of the things that Facebook does is to only promote automated tests into their regular test runs once they’ve demonstrated stability. We’re ruthless about disabling flaky tests, and equally ruthless about deleting disabled tests.
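The promote, disable, and delete policy described in this quote can be sketched as a small state machine. This is a minimal illustration rather than Facebook's actual tooling: the `SuiteEntry` class, the `REQUIRED_STABLE_RUNS` threshold, and the rule that a test is "flaky" when it both passes and fails on the same code revision are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical threshold: consecutive passes needed before promotion.
REQUIRED_STABLE_RUNS = 5


@dataclass
class SuiteEntry:
    name: str
    state: str = "candidate"  # candidate -> promoted -> disabled
    consecutive_passes: int = 0
    outcomes_by_revision: dict = field(default_factory=dict)

    def record_result(self, revision: str, passed: bool) -> None:
        """Update the entry after one run of the test on `revision`."""
        if self.state == "disabled":
            return  # disabled tests are no longer run
        outcomes = self.outcomes_by_revision.setdefault(revision, set())
        outcomes.add(passed)
        if len(outcomes) == 2:
            # Passed and failed on identical code: treat as flaky.
            self.state = "disabled"
            return
        if passed:
            self.consecutive_passes += 1
            if (self.state == "candidate"
                    and self.consecutive_passes >= REQUIRED_STABLE_RUNS):
                self.state = "promoted"
        else:
            self.consecutive_passes = 0


def prune_disabled(entries):
    """Delete disabled tests instead of letting them accumulate."""
    return [entry for entry in entries if entry.state != "disabled"]
```

A consistent failure on a new revision is deliberately not treated as flakiness here, since it may be a genuine regression that the test exists to catch.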
Category: Focus on regression-specific automation
Facebook's approach: Big time
Comments: Ultimately, the automated tests are taking more and more of the strain out of development, because regressions are being caught sooner, and therefore being fixed faster, sometimes before the code has been committed.
Category: Test Automation ROI philosophy
Facebook's approach: Cost vs gains
Comments: In the film Fight Club, there’s a scene where one of the characters explains how the auto industry chooses whether or not to recall a vehicle. It’s an equation that is something like, “the cost of recall” needs to be less than the “cost of a payment if something goes wrong” multiplied by “likelihood of a payout being needed.” Automated tests are much like that: the cost of writing and maintaining them (however you measure “cost”) needs to be lower than the cost of not writing them.
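The comparison in this quote can be written down as a small calculation. The sketch below is only one way to read it: the function name, its parameters, and the choice to measure everything in engineer-hours over a fixed number of releases are assumptions for illustration, not anything stated in the source.

```python
def worth_automating(write_cost, maintenance_cost_per_release, releases,
                     failure_probability_per_release, cost_per_escaped_bug):
    """Return True when a test is expected to cost less than going without it.

    All costs share one unit (here, hypothetically, engineer-hours).
    """
    # Cost of writing the test once and maintaining it every release.
    cost_of_test = write_cost + maintenance_cost_per_release * releases
    # Expected cost of the bugs the test would have caught.
    expected_cost_without_test = (
        failure_probability_per_release * cost_per_escaped_bug * releases
    )
    return cost_of_test < expected_cost_without_test


# A test for a feature that breaks fairly often pays for itself ...
print(worth_automating(8, 0.5, 12, 0.10, 40))
# ... while the same test for a feature that rarely breaks does not.
print(worth_automating(8, 0.5, 12, 0.01, 40))
```

With these made-up numbers, the test costs 14 hours over twelve releases; the expected cost of skipping it is 48 hours in the first case and 4.8 hours in the second.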
Category: Approach to Manual testing
Facebook's approach: Mainly dogfooding and crowdsourcing, plus internal employees focused on testing
Comments: In its early days, Facebook relied on manual testing only, and the process slowly evolved from there.

During those four weeks, every day we push a new build of the app to “dogfooders” within the company (a charming phrase, coined by Netscape, which describes the process of “eating your own dogfood” --- you naturally want it to be as tasty as possible).
All Facebook staff are encouraged to try out the “release candidate” builds. That means that by the time our app lands on your phone, you can be sure that it’s been given a thorough test drive.

Outside of setting the expectation that the individual engineers and their teammates are going to test their particular changes, we also put huge emphasis on "dog fooding" changes to the site for up to a week before the general user will see the changes. This means that testing the site falls on the employees using the site overall. We all pride ourselves on finding and filing bugs that we find as we use Facebook for our own purposes. Every FB employee uses the site differently, which leads to surprisingly rich test coverage on its own.

There is also a swath of testing done manually by groups of Facebook employees who follow test protocols. The results (or I should say issues) uncovered by this manual testing are aggregated and delivered to the teams responsible for them as part of a constant feedback/iteration loop.

Category: Culture of testing
Facebook's approach: Initially limited, now built into the engineering process
Comments: We started from a position of not really having a culture of testing, but that’s changing over time as people see the value in the existing tests we have.
Category: Testers and coding skills
Facebook's approach: High coding skills
Comments: One refrain I hear occasionally is that knowing how to program will somehow “damage” a tester, because they understand how the software and machines work. My view is the exact opposite: Knowing how something works gives better insight into potential flaws. Essentially, I think that understanding how to code widens the set of tools available to a tester, without diminishing what they can do in the slightest.
Category: Focus on Quality
Facebook's approach: High
Comments: On the other hand, we deeply respect the people who have chosen to spend their time on Facebook. One of the mantras of our release engineers is that no release should ever leave a person worse off than they were before.
Category: Approach to Defect Prevention
Facebook's approach: Integrated with the development process
Comments: We also have an extremely robust Lint process that runs against all the changes an engineer is making. The lint process flags anti-patterns, known performance killers, bad style, and a lot more. Every change is linted, whether it's CSS, JS, or PHP. This prevents entire classes of bugs by looking for common bug causes like type coercion issues, for instance. It also helps prevent performance issues like box-shadow use in mobile browsers which is a pretty easy way to kill the performance of big pages.
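To make the lint step described above concrete, here is a toy linter that flags the two problems the quote mentions: loose equality in JavaScript (a common source of type coercion bugs) and `box-shadow` in CSS. It is a rough sketch under stated assumptions, not Facebook's lint tooling: the `LintRule` type, the `RULES` list, the `lint` function, and the regular expressions are all hypothetical, and a line-by-line regex check is far cruder than a real linter that parses the code.

```python
import re
from typing import NamedTuple


class LintRule(NamedTuple):
    extension: str       # file type the rule applies to
    pattern: re.Pattern  # what to look for on each line
    message: str         # explanation shown to the engineer


# Hypothetical rules modelled on the two examples in the quote.
RULES = [
    LintRule(".js", re.compile(r"(?<![=!])==(?!=)"),
             "loose equality may coerce types; prefer ==="),
    LintRule(".css", re.compile(r"\bbox-shadow\s*:"),
             "box-shadow can hurt rendering performance on mobile browsers"),
]


def lint(filename, source):
    """Return (line_number, message) pairs for every rule violation."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for rule in RULES:
            if filename.endswith(rule.extension) and rule.pattern.search(line):
                findings.append((line_no, rule.message))
    return findings


changed_js = "if (a == b) {\n  run();\n}\nif (a === b) {}"
for line_no, message in lint("feed.js", changed_js):
    print(f"feed.js:{line_no}: {message}")
```

Running such a check on every change, before it is committed, is what lets this kind of tooling rule out whole classes of bugs rather than catching them one at a time in testing.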

Monday, August 10, 2015

Coming soon... Corporate World Without Managers

Having been in a management role for quite some time, I felt it was important to write about the trends that are affecting the profession. The purpose of this blog, however, is not to judge which situation, with managers or without, is right or wrong. Rather than pass judgment, I would like to paint the picture as it appears in my mind and leave ample space for your comments to chart the future course.
I wrote the text that follows as a proposal for a presentation at the upcoming Grace Hopper conference. I would be happy if it is selected, as I do have a relevant perspective to share. If it isn't, I will honor my passion for the subject and continue to expand on it through this and other blogs. Please read on and do share your comments.

Those of us who have worked in organizations for years would appreciate that when we think of work, we often carry some peculiar assumptions, such as these:
·       Work is a place to which employees need to commute every day to conduct business.
·       There will be a dedicated, fixed seat where we carry out the various aspects of our duties.
·       An employee will be part of a well-defined hierarchy in the organization.
·       An employee will report her work to a role called Manager.
·       The manager will not only oversee the work but also be responsible for the employee's well-being in the organization, taking care of responsibilities like work evaluation, salary hikes, promotions etc.
·       …and many more

These assumptions, and many more like them, are so ingrained in our minds that we rarely question their relevance in today's world.
However, some outliers are challenging these widely held notions. For example, Citrix Inc., armed with its state-of-the-art technologies and a compelling vision, is challenging the notion that "Work is a place". Its solutions instead promote the premise that "Work isn't a place. It's a thing you do." You do work where you find inspiration, and the office is just one of many places where you may find it.

In the sphere of management, there is an interesting idea taking shape these days. I think it is too early to call it a trend, but it holds a great deal of promise and may well catch the attention of the bigwigs of our industry. This idea even has a name: Holacracy. As Wikipedia defines it:
Holacracy is a social technology or system of organizational governance in which authority and decision-making are distributed throughout a holarchy of self-organizing teams rather than being vested in a management hierarchy.
Zappos, the online shoe and clothing retailer, was in the news recently for fully embracing Holacracy and formally doing away with the manager role in its hierarchy, with a strong emphasis on the principles of self-management.

Does it mean that we are staring at a future where managers won't be needed at all?

Before I comment further, I want to share some key events that I have seen over roughly the last five years, which have had a direct or indirect impact on the way management is done and is perceived by practitioners.

Employees First Customers Second Management Philosophy:
The first event I mention here is the evolution of the Employees First, Customers Second (EFCS) management philosophy. It was popularized by HCL CEO Vineet Nayar during the early part of the current decade. His work and the transformation that he brought about at HCL are well recorded in his first book, Employees First, Customers Second: Turning Conventional Management Upside Down. Describing the core of his philosophy, Vineet writes:
We create value in one very specific place: the interface between our HCL employees and our customers. We call this the “value zone.” Every employee who works in the value zone is capable of creating more or less value. The whole intent of Employees First is to do everything we can to enable those employees to create the most possible value.
The greatest value in a knowledge-based organization is created by the employees who work on the things that directly affect customers. It is vital for organizations to be clear about where their core value zone lies. In the EFCS approach, the traditional hierarchy, in which an employee is accountable to her manager, isn't considered effective for today's knowledge-based organizations. In other words, management is as accountable to the people in the value zone as the people in the value zone are to management.

The second event worth noting is the organizational shift towards delayering and the adoption of hourglass structures. Delayering, simply put, is the process of reducing the number of levels in an organization's employee hierarchy. Hourglass organizational structures look, well, like an hourglass rather than the traditional pyramid: the structure is heavy at the top, heavy at the bottom, and very lean in the middle. Organizations like Wipro, which has traditionally had hundreds of thousands of employees, are mulling an hourglass-like structure, which would eventually mean that the traditional managerial function, which mostly "ensured" that work got done rather than "doing" the work, will likely be delayered.

Renewed Performance Management:
The third event, which is also gaining momentum in the first half of the current decade, is the revamp of performance management. Most recently, Accenture abolished its decades-old rankings and once-a-year employee evaluation process and has begun replacing them with a more meaningful, periodic evaluation system. Accenture is not the first organization to do so: companies like Adobe, Microsoft, and Juniper have already replaced their older systems.

Recent Technological Trends- SMAC:
The fourth event is a technological wave, smartly encapsulated in the acronym SMAC. The advent of Social, Mobility, Analytics, and Cloud technologies is redefining jobs and roles as we have traditionally known them. As a simple example, the Android application of the messaging service WhatsApp recently reached 1 billion downloads. As baffling as this number is, it is more baffling to know that the app was built by a team of just four people. The future of the workplace revolves around extremely lean organizations.

Millennial revolution and the Open-source movement:
Though they do not necessarily fall within the last five years or so, a couple more events are indirectly affecting the management profession. One is the rise of millennial employees. By definition, these are the young workforce, typically born between 1980 and 2000. This population, which according to some statistics will make up more than 50% of the workforce in a few years, is bringing about a change in organizations. As a general characteristic, these folks value transparency, freedom, accountability, and responsibility, but hate micro-management and stay away from politics of any kind. They naturally don't appreciate traditional hierarchical structures, which indirectly influences the role a manager should play in organizations dominated by millennials. The second, not-so-recent trend is the open source movement. The grand success of projects such as Wikipedia and Linux, both of which were built by self-managed groups of hundreds of thousands of users, gives real weight to the idea that a traditional hierarchy is no longer mandatory for building world-class products.

Looking back, if EFCS brought the focus back to value-creating employees, the delayering phenomenon ensured that unnecessary management layers were trimmed. If disbanding the age-old performance management systems realigned the role of the manager, the technology wave of SMAC, while enabling leaner organizations, took the focus away from traditional managerial roles. At the same time, a workforce dominated by millennials is slowly but surely changing the rules of management while bringing to the fore the self-management principles of the open source movement.

These trends, and their resulting impact on the management profession, make the need behind Holacracy easier to see. To be clear, Holacracy doesn't mean throwing hierarchy out of the organization and making decisions only by consensus. Holacracy, too, depends on structures, processes, and practices. The typical tasks of management don't necessarily go away, but they become more distributed and more decentralized.

I will stop here and ask: what is your take on this topic? Will the traditional manager role cease to exist in the organization of tomorrow?
