Mike Bland

The Chris/Jay Continuous Build

The homegrown continuous integration and test system that powered the Testing Grouplet's Test Certified program at Google before TAP

- New York
Tags: Build Tools, CJ, Eng Prod, Fixit Grouplet, Fixits, Google, TAP, Test Certified, Test Engineering, Testing Grouplet, Testing Tech, TotT, grouplets, technical

The Chris/Jay Continuous Build was a continuous integration system developed within Google by Chris Lopez and Jay Corbett sometime before August 2005 (when I started at Google). It originated as a single bash script and was modified heavily over the years by developers and testers across Google; eventually pieces were broken out into separate scripts, and Matthew Springer reimplemented it in Python as Pybuild. It ran on a single machine, with its configuration and output stored publicly on an NFS filer accessible throughout the company. It was the successor to the Unit Test Framework, a fundamental component of the Testing Grouplet’s Test Certified program, and the predecessor to the One-Click Build and the Test Automation Platform (TAP).

Again, I’m surprised by how much I have to say about a given Google topic once I sit down and dig into it. There’s a lot of overlap with past and future posts, but none of this stuff happened in a vacuum, so it’s somewhat unavoidable to pull in bits of context from all over the place.

Googlers: As usual, feel free to fact-check me, and I’ll make changes as necessary.

Unit Test Framework

Almost all the source code at Google resides in a single repository. Consequently, in the early 2000s, the Testing Technology team1 developed the Unit Test Framework (UTF), a centralized service built on a small, dedicated cluster of machines that would run every test in the repository and publish the results.

Even given the relatively small number of engineers, the amount of code, and certainly the number of tests at the time, the UTF started struggling under load. Changes could not be tested in isolation: there was no way to select only the tests relevant to a particular change or project, so the UTF had to run all tests for all projects in batches, at ever-increasing intervals. Eventually results could only be provided on the order of once a day. Better than no results at all, ever, but the perceived value of the UTF, and of the tests it ran, diminished sharply. Both were seen as overhead with a high coefficient of friction, and in direct opposition to the cultural priority of writing a lot of code and shipping.2 The UTF became largely ignored, and many of the tests it ran ended up perpetually broken.

Despite this widespread perception, there were teams that very much depended on results from the UTF, and it was crucial to maintaining code coverage metrics throughout the company (as explained below). Keeping the UTF running was a difficult and thankless job, handled for years by Antoine Picard, Boyd Montgomery, Kimmy Lin, Matthew Springer, and others within Testing Technology, with Doug Landauer helping with end-of-life duties.

Chris and Jay

There was one part of the company that could not afford to ignore its tests, no matter how painful the friction: Ads. Code that touches money has to be as correct as humanly possible, as one may imagine. The tests may be slow, brittle, and hard to maintain, but that pain is nothing compared to the violation of trust and loss of reputation—i.e. loss of revenue—at stake.

Google has a long-standing tradition of letting engineers solve their own problems, even if it isn’t part of their normal project or job description.3 Chris Lopez and Jay Corbett were engineers on Ads who were firm believers in the benefits of automated testing, but found the UTF’s turnaround time unacceptably slow. They conceived of a small continuous integration system that could run on a single desktop machine, or as separate instances on a set of machines. It would run only the tests that a particular project cared about and provide results on the order of minutes or hours; this would also limit the number of changes that would require investigation in the case of a build breakage or test failure.

The system would be uncomplicated enough that each team could maintain its own build or set of builds; but the system itself, along with its configuration and output, would be stored in a centralized location so that:

  • each machine running the system could benefit from updates without explicitly upgrading;
  • the web interface providing visibility into the results could be standardized and centrally maintained, relieving each individual project from maintaining it;
  • individual engineers could have access to the project’s data even if they did not have access to the build machine itself; and
  • the failure of an individual machine would not prevent access to its build’s configuration or historical data.

The tests could also be partitioned into “stable” and “golden” sets if the complete set of tests was prohibitively long or unreliable (a rough sketch of such a build loop follows this list):

  • Stable: Tests that run relatively quickly, are the most reliable, and/or are the most important
  • Golden: Tests that run relatively slowly, are the least reliable, and/or are not as important
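
To make the shape of the system concrete, here is a minimal sketch of a single-machine build loop with a stable/golden split and results published to a shared directory. It is purely illustrative: the real Chris/Jay build began life as a bash script (later reimplemented as Pybuild), and every command, path, and configuration key below is invented for the example.

    #!/usr/bin/env python
    # Illustrative sketch only: none of these paths, commands, or names are
    # the actual Chris/Jay ones.
    import json
    import subprocess
    import time

    # Hypothetical per-project configuration, kept on the shared filer so any
    # machine running the loop (and any engineer browsing results) can see it.
    CONFIG = {
        "project": "example-project",
        "checkout_cmd": ["p4", "sync"],                # bring the client up to date
        "build_cmd": ["make", "all"],                  # or the build tool of the day
        "stable_tests": ["fast_unit_tests"],           # quick, reliable, important
        "golden_tests": ["slow_integration_tests"],    # slow and/or flaky
        "results_dir": "/nfs/builds/example-project",  # shared, web-visible
    }

    def run_step(argv):
        """Run one command, returning True on success."""
        return subprocess.call(argv) == 0

    def run_cycle():
        """One synchronize-build-test cycle; returns a result record."""
        result = {"timestamp": time.time()}
        result["synced"] = run_step(CONFIG["checkout_cmd"])
        result["built"] = result["synced"] and run_step(CONFIG["build_cmd"])
        result["stable_passed"] = result["built"] and all(
            run_step(["make", t]) for t in CONFIG["stable_tests"])
        result["golden_passed"] = result["built"] and all(
            run_step(["make", t]) for t in CONFIG["golden_tests"])
        return result

    def publish(result):
        """Append the result where a shared web UI could render it."""
        with open(CONFIG["results_dir"] + "/results.json", "a") as f:
            f.write(json.dumps(result) + "\n")

    if __name__ == "__main__":
        while True:              # poll forever; a real system needs error handling
            publish(run_cycle())
            time.sleep(300)      # wait a few minutes before the next cycle

The appeal of the design is visible even in the toy version: everything a team needs to operate, monitor, or resurrect its build lives in the shared configuration and results, not on the particular machine running the loop.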

This system spread throughout the Ads organization, and inevitably began to be used by other teams within physical proximity of the Ads teams in Mountain View. When the Testing Grouplet formed, Chris was an early, active member, and he brought his and Jay’s build system with him. The UTF was still in use, but the CJ build was gaining popularity quickly.

Coverage Bundles

One function of the UTF that wasn’t explicitly handled by the CJ build was the concept of “coverage bundles”, whereby a team could register its specific segment of the source code repository with the Testing Technology team, and the UTF would produce a weekly report showing how much of that code was exercised by automated tests, a metric known as “code coverage” or just “coverage”. There was nothing stopping the CJ build from running in coverage mode, and eventually some teams did implement this, but coverage runs take much more time to complete, and the one-week latency between coverage reports was not a concern for most teams, since weekly data points were sufficient to detect the relevant trends.

Orbs, Traffic Lights, and Statues of Liberty

Googlers are known for hacking hardware as much as they are for hacking software, so it was only a matter of time before they improved upon the standard Chris/Jay web interface as a means of monitoring build status. This also made sense on a practical level: a centrally-located information radiator is more likely to be widely monitored by a project team, given the casual, passive nature of glancing at it; a build system’s value is proportional to how quickly a team can identify breakages and fix them, and having every engineer keep a web page open and refresh it all the time doesn’t scale.

One team of early adopters acquired a full-sized traffic light, and mapped its green light to “all passed”, yellow light to “stable passed, golden failed”, and red light to “stable failed”. Another Ads team had neon signs custom-made. When Firefox plugins became all the rage, Bharat Mediratta developed a Chris/Jay plugin that sat in the corner of the Firefox window at all times, monitoring one or more builds and refreshing once a minute. When Chrome was launched, a spiffy plugin was developed by Olivier Gaillard in London4 that used the red and green lightbulbs from the Testing Grouplet logo (coming in a future post!), sitting next to the address bar, to provide overall status for any number of builds running on any number of different build systems in use within Google (e.g. One-Click, TAP, Pulse for Windows, etc.).

One of my first projects when I joined Google and Testing Tech in 2005 was writing a Python script to monitor the status of a Chris/Jay build and update the color (and possibly the pulse) of an Ambient Orb accordingly.5 Eventually this script would go the way of Chris/Jay, getting hacked on this way and that to monitor different combinations and priorities of builds, to support a range of devices besides Ambient Orbs (including Henner Zeller’s Microorb implementation), to support other build systems (such as One-Click and TAP, mentioned below), and to multiplex between different builds and devices using a single running instance. Henner thought this script was getting too bloated, and maybe it was, but it’s worked for a lot of people over several years, while maintaining backward compatibility throughout every enhancement—another testament to automated testing, even on a small scale. Tony Aiuto now has full ownership of this legacy system.
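
The essence of such a monitor is small: poll the build’s published results on the shared filer, map the latest status to a color, and push that color to the device. The sketch below is not the original script; the results format, file paths, status-to-color mapping, and especially the orb command are stand-ins, since a real Ambient Orb speaks its own protocol over its RS-232 interface.

    #!/usr/bin/env python
    # Illustrative only: not the original orb script. The "COLOR" command is a
    # placeholder, not the Ambient Orb's actual serial protocol.
    import json
    import time

    RESULTS_FILE = "/nfs/builds/example-project/results.json"  # hypothetical path
    ORB_DEVICE = "/dev/ttyS0"                                   # serial port

    def latest_status():
        """Read the last published result and reduce it to a single color."""
        with open(RESULTS_FILE) as f:
            last = json.loads(f.readlines()[-1])
        if not last.get("stable_passed"):
            return "red"
        if not last.get("golden_passed"):
            return "yellow"
        return "green"

    def set_orb_color(color):
        """Send a (made-up) color command to the device on the serial port."""
        with open(ORB_DEVICE, "w") as orb:
            orb.write("COLOR %s\n" % color)

    if __name__ == "__main__":
        while True:
            set_orb_color(latest_status())
            time.sleep(60)  # refresh once a minute, like the Firefox plugin

Because the results live on NFS rather than only on the build machine, a monitor like this can run on any machine with a filer mount and a device attached, which is part of what made handing out orbs to teams so practical.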

In New York, once David Plass and Prakash Barathan started getting the Testing Grouplet NY off the ground in early 2008, one of the earliest projects was constructing an “orb” from small plastic Statue of Liberty models. The models were only about a foot tall, and the bulbs were only so big and bright, but they had that special New York attitude we were going for. David and Tony Aiuto launched into the project with gusto, experimenting with different models and circuits and bulbs, until they found a combination that worked well and started hosting after-hours orb-building parties. These “orbs”, powered by my old orb script, were then handed out across the New York offices to teams that had achieved at least Test Certified Level One status. Speaking of which…

Test Certified

Test Certified emerged from the Testing Grouplet sometime in late 2005 or 2006 as a plan for engineering teams to improve their developer testing habits and code quality. The Chris/Jay build was built in from the very beginning. Two of the most important parts of automated testing are making sure the specific tests your project relies on execute whenever the code changes, and having visibility into the results, particularly in the event of a breakage or failure; that was exactly what the Chris/Jay build was designed to provide. Chris/Jay was already proven, actively maintained, and had become a de facto piece of Google standard development infrastructure. In addition, we required that teams set up a coverage bundle on the UTF to gain visibility into how much of their code was being exercised by their automated tests.6

The Testing Grouplet’s ambition was to provide an Ambient Orb to every Test Certified Level One team as a reward, and to fill every engineering office with Ambient Orbs. Though we did procure and distribute a fair number, neither the Testing Grouplet’s budget nor Ambient’s fulfillment capability was fully up to the task. As described above, this didn’t matter so much after all, as there was a proliferation of other build monitoring solutions. The important thing is that, due to the growing popularity of Test Certified and Testing on the Toilet,7 the Chris/Jay build became ubiquitous, and teams became increasingly conscious of their build status and fastidious about keeping it in a passing state.

Tools and Testing World Tour

By November 2006, I’d switched from Testing Technology to the Build Tools team, and had assumed co-leadership of the Testing Grouplet. The Test Certified program was up and running with a few dozen teams participating. Patrick Copeland had just assumed leadership of Test Engineering, and the Engineering Productivity focus area had yet to exist.

The Testing Grouplet had developed a standard slideshow introducing the Test Certified program, and at the same time the Build Tools team wanted to encourage developers throughout the company to try new tools to reduce the load on the Perforce source code management servers. I took the opportunity to develop a presentation on these tools with my Build Tools comrades, and proposed to my manager, Rob Peterson, that I go on a “Tools and Testing World Tour” to give both presentations in several other engineering offices. He gave it the green light, and I was off to:

  • Pittsburgh, PA
  • New York, NY
  • Hampton, VA (well, there’s no office there; I just stopped home for Thanksgiving)
  • Zurich, Switzerland
  • London, England
  • Trondheim, Norway (which was eventually closed)

OK, not exactly a “world” tour, but it was my first time off the North American continent, and I’d never had a passport before, so it was pretty exciting to me. And considering that New York and Zurich were, even at that time, amongst the largest engineering offices and regional hubs, I had a shot at reasonable coverage of large swaths of Google engineering given this itinerary.

The turnout to the talks was pretty good everywhere, as I recall, except for one: the Test Certified talk in New York. I had unwittingly scheduled it immediately after an engineering all-hands meeting, and in the largest conference room in the office, I gave the full presentation to a single engineer, a one-week-old Noogler: David Plass. Little did I know he and I would be partners-in-crime from then on, and that he would assume the TotT helm from Ana Ulin, help found the Testing Grouplet NY, and be one of my closest Fixit Grouplet co-conspirators, in addition to becoming a good friend.

I mention this “world tour” episode not just out of personal fancy, but because the only official accolade I ever received from Google for my Testing/Fixit/Grouplet work came about as a result.8 Some years later, after Engineering Productivity formed and started issuing “Demy awards” (named after W. Edwards Deming, the productivity guru who advised post-WWII Japan), one such award was given to the Chris/Jay Continuous Build. I was recognized specifically as a recipient of this award because I gave the Test Certified tech talk, including the CJ build as a core component, on this trip in 2006.

Test Mercenaries

Come mid-2007, I had switched to the Test Mercenaries team, an in-house team of engineers dedicated to helping product teams improve their testing practices and code quality. Two or more Mercs would be embedded within a product team full-time for months, using the Testing Grouplet’s Test Certified program as the basis for concrete action and improvement. The team was formed after Bharat Mediratta and Mark Striebeck successfully proposed to Alan Eustace that such a dedicated team be hired to improve engineering practices and code quality throughout Google via hands-on intervention, and was officially placed within the relatively new Engineering Productivity focus area, alongside Testing Technology, Build Tools, and Test Engineering. Mark Striebeck assumed management of both the Test Mercenaries and Testing Technology.

One of the very first things the Mercenaries did upon starting a new engagement was to set up a Chris/Jay build if the team didn’t already have one. Before we could engage the team to improve their testing practices, we absolutely had to have a build running the existing test suite, no matter how many tests failed at first. Many folks at first saw the Chris/Jay build as annoying overhead, and didn’t take fixing the build whenever it was broken very seriously. By the end of an engagement, however, the team usually came to appreciate the sense of confidence and security provided by a well-maintained Chris/Jay build.

Engineering Productivity

Sometime during the summer of 2007, at a leadership offsite, I sold Engineering Productivity on the idea of using the Testing Grouplet’s Test Certified program as the basis for having a meaningful dialogue about code quality (as it relates to overall product quality) between Test Engineers/Software Engineers in Test and their development teams. I made the case that if a team agreed to take the first step of TC Level 1 and set up a Chris/Jay build, it would automatically start saving everyone the time and trouble of finding and fixing bugs that are easily caught by regularly writing and running unit tests, freeing the TEs’/SETs’ time and energy to focus on bigger, harder, more interesting problems. Eng Prod saw the value in this and threw its weight behind Test Certified; after that, the number of Test Certified mentors, participating teams, and Chris/Jay builds shot up exponentially over the next few years.

Revolution

On my very first Mercenaries project, while undergoing the agony of checking out the project’s code onto its Chris/Jay build machine and waiting for an eternity, I decided to try SrcFS, an as-yet-little-known system I’d worked on briefly as part of Build Tools, to avoid having to check out much if any code. Just for kicks, I also decided to combine this with Ambrose Feinstein’s Forge, a 20% project he’d pinged me about some weeks or months earlier, which distributed compile steps and test executions using spare data center capacity.

My jaw dropped. The speedup was unbelievable; a full synchronize-build-test cycle dropped from well over an hour to about seventeen minutes. Many tests broke, but only because the new tools were stricter about dependency declarations, and all were easily fixed. Consequently, later that week, I excitedly began planning what would become the Revolution Fixit, in cooperation with Build Tools and Testing Technology, whereby these tools (in addition to Blaze, the Make replacement) were rolled out company-wide. Suffice it to say that this Chris/Jay build was for me what the bathtub was to Archimedes.

I will discuss the Test Mercenaries and the Revolution Fixit in greater detail in future posts.

One-Click Build

Around the same time as the spark that led to the Revolution Fixit, Mark Striebeck envisioned a new system, centrally maintained by Testing Tech, that would automatically create Chris/Jay build configurations and allocate a machine for each, further reducing the friction of achieving Test Certified Level 1. Though Chris/Jay was popular and gaining momentum, it had already developed a reputation for having a bit of a learning curve to set up and requiring constant care and feeding. For the majority of projects with straightforward build and test requirements, Mark imagined that a centrally-maintained system could reduce the setup and maintenance burden for a significant number of teams.

Mark called this system the “One-Click Build”, in reference to the vision of filling in a short web form with a single button that, once clicked, would configure and launch a new Chris/Jay build instance. As part of the Revolution Fixit in January 2008, working very closely with Mark, I promoted the prospect of the One-Click Build as a primary motivation for having projects adopt the new suite of build tools sooner rather than later. The vision also included this new system rolling out within a year.

However, once Mark began to realize the full potential of the new tool suite being rolled out as part of the Revolution, he revised his plans to develop a centrally-maintained, datacenter-based system that could use the new tools to test every single change, running only the tests affected by each change, and store all the results. This new system he called the Test Automation Platform, or TAP. This system would also involve a single web form for configuration and remove the maintenance burden of physical machines from individual teams, but the One-Click Build as originally conceived was off the table.

The Zurich office has a very strong sense of self-sufficiency, to put it positively. Zooglers don’t like waiting on folks from headquarters in Mountain View for anything. At that time, my good friend Henner Zeller was in Zurich,9 and after the first few quarters passed without a One-Click Build system from Mountain View, he decided to team up with Robert Nilsson to roll their own. It differed from Mark’s vision in that it required those who participated in the OCB to add their individual build machines to a common pool, the idea being that the more folks used the system, the more machines would be available, and the better it would work for everyone. It also required that projects could build and pass all tests using the new, distributed development tools rolled out during the Revolution; this disqualified many projects that still had compatibility issues, but there were still hundreds of projects that could use the system and benefit from its relative simplicity. This was also the first serious replacement for the Chris/Jay build; it relied on a completely different infrastructure.

At the time, Henner’s sudden, company-wide announcement in early 2009 seemed a bit, well, bold, let’s say; and Mark was wary of any product that could be seen as competing with TAP, which was still a year away from release. But as it turned out, all ended very well for everybody. OCB was never seen as a competitor or replacement for TAP, but as a temporary relief until TAP could be delivered for those teams that were already compatible. When it came time for TAP to roll out in March 2010, the several-hundred OCB builds were very easily imported into TAP, already perfectly compatible, and Henner and Robert were more than happy to turn their system down.

TAP

Almost all the source code at Google resides in a single repository. Consequently, the Testing Technology team developed the Test Automation Platform (TAP), a centralized service built on a dedicated cluster of machines and the in-house distributed build and test infrastructure, that runs every test in the repository at every single change and publishes the results.

There’s much more to say about TAP which I’ll save for a future post, but at a high-level compared to Chris/Jay:

  • Since TAP is a centralized service, teams don’t have to dedicate engineering time to machine maintenance, which became part of the price for using Chris/Jay.
  • TAP’s configuration is a single web form, as opposed to Chris/Jay’s combination of bash scripts on a networked file system.
  • No more networked file system performance or storage space issues related to the thousands of Chris/Jay builds throughout the company.
  • TAP runs only the affected tests for each and every changelist, meaning project team members can quickly pinpoint breakage culprits; Chris/Jay runs all of a project’s tests all the time, with multiple (and often unrelated) changes batched together for each run (remember that single source code repository across nearly all Google projects), making breakage forensics an art in itself.10 (A toy sketch of this kind of change-based test selection follows this list.)
  • TAP provides easy visibility across all TAP builds, making it easy to see when a bad change breaks multiple projects and is fixed; Chris/Jay couldn’t provide the same degree of visibility directly.
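
To illustrate the idea behind that per-change test selection (and why it depends on build-level dependency metadata the Chris/Jay scripts never had), here is a toy sketch; the dependency graph, target names, and algorithm are all invented for the example and are not TAP’s actual design.

    # Toy illustration of change-based test selection, not TAP's actual design.
    # Given each build target's direct dependencies, a test is "affected" by a
    # changelist if any changed target appears in its transitive dependencies.

    DEPS = {  # hypothetical build graph: target -> direct dependencies
        "//ads/server:server_test": ["//ads/server:server"],
        "//ads/server:server": ["//ads/common:money"],
        "//search/ui:ui_test": ["//search/ui:ui"],
    }
    TESTS = [t for t in DEPS if t.endswith("_test")]

    def transitive_deps(target, deps=DEPS):
        """All targets reachable from `target`, including itself."""
        seen, stack = set(), [target]
        while stack:
            t = stack.pop()
            if t not in seen:
                seen.add(t)
                stack.extend(deps.get(t, []))
        return seen

    def affected_tests(changed_targets):
        """Tests whose transitive dependencies overlap the changed targets."""
        changed = set(changed_targets)
        return [t for t in TESTS if transitive_deps(t) & changed]

    # A change to //ads/common:money triggers only the Ads test, not search's.
    print(affected_tests(["//ads/common:money"]))  # ['//ads/server:server_test']

Per-project Chris/Jay builds had no such graph to consult, which is why each run simply reran everything the project cared about against whatever changes had accumulated since the last cycle.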

An unforeseen future benefit of Chris and Jay’s design decision to post all CJ builds’ configurations centrally was that the TAP team could, several weeks in advance of the company-wide rollout during the TAP Fixit on March 4, 2010, import all of the Chris/Jay builds itself, rather than relying on each individual project team to do it, and pinpoint tests that were failing due to specific known incompatibilities. This meant that teams whose new TAP builds were passing could spend the fixit day turning down their old Chris/Jay builds, and teams whose builds didn’t work could try to fix the problems and get their new TAP builds to pass with the focused assistance of the TAP team and the large number of fixit volunteers worldwide.

Several TotT episodes publicized the upcoming TAP Fixit, each of which was dutifully proofread and edited by Chris Lopez himself. He requested that, in one episode, one of the ads at the bottom read: “Chris wants you to kill the Chris/Jay build.” (Or something very similar.)

Epilogue

The Chris/Jay continuous integration system was one of the most important and successful engineering productivity developments in Google history. Its impact on developer testing practices and code quality within Google cannot be overstated. It paved a large part of the way for Test Certified, and ultimately for TAP. It served its purpose very well for a very long time. There may be a few instances running for a few projects with very special requirements that preclude those projects from running certain tests on TAP’s distributed infrastructure, but Chris/Jay is for most practical intents and purposes retired.

It’s hard to imagine how Google would be today were it not for Chris and Jay scratching that itch years ago.

Footnotes

  1. My first team, originally related to the Software Quality Engineering department. SQE would become Test Engineering, and eventually Test Engineering, Testing Tech, and Build Tools would fall within the Engineering Productivity focus area. 

  2. Writing a lot of code and shipping is important, but it was the Testing Grouplet’s job, in conjunction with Testing Tech, Build Tools, and eventually Eng Prod, to find solutions to the problems that caused engineers to see testing as an obstacle to these goals rather than an essential habit to achieve them in the long term, as the size of the company and complexity of its products continued to expand. 

  3. This is part of the idea of 20% time, and is the policy enabling the grouplets and fixits that I talk so much about on this blog. 

  4. Thanks to David Plass for the credit after this entry was posted (which is why this footnote is out-of-order). 

  5. The orbs of yore had a standard RS-232 serial interface that could connect to an individual machine; for CJ monitoring, with the results on NFS, this didn’t have to be the build machine itself. I can’t find the developer info on the Ambient website anymore. 

  6. Chris/Jay and the UTF were available for Linux-based teams developing production servers or browser software using JavaScript. Teams working on Mac or Windows software could use appropriate build and coverage systems for those platforms. 

  7. Ana Ulin and I each wrote one CJ-related TotT; there may have been more. 

  8. Google culture strongly values awards and recognition, particularly internal awards and recognition. I did receive a number of generous performance reviews and peer bonuses from fellow employees over the years, however, and received a promotion as the Revolution Fixit started coming together. Peer bonuses are issued privately from one employee to another via the recipient’s manager. 

  9. Now Henner’s in Mountain View. 

  10. While in the Test Mercenaries, I started a “Build Cop Tips” page on the internal documentation system, complete with a picture of Officer Cartman. Others have since generously assumed ownership and expanded it.