Mike Bland


The specific tools the Testing Grouplet, Testing Tech, Build Tools and others developed to improve testing development and efficiency at Google

- Barcelona, Quincy (pronounced Quin-zee), and Boston
Tags: Build Tools, CJ, Go, Google, TAP, Testing Grouplet, Testing Tech, grouplets, programming, technical, websearch, whaling

This is the fifth—and, finally, final—post in my “whaling” series about the high-level conceptual and cultural challenges the Testing Grouplet and its allies faced, and the knowledge and tools that eventually spread throughout Google development that removed the infamous “I don’t have time to test” excuse. This post discusses the specific tools the Testing Grouplet, Testing Tech, Build Tools and others developed to improve development and testing efficiency and effectiveness.

The first post in this series focused on high-level cultural challenges to the adoption of automated developer testing that emerged as a result of day-to-day development reality. The second post in this series focused on the fundamental object-oriented programming issues which formed the core of most of Google’s testability challenges—and solutions. The third post in this series covered the basics of how automated tests should—and should not—be written. The fourth post in this series described the collection of processes employed by Google for ensuring software quality—including, but not limited to, automated testing.

As with the previous posts, the discussion of tools that follows will be somewhat limited by my own experience and memory. There are many more testing and development tools in use within Google, and I’m sure more have been developed since I left. As always, I’m aiming to provide the big-picture view of the Google development environment by pointing out the details with which I’m personally most familiar and which I perceived as having the most impact. If I left out a favorite tool, or didn’t discuss it to the extent you’d hoped, apologies in advance. Feel free to fill in the gaps I’ve left by commenting on my Google+ stream or publishing posts of your own.

Fact-checks and embellishments from Googlers present and past welcome via email or Google+ comments, as always.

What’s the right tool for the job?

Any tool should be judged on whether it solves a real problem, and whether it produces value of greater proportion than its coefficient of friction—in terms of its learning curve, its speed of execution, its reliability, and its results. I’m fond of paraphrasing Saul Alinsky1, who asserted that if people don’t have the power to solve a problem, they won’t even think of trying; whereas if they have the power, they’ll do the right thing for themselves willingly. Which tool you need to improve the quality of your code, product, or system the most depends on which problem is impacting it the most—in what way do your team members feel most helpless to improve quality?

Blaze, Forge, SrcFS, ObjFS

As I’ve mentioned many times, and tried to illustrate in Coding and Testing at Google, 2006 vs. 2011, a large part of the “I don’t have time to test” excuse was due to the fact that the existing build system was no longer able to scale. That system required using a tool to compile the BUILD language into an enormous Makefile, invoking GNU make to compile the code using a distcc cluster, and then running any automated tests on the local workstation. Compiling the Makefile took forever whenever a BUILD rule changed, and the rate of growth both of the development organization and the codebase put increasing load on distcc. Checking out a new project from Perforce or synchronizing an existing one often provided the opportunity for honing one’s espresso-making and foosball skills—especially for smaller offices outside of Mountain View. Just building one’s project took longer and longer and longer; running tests on one’s machine on top of that really threatened to put the brakes on one’s flow and productivity—especially given how large, slow, and flaky many of the tests of the time were.

So when the company-wide survey revealed that people felt they didn’t have time to test, we didn’t write them off as careless, ignorant bozos. We looked for solutions to the problem. Certainly better training and education regarding testable designs and powerful testing frameworks and techniques was part of the total solution—but far from all of it. All that training and knowledge and whatnot wasn’t worth a hill o’ beans if the tools at hand couldn’t make writing tests worth the time it took to build and execute them.

Sometime in 2006, the Build Tools team began working on Blaze (later open-sourced as Bazel), a replacement for the Makefile compiler and make itself, that would parse the BUILD language directly and be smart about what to recompile by using the content checksums of source files instead of their timestamps, and persisting the dependency graph and the checksums of each input file between invocations. That same year, development commenced on SrcFS, a Bigtable-backed cache for Google’s enormous Perforce depot coupled with a FUSE file system on individual developer workstations for downloading only the exact input files necessary to build a particular target.2 SrcFS provided an enormous relief for the Perforce server, and for the developers who otherwise would go for days or weeks without sync’ing, and would then have to suffer the consequences of waiting so long to integrate their changes with whatever had changed since the last sync.
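To make the checksum idea concrete, here is a minimal sketch in Python of deciding what to rebuild from content digests persisted between invocations. It is not Blaze's implementation; the function names, the JSON state file, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def stale_inputs(inputs, state_file: Path):
    """Return the inputs whose content changed since the last recorded build.

    Timestamps are ignored entirely, so touching a file without changing
    its content does not trigger a rebuild. (Hypothetical helper, not a
    Blaze API.)
    """
    previous = json.loads(state_file.read_text()) if state_file.exists() else {}
    current = {str(p): file_digest(Path(p)) for p in inputs}
    changed = sorted(p for p, digest in current.items() if previous.get(p) != digest)
    # Persist the digests so the next invocation can compare against them.
    state_file.write_text(json.dumps(current))
    return changed
```

On the first call every input is reported as stale; a second call with unchanged content reports nothing, even if the files' modification times have moved.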

Out-of-band from official Build Tools efforts, Ambrose Feinstein began toying with an idea in his 20% time to execute build actions—compiling, linking, and executing tests—in parallel, using otherwise-idle machines in Google’s production datacenters. He called this system Forge. He got it working using a modified version of the make wrapper used throughout Google at the time, but was lobbying Build Tools to allow him to integrate support directly into the then-unreleased blaze tool. Eventually, he got his wish.

By August 2007, SrcFS was on the verge of being production-ready and Forge was a viable proof-of-concept, but Blaze was still a few months away from release. It was under these circumstances that I organized the Revolution Fixit, which officially rolled out all three tools on January 31, 2008. I’ll discuss the Revolution in-depth in a future post. But in the course of organizing the Revolution, those of us organizing and supporting the Fixit had one meeting with Google’s tip-top developers, including Rob Pike, who, in response to concerns about the load that shipping compiled object files and binaries back and forth, to and from every developer’s workstation would place on the internal network, tossed out the idea of having a caching system for object files similar to SrcFS—“ObjFS” he called it. About a year later, ObjFS was released. Building programs and executing tests got even faster, since the vast majority of compiled objects and programs, tests included—particularly those from continuous build systems—now were never even sent across the network to the developer’s workstation!

By 2009, slow tools were no longer an excuse for not writing automated tests. Eventually, “I don’t have time to test” was something you just didn’t hear anymore.

TAP and Sponge

The idea for the Revolution was spawned when I tried setting up a Chris/Jay Continuous Build using SrcFS and Forge, during the course of my first Test Mercenary engagement. The Revolution, in turn, inspired my boss at the time, Test Mercenaries and Testing Technology manager Mark Striebeck, to conceive of the Test Automation Platform, or TAP for short. TAP, which officially rolled out on March 4, 2010, during the TAP Fixit—which Mark and my then-manager, Mathieu Gagne, allowed me to organize—took full advantage of the power of Blaze, Forge, SrcFS and ObjFS to perform continuous integration and testing on an unprecedented scale. Every single change submitted to Google’s Perforce depot is built and tested, and only those targets affected by a particular change are built and tested. What’s more, setting up a TAP build requires filling out a short web form, as it is a centrally-managed service as opposed to the Do-It-Yourself-style Chris/Jay system, which frequently required a significant chunk of developer time to maintain.
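The "only those targets affected by a particular change" part can be pictured as a walk over the reversed dependency graph. The following Python sketch shows that idea under stated assumptions: the target names, the shape of the `deps` mapping, and the function itself are invented for illustration and say nothing about how TAP is actually built.

```python
from collections import defaultdict, deque


def affected_targets(deps, changed_files):
    """Return every target that transitively depends on a changed file.

    `deps` maps each target to the files and targets it depends on
    directly. (Hypothetical data shape, for illustration only.)
    """
    # Invert the graph: for each node, which targets depend on it directly?
    dependents = defaultdict(set)
    for target, inputs in deps.items():
        for node in inputs:
            dependents[node].add(target)

    affected = set()
    queue = deque(changed_files)
    while queue:
        node = queue.popleft()
        for target in dependents[node]:
            if target not in affected:
                affected.add(target)
                queue.append(target)
    return affected


# Made-up targets and files.
deps = {
    "//base:strings": ["base/strings.cc"],
    "//base:strings_test": ["//base:strings", "base/strings_test.cc"],
    "//search:index": ["//base:strings", "search/index.cc"],
    "//ads:report": ["ads/report.cc"],
}
print(sorted(affected_targets(deps, ["base/strings.cc"])))
# prints ['//base:strings', '//base:strings_test', '//search:index']
```

A change to `base/strings.cc` leaves `//ads:report` alone, which is the property that keeps per-change testing tractable as the codebase grows.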

Long before TAP launched, Mark and the Testing Technology team delivered Sponge, the centralized repository of all build and test results for every build and/or test invocation executed by every developer and every continuous build throughout the company. Yes, that’s a damn lot o’ data. But yes, they did it, and it was integrated into Blaze relatively early on. Sponge records which user, on which machine, using which commands and options, building which targets, and running which test, leads to a particular result, which is easily accessible and navigable via a web interface. Sponge put an end to many frustrating on-line and off-line conversations about who executed a command, what exactly was the command, what exactly changed, which machine was this on, etc. Each blaze invocation produced a unique URL pointing to its results, which could then be passed around and pored over by whoever was engaged, where all the relevant information was accessible without anyone having to ask for it. “Sponge link or it didn’t happen,” quoth Michael Chastain.

Fixit killers

Something that TAP and Sponge did which I don’t think many people realize is that they eliminated the need for lots of Fixits. Why? Because certain classes of problems that affected a large portion of the codebase—particularly broad, largely mechanical refactoring3 changes—could be handled by a small group of developers, sometimes even a single developer, in a reasonably small amount of time. Given TAP’s power to build all affected targets quickly, and Sponge’s ability to provide exhaustive detail for every build result, a single developer can rapidly make a series of changes that affect a very large number of targets and be sure that they’re good without involving too many other developers in the process, outside of code review approval.

In fact, I worked on such a series of changes immediately before my sabbatical in June 2011, to convert a core indexing protocol buffer (Google’s extensible data description format, described below) to a newer format, in order to break through some obstacles that the older format was beginning to present for my current project.4 For years, many others had lamented the difficulty this “legacy” format presented, and ugly hacks were made around it, since updating this particular protobuf seemed too heavyweight and risky an undertaking. However, by the time I tackled the problem, several others had done a bit of work over the years to make other necessary changes to update protocol buffer definitions upon which the “big one” depended; and with the new build tools, plus TAP and Sponge, I could spend a few days, by myself, making the big push to finish the conversion and yet remain secure I wasn’t taking down the core business, or at least the productivity of the developers dedicated to it. When the completed conversion revealed other issues about which some very senior websearch developers were concerned, I then spent a couple more days working on another set of changes, using the exact same method, to resolve those.

Everything worked. One of the first things I did when I returned from my sabbatical three months later on Monday, September 12, 2011 was look to see if anything had been rolled back, or otherwise fixed, in my absence. No rollbacks, no fixes.5


Rosie

Rosie is another Fixit-killing tool, developed by Collin Winter to automate the breaking-down of very large, Google-wide mechanical code changes, distributing code reviews for each piece, and automatically submitting each smaller change upon code review approval. When deprecating an older piece of code in favor of another, this kind of tool makes updating all the existing code and eventually eliminating the deprecated code soooo much easier. In fact, before Rosie’s introduction, there was a very successful Deprecation Fixit run by Vincent Vanhoucke, Kevin Bourrillion and others; now, I don’t know that there’s ever been, or ever will be, another one.
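As a rough picture of "breaking down" a large change, the Python sketch below groups the files of one big change into smaller pieces by directory prefix, so each piece could go to a different reviewer. How Rosie actually chose its split points is not described here; grouping by the first two directory components is purely an assumption, as are the function name and paths.

```python
from collections import defaultdict


def shard_by_directory(paths, depth=2):
    """Group changed files into smaller changes keyed by directory prefix.

    Hypothetical splitting rule: the first `depth` directory components.
    """
    shards = defaultdict(list)
    for path in sorted(paths):
        parts = path.split("/")
        key = "/".join(parts[:-1][:depth])  # directory components only
        shards[key].append(path)
    return dict(shards)
```

For example, `shard_by_directory(["a/b/c/x.py", "a/b/y.py", "d/z.py"])` yields one piece for `a/b` holding two files and one piece for `d` holding one.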

In my protocol buffer example, while the changes were mechanical, each one was very specific, relatively limited, had significant downstream dependencies, and needed to be submitted in a particular order. But when I needed to update users of Pyfakefs (discussed below) to a new import interface in advance of making it open-source, I needed to make hundreds and hundreds of nearly-identical changes that could be submitted in any order. I wrote a script to automate the changes, used the command-line Code Search interface to find all affected files, created a single Perforce client to open and edit all of these files, and ran the script to update them all. I then produced one giant, all-in-one changelist, which Rosie then broke down into pieces, running the tests on each one. Mondrian (also discussed below), which could display the build and test results for each individual broken-down changelist via its Sponge integration, made it easy to see which specific pieces of the automated change needed massaging by hand. Once those were fixed, I could ping Rosie to continue with sending those changes out for review.
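A script for this kind of mechanical import change can be very small. The Python sketch below is not the script I used; the module names `old_fake_fs` and `newpkg.fake_fs` are invented, and it only handles the simplest form of the import statement, leaving anything unusual for the by-hand massaging described above.

```python
import re

# Hypothetical module names, used only for illustration. Only a plain
# `import old_fake_fs` on its own line is matched.
OLD_IMPORT = re.compile(r"^([ \t]*)import old_fake_fs[ \t]*$", re.MULTILINE)


def rewrite_imports(source: str) -> str:
    """Point the import at the new package while keeping the old local name,
    so the rest of each file needs no edits."""
    return OLD_IMPORT.sub(r"\1from newpkg import fake_fs as old_fake_fs", source)
```

Keeping the old local name via `as` means each file changes on exactly one line, which keeps the resulting reviews trivially easy to approve.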

Once all those changes were approved and submitted, I synchronized my original client, performed a code search, and then added to the “master” changelist any new code that made use of the deprecated interface that appeared during the course of the initial round of reviews. I’d repeat the cycle over and over until no existing deprecated usage remained. After that, I pulled the trigger to delete the old interface, and all was happily ever after.

I was a one-man Fixit, and I couldn’t’ve been happier. The benefits of these Fixit-killers are manifold:

  • Fewer Fixits means less “Fixit fatigue”, whereby developers become desensitized to, if not outright hostile towards, frequent requests to stop what they’re doing and focus on some other problem.
  • The effort of a single developer or a handful of developers, applying sufficiently powerful tools, is a much more efficient use of time and other development resources than a brute-force Fixit.
  • Because very few developers are involved relative to a Fixit, there’s little need for advertisement and coordination of effort, which makes the process run much more quickly and smoothly.
  • The Fixits that do happen, then, usually involve solving genuinely challenging problems, gaining interesting new knowledge, or adopting powerful new tools, rather than just slogging through gruntwork.

Fast dynamic loader

As mentioned above, prior to Forge, tests were executed locally on the developer’s workstation. The compile steps were handled by distcc clusters at that time, but all linking and test execution remained local. Plus, all C++ unit test binaries were dynamically-linked. This saved a bit of time when linking the binary—static linking, as explained below, used to be another massive pain point—and it makes a lot of sense when trying to iterate rapidly while developing. However, that benefit is washed away by the O(n²) algorithm used by the dynamic loader to resolve symbols from hundreds of dynamic libraries at the time the binary is executed. This problem only gets worse as more and more dependencies are added to existing components, or as new products integrate more features; test binaries for larger projects could take on the order of minutes to execute before even reaching main().

My first project on the Build Tools team was an attempt to reduce the startup time of dynamically-linked test programs using prelinking. Prelinking would allow dynamically-linked binaries to start up almost as fast as statically-linked binaries, which is to say almost instantly. Though I’d managed to build a prelinked Google Web Server (GWS) as a proof of concept, this effort eventually failed, in part because I’d had to do some nasty manual hacking that wasn’t easily automatable given the state of the build system in early 2006.

At around the same time, Andrew Chatham patched the glibc dynamic loader, ld-linux.so, to use a Bloom filter to drop this symbol-lookup process down to seconds. Though not a silver bullet, the new “fastloader” relieved one specific, excruciatingly painful symptom of the “I don’t have time to test” disease. Andrew tried to submit this to glibc, but was rejected on the grounds that it increased memory usage for every process by a couple of megabytes, an unacceptable cost to impose on all Linux users for the benefit of a single company’s development model. Since then, Google has maintained its own specially-patched dynamic loader for development use.
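For readers unfamiliar with the technique, here is a toy Python model of using a per-library Bloom filter to skip libraries that cannot define a symbol. It is not the fastloader patch; the classes, filter size, and hash choice are assumptions, and since Python dictionaries are already fast the filter here only illustrates the shape of the idea, not a real speedup.

```python
import hashlib


class BloomFilter:
    """A small Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit set stored in a single Python integer

    def _positions(self, name):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{name}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, name):
        for pos in self._positions(name):
            self.bits |= 1 << pos

    def might_contain(self, name):
        return all((self.bits >> pos) & 1 for pos in self._positions(name))


class Library:
    """Hypothetical stand-in for a shared library's exported symbol table."""

    def __init__(self, name, symbols):
        self.name = name
        self.symbols = dict(symbols)  # symbol name -> address
        self.filter = BloomFilter()
        for symbol in self.symbols:
            self.filter.add(symbol)


def resolve(symbol, libraries):
    """Return (library name, address) for the first library defining `symbol`."""
    for lib in libraries:
        # Cheap rejection: skip the real table lookup when the filter says no.
        if not lib.filter.might_contain(symbol):
            continue
        if symbol in lib.symbols:
            return lib.name, lib.symbols[symbol]
    return None
```

The trade-off is the one mentioned above: every library carries extra filter bits in memory in exchange for rejecting most lookups without touching its symbol table.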

Gold, the Google ld

Statically-linked binaries are, naturally, much faster to start up than dynamically-linked binaries. However, what Google discovered was that the standard Linux linker, GNU ld, was also very slow to produce statically-linked binaries. In fact, it became apparent that in many cases, it was linking that dominated the overall build time, rather than compiling, when performing incremental (i.e. not-from-scratch) builds. This wasn’t so much a drag on building and running unit tests, of course, since those were almost always dynamically-linked; but for building debug or production builds of a full system, it was really annoying.

I had the privilege of working closely with Ian Lance Taylor, linker guru extraordinaire and one of the nicest fellows ever to grok binary object formats—one of the nicest fellows ever, really—during my “prelinking” phase. Either during or shortly after that time, Ian began writing a new linker, from scratch, focused solely on the ELF format and consequently less complex than GNU ld, that could statically-link binaries much, much faster. By 2008, not only had Ian followed through and written such a linker—he called it “Gold”, for “Google ld”—but Google released it as open source, as part of GNU binutils. Ian also published A New ELF Linker, a detailed technical article describing Gold’s design and performance characteristics.

C++, gUnit, gMock, and Saint Zhanyong

Zhanyong Wan nearly single-handedly brought modern unit testing and mock object (!!!) technology to C++ within Google via gUnit and gMock—both open-sourced as Google Test and Google Mock, respectively. He routinely made the impossible possible—or, at least, he made the at-one-time aggravatingly time-consuming and brittle tasks easy, productive, and fun! On top of that, he maintained first-rate technical documentation for both frameworks, which is largely replicated in the open source projects (googletest docs and googlemock docs). Those documents provide such wonderful and thorough examples and explanations, I won’t profane them by attempting to summarize them here; just go read them for yourself.

It’s difficult to overstate the magnitude of Zhanyong’s contribution to the adoption of automated testing amongst C++ developers at Google, hence my proposal that he be canonized. If you program in C++, do yourself, your colleagues, and your company an immeasurable service and adopt Google Test and Google Mock now.

Guice dependency injection framework for Java

Much has been written about Guice, and it is extremely popular with Java projects at Google—but, having been primarily a C++ cat, I’ve never touched it! (Well, maybe I did once, but it was so long ago I’ve forgotten.) So, I don’t have much to say about it,6 other than that it seemed to help a lot of Java projects inside Google become more testable, much in the same way that Google Mock inspired greater application of the dependency injection principle amongst C++ projects. Fellow ex-Test Mercenary Christian Gruber is the current maintainer of Guice at Google.

Protocol Buffers

Protocol buffers are Google’s standard structured data description implementation. They’re not a testing tool per se, but they make writing programs and tests very easy as far as data types are concerned. They were invented to solve these primary problems7:

  • Extensibility/backward-compatibility: Protocol buffers avoid complex code that needs to perform a lot of if (version >= 27)-style checks in client code. When new fields are added to an existing protobuf, new code can take advantage of them by checks such as if (foo.has_bar()), while existing code—i.e. existing servers running in production—silently and safely ignores them.
  • Efficient, reliable parsing: Protocol buffers use a binary format that is very efficiently parsed by both client and server code generated by the protocol buffer compiler. The same tools can also emit and parse text-formatted protobufs. While the data isn’t “self-describing” a la XML, protocol buffer definitions are very easy for humans to understand, the text format is quite readable without all the XML-style open/close tags, and the binary data is much faster to parse. Inside Google, there are tools for translating protobufs stored in binary format into text and performing moderately complex queries on collections of stored protobuf data.8
  • Efficient transmission and storage: By the same token, encoded protobuf data is far smaller than XML-encoded data and other text-based formats—in both text- and binary-encoded protobuf forms—requiring far less network bandwidth and disk space.
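The first and third points can be illustrated with a tiny hand-rolled subset of the protocol buffer binary encoding. Real code uses classes generated by the protocol buffer compiler; the Python sketch below is only an illustration that handles integer (varint) fields, and its function names are made up for the example.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, pos: int):
    """Decode a varint starting at `pos`; return (value, next position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_fields(fields: dict) -> bytes:
    """Encode {field_number: int} using wire type 0 (varint) only."""
    out = bytearray()
    for number, value in sorted(fields.items()):
        out += encode_varint(number << 3 | 0)  # key = field number + wire type
        out += encode_varint(value)
    return bytes(out)


def decode_fields(data: bytes, known: set) -> dict:
    """Decode varint fields, silently skipping field numbers not in `known`."""
    pos, result = 0, {}
    while pos < len(data):
        key, pos = decode_varint(data, pos)
        number, wire_type = key >> 3, key & 0x7
        if wire_type != 0:
            raise ValueError("this sketch only handles varint fields")
        value, pos = decode_varint(data, pos)
        if number in known:
            result[number] = value
    return result
```

A message written by newer code with fields 1 and 2 can be read by older code that only knows field 1: `decode_fields(encode_fields({1: 150, 2: 7}), {1})` returns `{1: 150}`, and field 1 alone occupies just three bytes on the wire.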

With the introduction of gMock, as mentioned above, testing code that makes use of protobufs became much, much more convenient, due to its built-in support for protobuf matchers.9,10

RPC testing with Servicemocker

Remote Procedure Calls (RPCs) are the messages passed between different processes running on the same or different machines that make large-scale distributed computing systems possible. Nearly all the things I’ve mentioned in my blog about programs talking to other programs, clients relying on servers, servers responding to clients, etc., imply the use of Google’s highly-efficient internal RPC implementation, which relies upon protocol buffers as mentioned above.11

Testing RPC-handling code directly was historically very awkward, requiring one either to launch new server processes using canned data—pushing one’s test into the realm of at least medium size, if not “large”—or to write a mock, stub, or fake implementation of the “remote” service by hand. Piotr Kaminski changed that with his servicemocker framework. Now, one could set up expectations, responses, and error conditions that made direct use of the underlying RPC framework just as easily as one could use gMock to do the same for most other objects. One could in theory use gMock itself for the same purpose, but servicemocker proved a very elegant, robust, general solution for the very specific domain of RPC issues, reducing the friction of testing code that directly interacts with the RPC system to its smallest possible extreme.12
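Servicemocker itself is internal to Google, so the following Python sketch only conveys the flavor of setting up an expectation, a canned response, and an error condition for a remote call. It uses the standard library's `unittest.mock` and replaces the client stub outright, whereas servicemocker worked through the real RPC subsystem; `GetDocument`, `RpcError`, and `fetch_title` are all hypothetical names.

```python
from unittest import mock


class RpcError(Exception):
    """Hypothetical error type standing in for an RPC failure."""


def fetch_title(stub, doc_id):
    """Code under test: call the remote service, degrade gracefully on failure."""
    try:
        return stub.GetDocument(doc_id)["title"]
    except RpcError:
        return "(unavailable)"


# Expectation plus canned response.
stub = mock.Mock()
stub.GetDocument.return_value = {"title": "Whaling"}
assert fetch_title(stub, 42) == "Whaling"
stub.GetDocument.assert_called_once_with(42)

# Error condition: the "remote" call fails.
failing = mock.Mock()
failing.GetDocument.side_effect = RpcError("deadline exceeded")
assert fetch_title(failing, 42) == "(unavailable)"
```

Both cases run in-process in a fraction of a second, which is what makes it practical to cover the error paths that are painful to provoke against a live server.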

I’m not sure whether tests using servicemocker should be classified as “small” or “medium”. It makes use of the actual underlying RPC subsystem, implying that such tests are essentially integration tests, indicating a “medium” size; but it allows one to easily write a large number of thorough, concise, stable, fast-running tests that tickle many dark corners of the code under test without launching other processes, which would suggest “small”. In the end, it doesn’t really matter. Pick the one with which you’re most comfortable, for whatever reason you choose, as in this case, the arguments could go either way. The world is analog; computing is a digital interpolation of reality.

Of course, other tools exist for using canned protobuf data to perform larger-scale RPC testing and load testing. But I never used many of those very much; my team on websearch would just push binaries into our staging datacenter, let them process real-world data, and monitor the performance. But if we didn’t have the luxury of our own staging area, I’m sure I’d’ve become much more conversant in those other tools.

Code Search

Google has a lot of code. As mentioned in the previous “whaling” post, Google has tools to navigate this codebase internally, including an internal version of the now-publicly-defunct Code Search. This tool is invaluable when it comes to discovering how parts of the codebase your own project depends on really work, to finding bugs in other projects, to contributing to other projects in material ways, and to finding users of code that depends on yours in order to make sure your changes don’t break others—whether you’ve a handful of users, or a significant fraction of the company, meriting a one-person Fixit as described above. Using Code Search, and thanks to code reviews and coding standards, it’s relatively easy to dive into a project with which you’re completely unfamiliar—at least if you’re familiar with the Google style guide for that language—and begin to make sense of what’s going on.

It’s also really handy for finding example usage of interfaces you intend to use, or examples of tests using such code, or just examples of coding and testing in general. Want to see what JeffAndSanjay or Ken or Rob Pike are up to, and what their code looks like, complete with cross-references to other code and the complete change history? It’s just a quick code search away. Talk about a company perk! I could’ve done without the cafés and microkitchens and holiday parties and other emblems of corporate excess, but you’d’ve had to pull tools like Code Search out of my cold, dead hands.

Though the public Google Code Search site has officially shut down, in researching this section of the post, I discovered that Russ Cox wrote an excellent public article on How Google Code Search Worked. What’s more, he published a single-machine implementation of Code Search—and, being a core member of Google’s Go team, he wrote it in Go.


Buganizer

Buganizer is Google’s internal, web-based bug-tracking tool. “Bug-tracking” is a bit of a misnomer; Buganizer is used to track defects, yes, but also feature requests, feature rollouts, binary releases to production, and who-knows-what. It’s really a full-fledged “issue-tracker”, and there are benefits to using a standard tool for nearly everything that one might categorize as an “issue” worth tracking. It allows for cross-references not just to other issues, but to specific code changes (via Perforce changelist numbers) somehow related to the issue; at the same time, Buganizer numbers could be embedded in a Perforce changelist description. These cross-references were, of course, hyperlinks, so one could bounce between Buganizer and Mondrian (described below) with ease and comfort.

I don’t remember much of the history of Buganizer, but I do know that my partner-in-crime Antoine Picard13 was the manager of both the Buganizer and Mondrian teams for a while.


Mondrian

Mondrian, Guido van Rossum’s Google starter project, is the standard Google web-based code review tool. Before Guido joined—which was only a few months after me—code reviews were done using text diffs over email. Mondrian—so named due to Guido’s nationalistic affinity for Dutch artists—allowed for the full contents of changed files to be observed side-by-side, with differences highlighted in color, and allowed the code review participants to make comments on a specific line of code by clicking on it to open a text window. The default view is a dashboard for the logged-in user, showing that user’s changes awaiting review, the reviews waiting for the user’s approval, and all other recent code reviews the user completed or participated in as a non-approving commenter.

As you can imagine, once Mondrian gained a little traction, it quickly became the standard tool. And, as Google grew, it began to strain under load, along with the rest of the build tools of the time. When Collin Winter joined Build Tools, he did a fine job of updating Mondrian to scale its performance, as well as extend its feature set and integration with other Google internal tools. In particular, Mondrian’s integration of Sponge test results, described above, became invaluable in validating that the author of a change actually ran the tests affected by it, and that they passed.

As I mentioned before, code reviews are an invaluable practice that, I believe, was one of the largest contributing factors to Google’s early success before the adoption of automated developer testing across the development culture. Guido and Mondrian came at just the right time, as Google’s development organization and its portfolio of products really exploded, allowing the code review process to be carried out much more efficiently given the increasingly-distributed nature of Google’s development teams.

If you’re a programmer, you should institute a policy of frequent, mandatory code reviews if you’re not doing them already. Rietveld is Guido’s open-source port of Mondrian, and other web-based, open-source code review tools exist as well. But if you don’t feel like messing with formal code reviews, a lot of folks get a ton of mileage out of pair programming. Or you could just make sure everyone eats lunch together, or you could create an environment that encourages folks hanging out and bouncing ideas around during breaks (e.g. a microkitchen area). Either way, you should get your team regularly talking about code, spreading knowledge and ideas and expertise; it’s good for you, it’s good for your colleagues, it’s good for your code, your product, and your company.


Pyfakefs

Pyfakefs started life as a utility I wrote for a handful of tests for a Build Tools project I was working on, written in Python. One of Python’s strengths is its ability to easily read and manipulate files, and it’s used very heavily inside Google for exactly such scripting (plus some production services, though not as many as C++ or Java). I wanted to test how my Python modules handled a number of different cases that involved reading data from disk, without going the route of writing a disk abstraction layer. Besides, if I wrote a disk abstraction layer, I’d have to test that anyway.

One of the nice features of Python is that it’s easy to just create a class or module implementing the same interface as a built-in, and just patch it in.14 So I hacked together some modules to replace the built-in open() call and the os and os.path modules. By using an in-memory fake file system, not only could the test run faster and qualify as a “small” test rather than “medium”, but the test data for a use case was associated directly with the code under test, rather than there being a scattering of small test files in a testdata directory somewhere, all mingled together. It allows one to test most, if not all possible corners of a data-processing function, class, or module with a minimum of friction.
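The patching trick can be shown with nothing but the standard library. The sketch below is not Pyfakefs and does not use its API; `FakeFiles` and `count_lines` are invented for the example, and the fake supports only text reads.

```python
import builtins
import io
from unittest import mock


class FakeFiles:
    """A tiny in-memory stand-in for read-only text files (illustration only)."""

    def __init__(self, contents):
        self.contents = dict(contents)  # path -> text

    def open(self, path, mode="r", *args, **kwargs):
        if "r" not in mode or "b" in mode:
            raise NotImplementedError("this sketch only supports text reads")
        if path not in self.contents:
            raise FileNotFoundError(path)
        return io.StringIO(self.contents[path])


def count_lines(path):
    """Code under test: uses the ordinary built-in open()."""
    with open(path) as f:
        return sum(1 for _ in f)


# The test data lives right next to the assertion, not in a testdata directory.
fake = FakeFiles({"/data/input.txt": "one\ntwo\nthree\n"})
with mock.patch.object(builtins, "open", fake.open):
    assert count_lines("/data/input.txt") == 3
```

Nothing touches the disk, and the code under test needs no special hooks: it calls `open()` exactly as it would in production.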

It worked like a charm, and I wrote one of the early Testing on the Toilet articles—I think it was Episode 12, in September 2006—advertising its existence to Google-at-large. Over the next few years, I kept getting a trickle of code reviews for it, requesting that this feature or that be added. After a while, I actually got a little annoyed that people were making it so complex, as that was never my intention. But then, shortly after that realization, I began to appreciate that people were hacking on it because they found it genuinely useful—and it continued to work for everybody, because I always insisted on thorough tests.15 By the time I left Google, it was used by over 900 unit tests—including several for tools supporting the Google Web Server itself.

Several folks had requested that I open source the fake file system over the years, and in 2010, I finally got started on doing so, with help from Julien Silland. Being the busy folks we are, though, I didn’t manage to get it pushed out to Google Code until just before I started my sabbatical in June 2011. It hasn’t changed any in public since then, but I presume changes are still being made to it internally.


If Mox had existed in 2006, I might’ve used it instead of writing Pyfakefs. Mox is a general-purpose mocking framework for Python, based on EasyMock for Java, and is also very widely used within Google. It was written by Steve Middlekauff.

While Mox can be used easily for mocking some basic file system calls, Pyfakefs has evolved to the point where it’s a pretty standard-conforming implementation capable of fairly complex file operations. Depending on the nature of the code under test and the degree of its interaction with the file system, Pyfakefs might be a more comfortable fit.
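
For contrast, here is a rough sketch of the mocking approach for a single, simple file system call. I'm using the standard library's unittest.mock as a stand-in rather than Mox, so none of this reflects Mox's actual API; is_large_file, its threshold parameter, and the path are hypothetical names for illustration.

```python
import os
from unittest import mock


def is_large_file(path, threshold=1024):
    """Hypothetical code under test; it makes one simple file system call."""
    return os.path.getsize(path) > threshold


def run_example():
    # Replace os.path.getsize with a mock that returns a canned size.
    with mock.patch("os.path.getsize", return_value=4096) as fake_getsize:
        result = is_large_file("/no/such/file")
        # Verify the interaction as well as the result.
        fake_getsize.assert_called_once_with("/no/such/file")
    return result


if __name__ == "__main__":
    print(run_example())  # prints True
```

This works well when the code makes one or two calls with predictable arguments. Once the code under test creates, reads, renames, and lists files in combination, scripting every call and return value gets tedious, and a fake with real file system semantics starts to look like the more comfortable fit.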


Tools, and automated testing among them, are no substitute for knowledge, training, and clear thinking. Once you give people the power to do the right thing, giving them the knowledge of how to use that power is instrumental in cultivating the desire to do the right thing, and consequently in producing a positive outcome.

And this is where Grouplets and EngEDU come in. I’ve written much about each of these before, but Testing on the Toilet, Test Certified, Test Mercenaries, and Fixits are some of the Grouplet programs and events that went a long way towards spreading testing tools, knowledge, and good practices. Software Engineers in Test, while filling an Eng Prod rather than a Testing Grouplet role, have been able to carry on the work of the Testing Grouplet and the Test Mercenaries—particularly the Test Certified program—as permanent, integrated members of individual development teams.

EngEDU is Google’s internal training organization, with which the Grouplets have had close ties throughout Google’s history.16 They recruit Google developers to produce training materials, such as wiki-based technical documentation pages called Codelabs,17 as well as Tech Talks, in-depth presentations given on internal technologies for the benefit of the Google development community at large. They also work to develop in-house training programs, and hire contractors to provide training which Google does not yet provide.18

The moral: Invest in books and training materials, even if you need to self-produce. Imagination is more important than knowledge, but I’d wager that knowledge often fuels imagination, especially when it comes to raising the level of code quality across the board.


There were/are certainly many more tools I haven’t mentioned. I’ll let others chime in to fill in the gaps if they’re so moved. However, in the course of these past five “whaling” articles, I think I’ve provided a fairly broad, yet somewhat detailed overview of the Google development environment and culture, and spoken to many of the biggest challenges facing the automated testing movement at Google, and their solutions—particularly regarding those aspects in which I was personally involved.

Maybe some of the pieces of the puzzle that the Testing Grouplet and friends started fitting together make a little more sense given this context; hopefully my future posts regarding Fixits will make a bit more sense than they would otherwise. One way or another, it’s hopefully clear that we had a lot of challenges, and through years of systematic effort, trying many ideas until we found the ones that stuck, we were able to overcome them all—to Google’s great benefit.


  1. Yes, this is the footnote which inspired the earlier post

  2. I worked on the backend for SrcFS a little bit, just before I left Build Tools to join the Test Mercenaries. My contribution was further limited by the work I did organizing the March 8, 2007 Testing Fixit. 

  3. “Refactoring” is changing code in a way that preserves its function and behavior, but aims to improve code quality

  4. The newer protobuf format was a new description format, used to generate the C++, Java, and Python code which parses and emits protobuf data. The new description format had no impact on the underlying binary “wire” format whatsoever. 

  5. Not much in the way of explicit thanks or attaboys, either. 

  6. I’ve avoided Java for most of my career. Despite some of its benefits, and the amazing tools that are available to work with it, it’s so verbose it makes my eyes hurt. By contrast, I’ve warmed up to Go—despite the fact that I’m not even a career programmer anymore—because it offers many of the same benefits without the verbosity, and requiring no more tools than a good text editor, a command line, and the go command itself. See Rob Pike’s Less is exponentially more for insight into the origin, intention, and design of the Go language, especially as compared with C++ and Java. After reading it, I must admit to feeling a twinge of pride that I’m one of the minority of C++ programmers who seems to “get” Go. 

  7. For more info and concrete examples, see the protobuf overview document

  8. Googlers, you really should be using Henner Zeller’s gqui if you aren’t already. 

  9. What?!? Where are the protobuf matchers in the open-source googlemock?

  10. As an aside, I wrote a C++ library for efficient traversal of protobuf data and updating specific data fields based on their fully-qualified names—i.e. com.google.FooData.BarMember. I wrote this as part of a websearch process for validating the integrity of data imported from other indexing systems, and potentially editing some of this data by reconfiguring (rather than recompiling and redeploying) a production server. Matt Russotto and Shibiao Lin performed many incredibly thoughtful and helpful code reviews for this library, and Shibiao eventually moved it into the Google-wide contributed code directory; I’d be tickled to see it show up in the open-source protobuf library one day. 

  11. The closest open-source implementation is Apache Thrift

  12. I had hacked together a small class for writing fake RPC services, but it was only a component for hand-rolling one’s own fake. After Piotr’s servicemocker launched, I neglected to use my own little class ever again. 

  13. Antoine was also my Testing on the Toilet partner/rival/nemesis, as we were neck-and-neck for a while for the title of most prolific TotT contributor. Eventually I stopped, and he has long since surpassed me. Long live the king! 

  14. Collin Winter once remarked to me that dependency injection was never a “thing” in Python circles because it was just built into the language to begin with. 

  15. Not surprising that there’d be no argument here, as people were updating a unit testing tool, after all. 

  16. Antoine Picard is currently a member of EngEDU, a role he seems to relish. 

  17. Nick Lesiecki, Neal Norwitz, and I wrote the Java, Python, and C++ unit testing Codelabs, respectively. I cleaned up the C++ version and handed ownership off to Neal shortly before departing Google. 

  18. Ana Ulin and I met during a Design Patterns workshop given by Industrial Logic. David Plass and I met when I gave a Test Certified tech talk in New York—and he was the sole attendee, a week after he’d started. I wonder what shape Google testing habits would’ve taken had Ana, David, and I not had such opportunities to meet, train, and share knowledge directly with folks from other areas of the development organization.