Mike Bland

Object-Oriented Programming Revisited

The fundamental object-oriented programming issues which produced Google's testing challenges and the solutions promoted by the Testing Grouplet

- Barcelona and New York
Tags: Go, Google, Testing Grouplet, grouplets, programming, technical, whaling

This is the second of three—well, make that four, now—posts which represent (my attempt at) a coherent outline, from a conceptual perspective, of the challenges the Testing Grouplet and its allies faced, and the knowledge and tools that eventually spread throughout Google development that removed the infamous “I don’t have time to test” excuse. This post will cover the fundamental object-oriented programming (OOP) issues which formed the core of most of Google’s testability challenges.

The first post in this series focused on high-level cultural challenges to the adoption of automated developer testing that emerged as a result of day-to-day development reality. The next post will deal with test design and development processes that support the maintenance of high code quality throughout the company. The final post will discuss the specific tools the Testing Grouplet, Testing Tech, Build Tools and others developed to improve testing efficiency and effectiveness.

Fact-checks and embellishments from Googlers present and past welcome via email or Google+ comments, as always.

Really obvious omission from the previous post

Before I get on with the business of OOP, since I prefer not to edit earlier posts, I want to repeat key points I made earlier in my Test Mercenaries post about the difficulty of measuring the impact of developer testing and Google culture’s very strong tendency to value only what can be measured. These two aspects underpin the cultural challenges of the “I don’t have time to test” assertion and, especially, the launching, recognition, and promotions priority structure. I guess, especially having discussed them at length before in the context of the Test Mercenaries, they seemed so obvious that I forgot to call them out as part of the bigger picture.

Addendum to Launching, Recognition, and Promotions from the previous post

Also, I made a claim in my previous post that implied that managers, in particular, were anti-developer testing. This is perhaps an unfair characterization and confused memory on my part; the activity that many managers were actually most wary of was 20% time, a wariness which the Testing Grouplet and Grouplets in general perceived, rightly so, as damaging to their continued existence.1 Once the Testing Grouplet’s Test Certified zeitgeist took hold, most managers seemed quite happy to see their teams participate, particularly after the 2008 housing market collapse resulted in manual testing contractors being fired.

But in the early days, it’s not like managers, tech leads, or developers in general were rushing to support the automated testing cause. The assertion stands that as long as testing was seen to hinder rather than hasten the release of products and features, it remained difficult to convince the Google development population of its value, particularly in light of peer reviews and promotion prospects.

Object-Oriented Programming Basics for nonprogrammers

Let me try to break down the concept of the “class” and other object-oriented programming principles for any nonprogrammers before launching into a discussion of the code-level problems the Testing Grouplet was trying to solve.

The essential feature of so-called “object-oriented” programming languages is the “class”, which specifies a collection of data and a set of operations or “behaviors”—technically called “methods” or “member functions”—that have access to and operate on that data. Classes are used to define “objects”, or concrete instances of a class used to implement part of a program. Think of a “class” as a design and an “object” as the product generated from that design, e.g. draft specifications for an iPod vs. millions of actual iPods.

With the “class” as a foundation, object-oriented programming has traditionally relied on three fundamental principles: inheritance; encapsulation; and polymorphism. Inheritance I’ll explain further just below; but at a high-level, it means that an existing class (called the “parent class”, “base class”, or “superclass”) can be used as the foundation of a new class (called the “child class”, “derived class”, or “subclass”). Encapsulation means that a class’s data and other implementation details are not directly accessible by code that does not belong to the class (i.e. by “users” of the class), and can only be manipulated via the class’s methods/member functions which have been specifically labeled “public”.2 Polymorphism means that a child class can be substituted wherever a parent class is expected, implying that old code making use of a parent class can be reused by plugging in new child classes.3

Think of how an iPod can be used with headphones, or as part of a stereo system, or plugged into a car stereo. Considering encapsulation, none of those extensions or systems care about what’s inside the iPod, and as a user, your only interaction with what’s inside is via the buttons or touchscreen. What’s more, you have several varieties of iPod from which to choose, all with the same basic interface (buttons vs. touchscreen notwithstanding)—an example of real-life polymorphism.

Inheritance actually breaks down into two concepts: Reuse of interface, and reuse of implementation. When a child class “inherits” from a parent class, the same set of methods/member functions that comprise the parent class’s “public” interface automatically become part of the child class’s “public” interface. This is a Good Thing, and is the mechanism that enables polymorphism, which enables both code reuse and the taming of complexity. However, the child class will also inherit any implementation contained in the base class, meaning the child class will “automatically” contain certain behaviors and features. This is not as desirable as it might sound at first, as discussed below.

Also, as a consequence of inheritance, changes to a parent class are automatically inherited by all child classes, regardless of whether or not any single child class depends on the change. This is one of the core concepts underlying the software ills I describe below.

Sorry, this is where the iPod analogy breaks down, as would any analogy based on literal physical metaphors. Implementation inheritance is a logical construct that, I think, can only be physically represented via computing mechanisms. Changes in the iPod interface and internals over the years resulted only in new iPods exhibiting the changes; for the analogy to hold for inheritance, all existing iPods would have had to have automatically changed as well—and that’s just crazy talk!

Abstract Interfaces and Dependency Injection

By far the biggest conceptual hammer in the testing toolbox is that of dependency injection, aka “inversion of control”. These big, scary terms only mean that any individual class within a program should not be hard-coded to depend upon other “real” production classes or code (to the extent feasible), particularly if these “real” classes rely on resources such as the disk drive, the system clock, or other programs via the network. Relying on these “external” resources—so called because they are beyond the control of the running program—makes it difficult to test behavior that depends on them, since such resources introduce at least one (and usually more than one) of the following issues:

  • Extensive, tedious, and/or delicate configuration that makes tests verbose, fragile, and hard to maintain or extend
  • Long start-up times that inhibit frequent execution
  • Unpredictable behavior leading to inconsistent—i.e. flaky—test results

Instead, the code under test should make use of abstract interfaces that enable the plugging-in of smaller, controlled implementations in smaller tests, and full-sized production implementations in larger tests and the actual production program.

For the nontechnical folks, an ideal example of an “abstract interface” is an electrical socket. On the supply side, it doesn’t matter if the electricity comes from a power plant, a gas generator, or a hamster wheel. On the consumption side, it doesn’t matter if you plug in a toaster, hair dryer, or a wall of Marshall stacks. Different shapes of outlets prevent suppliers and consumers of different voltages and current (direct vs. alternating) from being accidentally connected, avoiding damage. There’s an interface that both sides adhere to, a “contract” if you will; each side conforms to a certain physical configuration and behavior, and as long as that contract is held, things “just work”.

As a bonus, adapters abound that enable easy conversion between alternating and direct current, and from high voltages to low, and vice-versa. It requires a bit more diligence to ensure that devices are equipped to handle the adapted current, but it’s a relatively straightforward task.

The alternative would be having to wire every device into the electrical grid by hand. Not only would you have to perform careful bookkeeping to keep track of voltages and currents, but you run the risk of creating a bad connection, of not accounting for all the variables you should have. At best, things just won’t work. At worst, devices can get fried; people can get hurt.

At a high level, all dependency injection amounts to is taking advantage of polymorphism: fitting a piece of software with “outlets” so that you can “plug in” the components it depends on according to the context in which it is exercised, i.e. test vs. production. Suddenly, that service that talked to three other servers and was a nightmare to test ends up being relatively easy to test given a handful of test-only implementations that exercise the code under test without requiring elaborate and time-consuming configuration and setup. You’ve just replaced a set of coal turbines with hamster wheels, and the code under test can’t tell the difference.
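To make the “outlet” idea concrete, here is a minimal sketch in Go. Every name in it (Clock, SystemClock, FixedClock, Greeter, At) is made up for illustration and comes from no Google codebase; the system clock is used only because it is one of the external resources mentioned above.

```go
package main

import (
	"fmt"
	"time"
)

// Clock is the "outlet": the only thing Greeter needs from the outside world.
// (Hypothetical interface, invented for this sketch.)
type Clock interface {
	Now() time.Time
}

// SystemClock is the production implementation, backed by the real system clock.
type SystemClock struct{}

func (SystemClock) Now() time.Time { return time.Now() }

// FixedClock is a test-only implementation that always reports the same instant.
type FixedClock struct{ T time.Time }

func (c FixedClock) Now() time.Time { return c.T }

// Greeter depends on the Clock interface, never on a concrete clock.
type Greeter struct{ clock Clock }

// NewGreeter is where the dependency gets injected.
func NewGreeter(c Clock) *Greeter { return &Greeter{clock: c} }

// Greeting is the behavior under test; its result depends on the time of day.
func (g *Greeter) Greeting() string {
	if g.clock.Now().Hour() < 12 {
		return "Good morning"
	}
	return "Good afternoon"
}

// At builds a FixedClock for the given hour on an arbitrary date.
func At(hour int) FixedClock {
	return FixedClock{T: time.Date(2012, time.January, 1, hour, 0, 0, 0, time.UTC)}
}

func main() {
	// Test-style wiring: the result is the same on every run.
	fmt.Println(NewGreeter(At(9)).Greeting())
	fmt.Println(NewGreeter(At(15)).Greeting())
	// Production wiring: same Greeter code, real clock plugged in.
	fmt.Println(NewGreeter(SystemClock{}).Greeting())
}
```

Greeter cannot tell which clock it was handed, which is the whole point: the test decides what time it is rather than waiting for the afternoon to arrive.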

What’s more, interfaces define exactly that behavior that you care about from the point of view of the code under test. The complete production implementation of a particular dependency may provide a very large number of behaviors and options; by defining the subset of only those behaviors and options a piece of code relies on via an abstract interface, that piece of code becomes much easier to understand. One is free to consider only how those behaviors and options required by the interface influence the code under test, rather than guessing about the remainder of the full set of behaviors and options provided by a production implementation. Complexity is reduced.

No interface is perfectly opaque, however; in the example of the power outlet, matching wattage and current between both sides matters, so that the supplier provides enough capacity and the consumer doesn’t try to draw too much current, and this is not enforced by the physical interface of the outlet. (How many hamsters would it take to drive a wall of Marshalls?) However, the supply-side of the plug usually has fuses or circuit breakers that provide for a graceful failure—graceful in the sense that nothing melts, catches fire, or explodes; as long as this extra-physical interface constraint is accounted for and tested, the abstraction works out as part of the larger electrical system.

In the case of software, you can misapply dependency injection and not use the abstraction to poke aggressively enough at the potential failure points of the code under test. As is the case with much in development, it’s a tradeoff: completeness vs. convenience. If you take a more convenient route, the burden is on you to ensure completeness of the test. But, usually, it’s a tradeoff worth making; after designing code with testability and dependency injection in mind, complexity usually becomes more tractable, and it’s easier and faster to deliberately poke at a potential failure point with a lightweight test implementation of a dependency than to fire up a full production implementation and wait for it to launch, much less complete the test. The value to be gained from conscientious application of testing strategies based on dependency injection is immense.

Different programming languages make the concept of abstract interfaces, upon which dependency injection depends, more or less obvious and natural: In C++, one can use an abstract base class with no implementation at all; in Java, pure interfaces have always been part of the language. My favorite implementation of which I’m aware is the Go programming language, in which interfaces are enforced at compile-time in such a way that they do not rely on inheritance at all; more on that in the Stop, now Go! section below. But the fact is, the technique applies in every object-oriented programming language, and through manual effort, in C as well. There’s lots of information at your fingertips if you search for “dependency injection” on Google.

So, what did widespread appreciation of this technique mean for automated developer testing at Google? It meant we could replace datacenters with mock objects, remote procedure calls with test-specific scripts, coal turbines with hamster wheels. It meant that tests could run faster, with consistent (i.e. not flaky) results, at the appropriate level of Small/Medium/Large test focus. It meant that someone who previously would claim a piece of code’s interaction with an external service was “too hard to test” would test the hell out of it, and run the test every time he/she saved the file containing the code. It meant that an immense number of tests affected by a code change would be guaranteed to be executed by the Test Automation Platform, and those tests would provide meaningful results usually within seconds for every project in the company that depended on the change in some way.

Anti-Static

So-called “static” objects, methods, and member functions are the antithesis of abstract interfaces and dependency injection. They are baked-in production dependencies that provide no seam at which to separate themselves from the code under test and inject an alternative dependency with no physical dependency on the production resource at all. Static objects and methods became possibly the main source of friction with regard to introducing automated testing to an existing code base, at least in the Java world, to hear my fellow Test Mercenaries tell the tale.

The reasoning behind many static objects is that a large, expensive resource, such as a connection to a database or a Bigtable, can be effectively shared with every part of a program that needs it, improving runtime and memory efficiency. In the days before awareness of dependency injection, this kind of global resource optimization seemed like a Good Thing—and, in ways, it is a Good Thing from the point of view of a running program—but it had the unforeseen effect of making it difficult to test a piece of code that depends on such resources in isolation.

The problem isn’t just that there’s no abstract base class interface providing an abstraction buffer between the resource and the consumers of the resource; it was often the case that even if code that consumed the resource introduced a small, abstract interface that allowed for a test to plug in a lightweight replacement, that code might still physically depend on the production resource dependency, making the test program very large and expensive to build and execute. This was often a result of using Google-standard frameworks that provided the foundation for many applications, and which had batteries of expensive resources baked-in, making it tedious and difficult to write significant components without extensive dependency baggage. Even if the frameworks provided interfaces for swapping production resources with test implementations, tests were still burdened with excessive dependencies that were completely irrelevant to the feature under test.

The punchline is, you can have a single program-wide instance of a resource and use abstract interfaces/dependency injection at the same time, given the right program structure and, in some cases, tools. Teasing apart static dependencies can be an excruciatingly long, difficult, and painful process, but the reward is stable tests that run far more quickly, i.e. provide much greater value. That means more tests are written and executed more often, leading to a faster rate of development with greater confidence that new changes don’t break old requirements—candy everybody wants.
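As a rough sketch of that punchline in Go: the program below creates exactly one instance of a (pretend) expensive resource and hands it to every consumer through an interface, instead of letting consumers reach for a global. All of the names (Store, prodStore, mapStore, Lookup) are hypothetical, and prodStore is just an in-memory stand-in for something like a database connection.

```go
package main

import "fmt"

// Store is the abstract interface that consumers see.
type Store interface {
	Get(key string) (string, bool)
}

// prodStore stands in for a large, expensive shared resource (hypothetical).
type prodStore struct{ data map[string]string }

func (s *prodStore) Get(key string) (string, bool) {
	v, ok := s.data[key]
	return v, ok
}

// mapStore is a lightweight replacement suitable for small tests.
type mapStore map[string]string

func (m mapStore) Get(key string) (string, bool) {
	v, ok := m[key]
	return v, ok
}

// Lookup receives its Store from the caller instead of reaching for a global.
type Lookup struct{ store Store }

// Describe is the behavior a test would want to exercise in isolation.
func (l Lookup) Describe(key string) string {
	if v, ok := l.store.Get(key); ok {
		return key + "=" + v
	}
	return key + " not found"
}

func main() {
	// One production instance, created once at the top of the program
	// and shared by every consumer that needs it.
	shared := &prodStore{data: map[string]string{"region": "us-east"}}
	a := Lookup{store: shared}
	b := Lookup{store: shared}
	fmt.Println(a.Describe("region"))
	fmt.Println(b.Describe("zone"))
}
```

The sharing still happens, but the decision about what gets shared lives in main rather than inside each consumer, so a test can build a Lookup around a mapStore without touching the production resource.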

In the C++ land I came to know, static classes were often substitutes for C++ namespaces, and oftentimes dumping grounds for “utility” functions that performed completely unrelated tasks, bringing with them otherwise unrelated dependencies that were then pulled into any code that needed any one of the umpteen “utility” functions so coalesced. This made compiling and linking slower and test binaries bigger, which in turn impacted the time it took a C++ test to start up and reach main(), as well as to initialize some otherwise unnecessary production resources. The fortunate part was that, in these cases, the utility classes could be broken up, with some functions now belonging to proper classes that could be substituted in tests via dependency injection if so desired.3

In my tools post, I’ll talk a little bit about the C++ gMock (aka Google Mock) and Java Guice tools that made writing tests based on dependency injection much easier, which helped Google developers ease back on their use of static objects and the dreaded Singleton design pattern used so often to instantiate such static resources.

Think Small

Functions and classes don’t usually start life as very large entities. Usually functions and classes accrete girth over time, as just “small changes” are added to this here and this there, until the code threatens to collapse under its own conceptual weight. What developers often don’t realize is that a lot of these “small changes” they inject hither and thither are actually features that are largely independent of the code that surrounds them, and can be encapsulated in their own functions and classes.

It’s easier to reason about a class that does one thing than a class that does multiple things, which means it’s far easier to test, and to test thoroughly. Consequently, it means less testing for code that uses smaller classes, since such tests can focus on the behavior of the new code rather than having to cover every corner case of the class or classes that it uses. In other words, it’s easier to reason about a class that uses three other classes that have clear interfaces and well-tested behaviors, than it is to reason about a class that tries to do everything by itself.

Ideally, logically distinct operations will belong to separate, well-named functions, and clusters of logically distinct responsibilities will belong to separate, well-named classes. Smaller, separate classes and functions mean fewer dependencies on other code and other resources, which means that, as mentioned just above, the code is easier to reason about, since extraneous detail is no longer present. On a physical level, fewer dependencies means faster compile, link, and execution times, which means that it’s easier and more useful to run such tests frequently. Faster tests are generally more valuable tests, since they will report potential problems with greater speed, helping developers spot a problem as quickly as possible and remain in a state of flow. That, in turn, maximizes productivity, since bugs are detected and squashed before they grow into adults and start laying eggs and spreading filth and disease everywhere.

Some developers are capable of adapting to this realization right away, and start writing more small functions and classes, which are then used by existing code without cluttering it up with (additional) unnecessary detail. But you may be surprised at the number of developers that, at the time, equated “more small functions and classes and corresponding source files” with “more complexity” on principle. Eventually they get it, when they realize that more classes/functions/files is not a symptom of adding complexity, but of better managing the complexity you need to add anyway; of relying on the structure of interfaces and source files to better separate concerns rather than relying on memory to keep them all straight.

Of course, there is a logical extreme to keeping classes and functions small, and from time to time my colleagues and I have certainly been guilty of trying to make functions and classes too fine-grained. But thanks to code reviews, we were usually pulled back from the cliff by the frank and thoughtful feedback of our peers. However, impractically fine-grained functions and classes were not Google’s problem by and large, and those occasions where we drifted dangerously close to the extreme were few and far between.

Extraction

Following on the heels of writing smaller classes and functions to begin with is the idea of extracting smaller functions and classes from existing code. This falls within the general concept of “refactoring”, or changing the structure of code without changing its behavior or function, for the purpose of improving readability/maintainability, improving testability, and/or making it easier to add new features. With a good set of tests already in place, this can be done relatively quickly and easily, without fear; but often it’s necessary to extract smaller functions and classes to get critical behaviors under test in the first place.

There are methods for putting “good enough” tests in place around existing, untested code in order to extract smaller parts with some degree of confidence, such as generating “golden files”, or logs of output from critical points in a program’s logic that can be compared between test runs. Golden files are not a long-term solution to most testing issues, as they tend to be somewhat fragile and require some detective work to determine the root cause of any differences; proper tests using focused assertions are much more helpful. But when breaking a big, old, scary piece of code apart to get each piece under test, golden files can be better than nothing to help ensure that critical behavior is preserved.
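A golden-file check can be sketched in a few lines of Go. This is only an illustration under loud assumptions: legacyReport is a hypothetical stand-in for the big, scary code, firstDiff is a made-up helper, and the golden output is inlined as a constant rather than read from a checked-in file so that the sketch stays self-contained.

```go
package main

import (
	"fmt"
	"strings"
)

// legacyReport stands in for a large, untested routine (hypothetical).
func legacyReport(orders []int) string {
	var b strings.Builder
	total := 0
	for i, o := range orders {
		total += o
		fmt.Fprintf(&b, "order %d: %d\n", i+1, o)
	}
	fmt.Fprintf(&b, "total: %d\n", total)
	return b.String()
}

// firstDiff returns the 1-based number of the first line at which got and
// golden differ, or 0 if they are identical.
func firstDiff(got, golden string) int {
	g := strings.Split(got, "\n")
	w := strings.Split(golden, "\n")
	for i := 0; i < len(g) || i < len(w); i++ {
		if i >= len(g) || i >= len(w) || g[i] != w[i] {
			return i + 1
		}
	}
	return 0
}

// golden was captured from a known-good run. In practice it would live in a
// file next to the test; it is inlined here only to keep the sketch runnable.
const golden = "order 1: 5\norder 2: 7\ntotal: 12\n"

func main() {
	if n := firstDiff(legacyReport([]int{5, 7}), golden); n != 0 {
		fmt.Printf("output differs from golden at line %d\n", n)
		return
	}
	fmt.Println("output matches golden")
}
```

Note what the check does and does not tell you: a mismatch reports where the output changed, not why, which is exactly the detective work that focused assertions avoid.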

Michael Feathers’s Working Effectively with Legacy Code has much more to say on the subject of getting existing, untested code under test, extracting pieces into more testable components. For Googlers, look up Nick Lesiecki’s Codelab on applying concepts from the Feathers book to parts of the Google Web Server. For refactoring strategies in general, Martin Fowler’s Refactoring is the canonical text.

Again, one can extract too much—in theory. Again, in practice, that was not Google’s problem, and code reviews helped mitigate such threats to code quality.

It’s all about fine-tuning the structure and details of the code to the level of abstraction required at each level of the program or system, and structure should trump detail to the extent possible. One can get too abstract and structured, but certainly most Google code at the time erred on the side of clumping too much unrelated detail and too many unrelated responsibilities together. Getting Google developers to accept the idea of more, separate classes not only meant that these new, smaller classes were relatively easy to test in a standalone context with fewer dependencies, but it dovetailed with the concept of dependency injection, and also dovetailed with the concept of…

Composition vs. (Implementation) Inheritance

Composition vs. (implementation) inheritance is a very old debate between two alternative object-oriented programming strategies for reusing existing code. Rather than reusing the code from existing classes by creating subclasses from them via inheritance, one can also just define an instance of an existing class as a private member of a new class—i.e. “compose” the new class in terms of existing classes. The new class will not inherit the interface of the existing class, but that’s not always necessary or desirable anyway. In the cases where it is, the new class and the existing class can both inherit from a common base class—preferably a pure, abstract base class in C++ or an interface type in Java—defining the interface only, without implementation—and write “forwarding functions/methods” as part of the new class that delegate to the methods of the existing class.4
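Here is one way the composition-plus-forwarding arrangement might look, sketched in Go with explicit forwarding written out by hand. The names (Logger, BufferLogger, PrefixLogger, record) are invented for this example; Logger plays the role of the interface-only common base.

```go
package main

import "fmt"

// Logger is the shared interface: a contract only, with no implementation.
type Logger interface {
	Log(msg string)
}

// BufferLogger is the existing class whose implementation we want to reuse.
type BufferLogger struct{ Lines []string }

func (b *BufferLogger) Log(msg string) { b.Lines = append(b.Lines, msg) }

// PrefixLogger is composed of a private BufferLogger rather than derived
// from it, so the dependency is visible right here in the struct.
type PrefixLogger struct {
	prefix string
	inner  *BufferLogger
}

// Log is a hand-written forwarding method: it adds a prefix, then delegates
// to the contained BufferLogger.
func (p *PrefixLogger) Log(msg string) { p.inner.Log(p.prefix + msg) }

// record works with anything that satisfies Logger.
func record(l Logger) { l.Log("started") }

func main() {
	plain := &BufferLogger{}
	prefixed := &PrefixLogger{prefix: "[app] ", inner: &BufferLogger{}}
	record(plain)
	record(prefixed)
	fmt.Println(plain.Lines, prefixed.inner.Lines)
}
```

The cost is the extra forwarding method; the payoff is that PrefixLogger exposes only what it chooses to, and a reader can see its one dependency at a glance.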

Implementation inheritance can be powerful and convenient, but when subclasses of subclasses begin to form and child classes override some of the base class’s functions but not all, then ambiguity arises in the programmer’s mind about what exactly is going on when reading, writing, or using a class since it’s not terribly obvious from looking at the code. Tools can help, but only so much. When more and more classes begin to inherit from a base class and its subclasses rather than just creating their own private instances of those classes to access the specific features they need, you get a combinatorial explosion in the size of the inheritance tree and all sorts of unnecessary dependencies are dragged throughout the code base—which has not only an intellectual cost, in terms of keeping track of what’s necessary for a given piece of code and what’s not, but a physical cost in terms of the time it takes the build system to compile and link the code. And, as us USians are fond of saying, Time is Money.

As mentioned before, changes to a base class are automatically inherited by all derived classes. With an out-of-control inheritance hierarchy (i.e. graph of base and derived classes), making changes to base classes can be both expensive in terms of added dependencies and time it takes to build and test the code, and scary in terms of reasoning about the impact such changes have on derived classes. This is because as an inheritance hierarchy grows deeper, the less likely derived classes are to be tested, or tested well, due to the cost of accumulated dependencies that aren’t visible within the code itself.

This fear isn’t just related to adding new code to a base class, but more especially, to changing or removing existing code. Using composition, it’s easier to isolate and thoroughly test changes to an existing class without as much concern for breaking code that uses the class, and it’s also much easier to track down and run the tests for the code that depends on the existing class, since the dependency is explicit. This makes it much easier to fight code rot, or the tendency to leave old code after it’s no longer needed, or to work around it for fear of breaking unknown contracts that might possibly depend on the existing behavior.

Then there’s the issue of multiple inheritance, whereby a child class inherits from multiple parent classes, introducing the possibility of silent ambiguity if those parent classes have similarly-named-but-otherwise-unrelated methods, or if different parent classes share a base class and have overridden the base class methods differently (the diamond problem), or if such methods are added to the parent classes after the birth of the child class. Some languages enforce some degree of ambiguity resolution from the author of such a child class, or define regular rules for resolving the ambiguity automatically, but even then it’s ugly at best and tricky at worst.

Granted, most programmers and some languages and style guides (like Google’s) avoid multiple implementation inheritance like the plague exactly because of these problems. If all parent classes are pure abstract/interface classes, however, there isn’t an issue, since no implementation/behavior is actually inherited. This can actually be a very useful thing, though it’s more common to use adapter classes to achieve the same effect.

Swinging back to composition, it’s true that it can prove a bit of a pain in that it sometimes requires more typing than implementation reuse via inheritance. But, all dependencies on the reused code are made explicit, and only the code that depends specifically on the reused code bears the burden of that code’s dependencies, i.e. there’s no combinatorial explosion of an inheritance hierarchy. By making dependencies explicit and as narrow as possible, reuse via composition saves Time and Money in terms of both programmer productivity—from the point of view of grasping complexity, if not typing—and in terms of compile and link time.

And if a cluster of classes keep getting used together in approximately the same way in otherwise unrelated classes, don’t push that reuse up the inheritance hierarchy—compose a new class containing them all that can be reused and tested in isolation itself! You can have it both ways with composition: breaking down big pieces into smaller ones, and using smaller pieces to build up bigger ones.

Judgment

In short: Pure abstract interface inheritance, good; any other kind of inheritance, bad. Yes, that’s an absolute statement, and such statements are rarely justifiable, but I’m sticking by it. In addition to all the familiar reasons cited above, I have two novel arguments. The first: I think the Go programming language has demonstrated that there’s no reasonable use for inheritance in modern object-oriented programming languages. I’ll explain why in the next section.

The second novel argument: Google C++ code in certain products circa 2007. If you want to see how toxic implementation inheritance can get, in terms of heavyweight dependencies which children inherit from parents and drag around with them everywhere they go, or complex side-effects between child-class and parent-class member function invocations, or a combinatorial explosion of the inheritance hierarchy when separate pieces of functionality are needed in some parts of the inheritance hierarchy, but not others, it’s all there on display. And this is from some of the brightest, most productive developers on the planet. That would seem to be an argument in favor of implementation inheritance, but really, it’s not; the Test Mercenaries were called in to help untangle these kinds of messes for teams that felt like they needed a hand turning the code back around, as teams eventually realized that the conceptual burden that excessive implementation inheritance placed on the team was slowing development and producing the fear of making new changes.

Breaking apart and abolishing implementation inheritance doesn’t solve all software ills, but it moves automated developer testing from the realm of the painful and arguably wasteful into the realm of the feasible and potentially helpful. Implementation inheritance introduces complexity, because now you’ve got to worry about the state and behavior of more than one class every time you write/invoke a member function, and dependencies between classes build up while remaining largely invisible from the point of view of any specific location in the code, requiring detective work to understand the code every time one wishes to make a change, and making tests slow and expensive to run. Pure abstract interface inheritance reduces complexity because the parent class only specifies a physical and conceptual contract that it’s entirely up to child classes to fulfill; plus, it’s central to testable designs via dependency injection.

Stop, now Go!

Back to Go: It implements abstract interfaces via “interface types” that are defined as method sets (i.e. sets of member function signatures) that conforming types must implement. These method sets are checked at compile-time against the public interfaces of conforming types, producing interface values that code depending on the interface can then use—as I understand them, just this pointer + virtual table pairs, as in other object-oriented languages, except that the compiler can infer these values in a clever, consistent way without an explicit inheritance relationship.

There are no parent/child relationships at all. To reuse a class’s implementation as part of a new class, one must use composition—known as embedding in Go parlance. (Interface types can be embedded, too.) The methods of embedded types automatically become methods of the enclosing type, while calls to those enclosing type methods are automatically forwarded to the embedded types, obviating the need for explicit forwarding functions as in C++ or other object-oriented languages. For more info, see the Go interface types spec and the Effective Go guide.
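Embedding can be sketched as follows, assuming two hypothetical types, Logger and Service, made up for this example. Service reuses Logger's implementation with no forwarding function written by hand.

```go
package main

import "fmt"

// Logger is a hypothetical reusable type that records messages.
type Logger struct {
	lines []string
}

func (l *Logger) Log(msg string) {
	l.lines = append(l.lines, msg)
}

// Service embeds Logger rather than inheriting from it. Log and the
// lines field are promoted, so they can be used directly on a Service.
type Service struct {
	Logger
	name string
}

func (s *Service) Start() {
	// Shorthand for s.Logger.Log(...); the call is forwarded to the
	// embedded value.
	s.Log("starting " + s.name)
}

func main() {
	s := &Service{name: "indexer"}
	s.Start()
	s.Log("ready")
	fmt.Println(s.lines)
}
```

Because the embedded value is an ordinary field, the relationship stays visible in the struct definition, and the embedded type knows nothing about the type that encloses it.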

Who knew? All of the benefits of both interface and implementation inheritance—without inheritance—and with less code to read and write to boot. Plus, note that this implementation of interfaces also kills the need for templates in C++ or generics in Java, and essentially enables duck-typing a la Python and other dynamically-typed languages, but with the security and relative efficiency of static type-safety/compile-time checking. Brilliant!

The point about Go is that it is a relatively new language developed by some extremely experienced developers (these guys aren’t hurting for recognition and I won’t name drop, but anybody who cares can find out who I’m talking about very easily), and it cuts inheritance out of the equation completely. Minimizing the use of implementation inheritance to the extent possible in other languages is a desirable design goal, not just from the point of view of testability, but especially from that point of view.

The new fundamental concepts of object-oriented programming, with inheritance struck from the list: interfaces; composition/embedding; encapsulation; and polymorphism.

Side Effects

One last object-oriented programming issue we often encountered was the reliance on “side effects”, or changes to the internal data of a class brought about by calling one of its methods, with no public interface for accessing that data. Controlling and verifying side effects in a test can be tricky; it’s much easier to test a class that returns some kind of value as a result of a function, or provides other functions to observe aspects of its state. Fortunately, it’s generally pretty straightforward to extract code responsible for side effect changes into its own class and test it directly, without making any changes to the public interface of the class originally containing the behavior. Still, it was a significant point of friction that we had to overcome.
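The extraction described above might look like the following sketch. Handler and hitCounter are hypothetical names chosen for illustration. The counting that used to happen invisibly inside Handle now lives in a small type that can be tested directly, while Handle's signature stays the same.

```go
package main

import "fmt"

// hitCounter is a hypothetical type holding the state that was previously
// buried inside Handler and changed only as a side effect of Handle.
type hitCounter struct {
	counts map[string]int
}

func newHitCounter() *hitCounter {
	return &hitCounter{counts: map[string]int{}}
}

func (c *hitCounter) Record(path string) { c.counts[path]++ }

// Count makes the recorded state observable, which is what a test needs.
func (c *hitCounter) Count(path string) int { return c.counts[path] }

// Handler keeps its original public interface: Handle still takes a path
// and returns a response, and callers never see the counter.
type Handler struct {
	hits *hitCounter
}

func NewHandler() *Handler {
	return &Handler{hits: newHitCounter()}
}

func (h *Handler) Handle(path string) string {
	h.hits.Record(path)
	return "ok: " + path
}

func main() {
	// The extracted type can be exercised on its own.
	c := newHitCounter()
	c.Record("/a")
	c.Record("/a")
	fmt.Println(c.Count("/a"))

	fmt.Println(NewHandler().Handle("/a"))
}
```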

Design for Testability Guide

The Testing Grouplet tried to produce a canonical document with its sage design-for-testability advice and called it (surprise!) the Design for Testability Guide, or D4TG for short.5 It never really took hold as an authoritative artifact, though the principles it outlined were indeed core to the message we transmitted more effectively via Testing on the Toilet and Test Certified. The bulk of its advice rested on the principles I describe in this post (abstract interfaces, dependency injection, avoiding statics, preferring smaller functions and classes, extracting classes with separate responsibilities, preferring composition over implementation inheritance, and avoiding side effects), plus a few extras, maybe, I think.

Is that all?

As jwz said, Software is hard. The issues aren’t that simple. What I’ve written may seem a straightforward prescription, but of course every application is different in terms of architecture and requirements, every team is different in terms of goals and dynamics, every individual programmer is different in terms of skills, knowledge, and priorities. But, cutting through all those layers and getting down to the code itself, lots of these specific problems and challenges posed by individual programs and systems with respect to testability tend to exhibit a few very common traits. It takes time and effort to untangle difficult code for various technical and cultural reasons, but the core issues and remedies still apply. Our challenge was getting all these diverse stakeholders to focus on the common issues, and the common techniques for solving them, and to get them all using the best tools to get the job done—not necessarily overnight, but effectively, with a tangible, lasting (if not necessarily easily measurable) benefit.

Also, note that I haven’t mentioned sprinkling design patterns like fairy dust all over the code base. Truth is, we did make use of design patterns heartily, but focusing on them as the essence of the solutions we prescribed would be misleading. Revisiting basic object-oriented programming principles and emphasizing the primacy of abstract interfaces and dependency injection was what enabled us to apply the appropriate design patterns in the appropriate cases, not the other way around.

While fundamental object-oriented principles were the key to solving the majority of Google’s testability issues, certainly there were other very challenging testing problems which I didn’t cover here, and which the Testing Grouplet didn’t explicitly address to a satisfying degree. Testing the correctness of multithreaded programs was considered, but never tackled in a direct, systematic fashion. JavaScript testing never received much focus when I was involved, since it was a different beast from the server-side programs most of us in the Grouplet had experience with, though some Test Mercenaries had some JavaScript chops, and JS-dependent projects started devising their own solutions and sharing them throughout the JS community. Load testing and validating input/configuration data are two common testing needs for many products, but neither the Testing Grouplet nor the Test Mercenaries made much headway in providing general solutions. These turned out to be perfect high-level testing issues for Software Engineers in Test who are permanently embedded in a team and get to know the product inside and out.

In the end, however, designing for testability tends to lead to improved software design and reliability overall; the principles that make code easier to test are, by their nature, the same ones that make it easier to understand. Eventually developers realized that, when they had to exercise their own class interfaces by writing their own tests, they began thinking about design differently, which led to improved designs from the individual class level all the way up to the whole program structure. The fear of making changes to existing code to support new features or fix bugs all but disappeared, and the time it took to make such changes decreased dramatically, not just because of the presence of tests, but because the code itself was easier to work with.6 They eventually understood that automated testing wasn’t about writing automated tests and bumping up code coverage numbers, but about improving the software process and design as a whole.

In the next post, I’ll talk about the design of the automated tests themselves and the other development practices that supported Google’s high level of code quality. Finally I’ll cover the specific tools that were introduced to make testing faster, easier, and more valuable. These tools and techniques helped to improve build and test execution speed, test writing and readability, and test utility and feedback. They also helped to test tricky code that depended on remote procedure calls or other external resources: the lower-level classes that actually had to interact with these things, as opposed to code using dependency injection to isolate itself from such dependencies.

Footnotes

  1. As 20% time (which someone thought was a good idea, in fine Corporate tradition, to rebrand as “Innovation Time Off”) implies one day a week of development time taken away from a project, or one month out of five, or however you want to do the math, many managers did not encourage their developers to take advantage of it.

  2. As opposed to “protected” (accessible to subclasses) or “private” (accessible only to the class itself) data or interfaces.

  [^oop-3]: More formally, this is defined as the Liskov substitution principle.

  3. I got a tweak into the Google C++ Style Guide to try to encourage namespaces over classes containing only static member functions, but the language is admittedly obtuse, and I’m not sure it serves my original purpose of getting developers to think of namespaces differently than static classes so they’d perhaps avoid lumping so much unrelated behavior and dependencies together. Russ Rufer and Tracy Bialik were miffed at me for pushing this change and the technique it advocates, because in their view, there should always be a class in place to provide the possibility of dependency injection if desired one day. At this point in life, I’m more inclined to do things their way, but at the time I saw my way as a more “pragmatic” and acceptable solution to the C++ world. Lasting change is slow in the making. 

  4. Extra work, but there are benefits, such as making it easier to “wrap” the existing class function with added functionality, replace it entirely in the future, or combine multiple existing class implementations without introducing inheritance ambiguities. 

  5. Current Googlers, try hitting go/d4tg to see if it’s still there. 

  6. Of course, we couldn’t measure these effects directly, but the experience of fear diminished, time saved, and ease of maintenance was very real for those who adopted good automated testing discipline.