The Official Apple SSL Bug Testing on the Toilet Episode

The Google Testing on the Toilet team has published my episode about the Apple SSL bug, and I explain why this is for the greater social good.

15 Apr 2014 - Boston
Tags: Apple, Google, Heartbleed, Testing Grouplet, TotT, goto fail, grouplets

Pinch me! It’s like a dream come true! Thanks to the generous opportunity afforded me by the Google Testing on the Toilet team, specifically Andrew Trenk and Jim McMaster, I’ve contributed an actual, official TotT episode—derived from my TotT-inspired treatment of the Apple SSL bug—that’s being published in Google restrooms everywhere this very week! It’s officially known as Episode 327¹: “Finding More Than One Worm in the Apple”. (Yes, the same name as the Apple SSL bug article I submitted to the Communications of the ACM.)

I’d also like to say thanks to Chris Conway for apparently being the first to actively suggest the idea of turning my earlier treatment into a proper TotT episode, as well as to all of the folks who reviewed my Apple SSL bug slide deck and article. This truly has been, and continues to be, an inspired team effort, reminiscent of ye good olde days with the Testing Grouplet, et al.

WARNING: This announcement isn’t entirely a love-fest. I have a very small favor to ask of everyone, which I hope adds up to a very large favor in aggregate network effects—and things get heavy, seriously heavy towards the end of this post.

TotT Firsts
Seize the Moment
The Greater Good
Footnotes

TotT Firsts

This TotT episode breaks new ground in a number of ways. I’m the first “outsider” to contribute an official Testing on the Toilet episode—specifically, I’m the first ex-Googler to do so. This is the first time a TotT episode has been explicitly derived from an earlier work, with an author attribution and license notice embedded in it. This is the first TotT to use QR codes next to the ads at the bottom.²

Finally, this is the first time a TotT episode has explicitly addressed a real-world software defect, using it to show specifically:

how testing could’ve detected the bug and likely prevented it from even being written;
how the bug provides evidence of higher-level code quality and cultural issues; and
how code quality strongly depends on engineering culture.³

Yes, TotT has from the beginning aimed to address specific technical issues and influence engineering culture, but this is its first retrospective on a concrete, user-visible event based on a software bug.

On a personal note, despite having written a half-dozen or so TotT episodes back in the day, including the Test Certified/Test Mercenaries TotT episode (original image source), this is the first episode that actually carries my name on it.⁴ Maybe one day I’ll yet catch up to my friend (and rival) Antoine Picard for the record of most episodes written! (Unless someone has already out-written us both, of course.)

Seize the Moment

In other news, I’m still waiting to hear back from Communications of the ACM. Searching for “cacm publication decision time” turned up a 2010 editorial from the CACM entitled Revisiting the Publication Culture in Computing Research in which Editor-in-Chief Moshe Vardi notes that, at least at that time, “The average time to editorial decision for Communications is under two months”.

That said, I need your help raising awareness of these issues. If you know someone at Communications of the ACM, or know someone who knows someone there, I’d deeply appreciate an extra good word put in to increase the chances of my article being published. I’d also appreciate reshares of my blog posts, as well as submissions of them to any other appropriate outlet. Suggestions for where I might submit them myself are also welcome. If you’re feeling especially generous, post a link to one of my articles or code samples as your status message. You could even post one of the original printer-friendly Finding the Worm or While My Heart Gently Bleeds articles in your office (and not just in the restrooms).

Allow me to make clear the reasons behind my sense of urgency.

I’m hopeful that this TotT episode, combined with getting the full article published in CACM (or another publication, should CACM decline), will really drive discussion around the Apple SSL and Heartbleed bugs, spreading awareness and improving the quality of discourse a few notches—not just around these specific bugs, but around the topics of unit testing and code quality in general. These bugs are a perfect storm of factors that make them ideal for such a discussion:

the actual flaw is very obvious in the case of the Apple bug, and the Heartbleed flaw requires only a small amount of technical explanation;
the unit testing approaches that would’ve prevented them are very straightforward;
user awareness of the flaws and their severity is even broader than for other well-known software defects, generating coverage in the popular as well as the technical press; and
the existing explanations that either dismiss the ability of unit testing to find such bugs or otherwise excuse the flaw are demonstrably unsound.

If we don’t seize upon these opportunities to make a strong case for the importance and impact of automated testing, code quality, and engineering culture, and hold companies and colleagues accountable for avoidable flaws, how many more preventable, massively widespread vulnerabilities and failures will we see? What fate awaits us if we don’t take appropriate corrective measures in the wake of goto fail and Heartbleed? How long will the excuses last, and what will they ultimately buy us?

And what good is the oft-quoted bedrock principle of Open Source software, Linus’s Law—“Given enough eyeballs, all bugs are shallow.”—if people wear blinders and refuse to address the real issues that lead to easily-preventable, catastrophic defects?

The Greater Good

My insistence upon pursuing these issues has no basis in any sort of ill will towards Apple, OpenSSL, or either of their developers. By way of analogy: General Motors knowingly continued to install a flawed ignition switch in new cars from 2002 to 2006 and neglected to issue a recall, causing thirteen wrongful deaths due to sudden engine shutdowns. GM reportedly declined to change the switches in 2005 because doing so would have added about a dollar to the cost of each car, and then quietly updated them in 2006 without changing the part number, an apparently willful deception that may lead to criminal charges.

Are articles reporting this story mean-spirited attacks on GM, or a means of holding GM (and others) accountable by presenting evidence for the sake of transparency and the greater social good?

I have worked to produce artifacts of sound reasoning based on years of experience and hard evidence—working code in the form of the Apple patch-and-test tarball and heartbleed_test.c—to back up my rather straightforward claim: A unit testing culture most likely would’ve prevented the catastrophic goto fail and Heartbleed security vulnerabilities from ever existing.

True, testing can prove only the existence of bugs, not their absence, and you can never expect to find literally every bug with unit testing. But that is no excuse for not trying to catch all the ones you can. Here’s a question I’d like testing skeptics to answer: How can one have more confidence in untested code than in tested code, especially as complexity explodes combinatorially with the number of features, contributors, and users? And especially as the stakes rise, with more and more people depending on technology to protect their privacy, their critical personal and professional business, their physical well-being, their very physical safety?

Given the extent to which modern society has come to depend on software, the community of software practitioners must hold its members accountable, however informally, for failing to adhere to fundamental best practices designed to reduce the occurrence of preventable defects—and must step forward not to punish mistakes, but to help address root causes leading to such defects to the extent humanly possible. Society deserves solutions, not excuses.

Footnotes

¹ 3 x 3 x 3 == 27. I’ve always been fascinated by that number, and not just because of the 27 club.
² The first TotT to use a QR code at all, if memory serves, was one suggested by Nathan York to Mark Striebeck and myself during the TAP Fixit, whereby the content was one giant QR code linking to more Test Automation Platform information. I came up with the name, “Too Much for One Toilet”, and John Penix produced the episode, embedding the letters T-A-P within the code (which its error correction could tolerate). There was also another episode using QR codes as an example in demonstrating some testing concepts.
³ There was an episode in early 2009 which dealt with testing data in the wake of the every-search-result-is-malware bug that appeared one weekend in January 2009.
⁴ Well, the first time an episode originally ran with my name on it; apparently I’d forgotten that the Blast from TotT Past rerun of my early episode introducing Pyfakefs listed me as the author. Thanks to Andrew Trenk for reminding me about that!