Heartbleed and Heartbreak

The Heartbleed bug has me seeing red just as much as Apple's SSL bug did, because it's another serious bug that could've been caught by a test.

09 Apr 2014 - Boston
Tags: Apple, Heartbleed, goto fail, programming

UPDATE: I’ve taken my best stab at writing a complete unit/regression test for the Heartbleed bug and I’m fairly happy with the results. You can grab it here: heartbleed_test.c

Heartbleed is the name given to the latest SSL vulnerability affecting massive numbers of Internet servers. As this Reuters article shows, the flaw is serious enough that mainstream press outlets are running the story as a headline, just as they did with the Apple SSL bug.

Here’s the change that introduced the Heartbleed bug into OpenSSL: a moderately large change with not a test in sight. Note that it was apparently code reviewed. That’s a good thing, but it goes to show that code review doesn’t catch all bugs, especially if the reviewer doesn’t ask for tests!

I’m also very disheartened to see that the fix for the Heartbleed bug lacks tests as well.
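To make concrete what a test for this class of bug might look like, here's a minimal sketch. It is not OpenSSL's code, and it is not the heartbleed_test.c mentioned above: `hb_build_response` and every other `hb_`/`HB_` name is hypothetical, and the message layout is a simplified stand-in for the heartbeat format (one type byte, a two-byte length, the payload, then 16 bytes of padding). The idea is only that a request claiming more payload than it actually carries must be discarded, and that one small test can pin that behavior down.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified heartbeat layout (an assumption for this sketch):
 * 1-byte type, 2-byte big-endian payload length, payload, 16 bytes padding. */
#define HB_REQUEST     1
#define HB_RESPONSE    2
#define HB_HEADER_LEN  ((size_t)3)
#define HB_PADDING_LEN ((size_t)16)

/* Hypothetical handler, not OpenSSL's API. Writes a response into out and
 * returns its length, or 0 if the request is silently discarded. */
size_t hb_build_response(const unsigned char *rec, size_t rec_len,
                         unsigned char *out, size_t out_cap)
{
    size_t payload_len, resp_len;

    if (rec_len < HB_HEADER_LEN + HB_PADDING_LEN || rec[0] != HB_REQUEST)
        return 0;
    payload_len = ((size_t)rec[1] << 8) | rec[2];

    /* The check this sketch is about: the claimed payload length must fit
     * inside the bytes that were actually received. */
    if (HB_HEADER_LEN + payload_len + HB_PADDING_LEN > rec_len)
        return 0;

    resp_len = HB_HEADER_LEN + payload_len + HB_PADDING_LEN;
    if (resp_len > out_cap)
        return 0;

    out[0] = HB_RESPONSE;
    out[1] = rec[1];
    out[2] = rec[2];
    memcpy(out + HB_HEADER_LEN, rec + HB_HEADER_LEN, payload_len);
    memset(out + HB_HEADER_LEN + payload_len, 0, HB_PADDING_LEN);
    return resp_len;
}

/* Regression test: the request claims 0x4000 payload bytes but carries one.
 * The output buffer is big enough that only the length check can reject it. */
int hb_test_oversized_length_is_discarded(void)
{
    unsigned char rec[3 + 1 + 16] = { HB_REQUEST, 0x40, 0x00, 'A' };
    unsigned char out[0x4000 + 32];

    return hb_build_response(rec, sizeof rec, out, sizeof out) == 0;
}

/* Sanity test: an honest four-byte payload is echoed back unchanged. */
int hb_test_honest_request_is_echoed(void)
{
    unsigned char rec[3 + 4 + 16] = { HB_REQUEST, 0x00, 0x04, 'p', 'i', 'n', 'g' };
    unsigned char out[64];
    size_t n = hb_build_response(rec, sizeof rec, out, sizeof out);

    return n == sizeof rec && out[0] == HB_RESPONSE
        && memcmp(out + HB_HEADER_LEN, "ping", 4) == 0;
}

/* Runs both checks; call this from a test runner's main(). */
void hb_run_tests(void)
{
    assert(hb_test_oversized_length_is_discarded());
    assert(hb_test_honest_request_is_echoed());
}
```

Delete the length check and the first test fails, because the handler then reports a 16K response built from a 20-byte record. A test like that, living next to the code, is exactly what I'd want a reviewer to ask for.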

I haven’t the time to really dig into this one personally—I’m totally spent after all of my Apple SSL bug work—but the following conclusion from Matthew Green’s otherwise excellent analysis of the Heartbleed bug left me seeing red:

Should we rail against the OpenSSL developers for this?

Don’t even think about it. The OpenSSL team, which is surprisingly small, has been given the task of maintaining the world’s most popular TLS library. It’s a hard job with essentially no pay. It involves taking other folks’ code (as in the case of Heartbeat) and doing a best-possible job of reviewing it. Then you hope others will notice it and disclose it responsibly before disasters happen.

The OpenSSL developers have a pretty amazing record considering the amount of use this library gets and the quantity of legacy cruft and the number of platforms (over eighty!) they have to support. Maybe in the midst of patching their servers, some of the big companies that use OpenSSL will think of tossing them some real no-strings-attached funding so they can keep doing their job.

Except for the suggestion that big companies should help fund OpenSSL development, this is an exceptionally unhelpful conclusion, and it triggers the same blood-boiling response in me that the reaction to the Apple SSL bug did. It implies that holding the OpenSSL developers responsible for failing to unit test their code is tantamount to “railing against” them, and that the generosity of their contribution and relatively untarnished track record should somehow absolve them of any genuine accountability for what appears to be a preventable flaw. Because hey, stuff happens.

Good intentions mean nothing if your stuff is broken due to a failure to apply best practices which could’ve prevented a catastrophic security flaw. I’m not advocating “railing against” the OpenSSL developers or harshly criticizing them in any way, but let’s learn from this and fix the problem, not sweep it under the rug because we feel bad for the team that carries the virtual weight of the world upon its shoulders.

Applying unit tests to an existing code base is a solved problem! The only thing standing in the way of fixing preventable bugs like this before they ship is the cultural bias against unit testing, which is either not taken seriously or dismissed outright. This is why I think the Apple SSL bug provides such a prime opportunity to demonstrate the value of testing and tilt the culture of the entire tech industry towards expecting it everywhere, all the time. The Heartbleed bug may provide another such opportunity, if someone (not me!) would step up to seize it!

As a counterexample to Green’s post, I’ve been enjoying Sean Cassidy’s recent posts on the Heartbleed bug and the earlier GnuTLS bug. These are links to the conclusions of three different posts by Cassidy:

Diagnosis of the OpenSSL Heartbleed Bug
The Story of the GnuTLS Bug
A Difficult Bug, which had to do with a project of his own, not Heartbleed or GnuTLS or even the Apple bug

In each post, after a very lucid explanation of the bug and its fix, he remarks that proper unit testing (amongst other practices) would have prevented it, and calls for more unit testing to be done in general. And in his Wrong Solutions, writing about a topic very near and dear to my bleeding heart, the Apple SSL bug:

Good testing will catch many and prevent more bugs. This is why it’s critical that certain projects, such as the new Python cryptography library, have 100% code coverage. This type of glaring error could not happen there.

Untested cryptography code is broken cryptography code.

The only improvement to Sean’s posts I would recommend would be for him to post working patches providing tests for these flaws. That may be a lot to ask of a blogger unconnected to the projects in question, but not too much at all to ask of the project developers themselves.