A History of Password Storage
The first time people attempt to design a Web login system, they will usually default to simply storing their users’ passwords, as-is, in their database. This solution is simple, obvious, and wrong.
The problem is that database storage is not nearly as private as we would all like it to be. Even the databases that we’d hope are best defended from attack keep getting dumped, and in almost all of those the most valuable thing to an attacker is the list of emails and passwords in the “users” table. With that information they can obviously impersonate any user to the compromised website and do anything that user is able to, but it’s actually worse than that: in practice, users reuse passwords all the time. Steal someone’s poorly-defended Twitter for Pets password and there’s a distressingly good chance that same password will work at their bank.
As a rule Web developers are not stupid, and once this problem is pointed out they’ll generally try to fix it. Unfortunately for all of us, any solution will require cryptography, and like any problem requiring cryptography it turns out to be fiendishly complicated, difficult to grasp, and susceptible to getting torn down by the tiniest of mistakes. It doesn’t help at all that the now-common wisdom in cryptographic circles, that APIs should be difficult to misuse, is an embarrassingly recent development; we only have to look at GPG or the OpenSSL command line tools to confirm that!
It almost sounds like we could solve our problems by encrypting passwords, and sometimes a breach announcement will say that the stolen passwords were encrypted. We can only hope this is a case of information getting lost in translation between the developers and whoever actually wrote these announcements, because it turns out encryption is a dead end for this problem. Leaving aside the thorny question of how you keep the encryption key safe, an application that can be fooled into simply printing out a password can surely be asked to do so after it has decrypted it.
That is a real concern, believe it or not. What we’d like, then, is some way of handling passwords that allows us to verify that the password we’ve just now been handed is the same as the password we were handed in the past, without being able to reveal even to ourselves what that original password was. This sounds like message hashing; it sounds so much like message hashing that I can’t even be mad at the long line of Web developers who concluded a message hash was the right answer and punted their passwords through the only message hash they’d heard of, which was generally MD5. But this is cryptography, the land of tiny mistakes that will ruin your entire day.
The first mistake here is that we’re not protecting a single password. We’re protecting an entire database of passwords, and if an attacker can see that our CEO has the same password as Bob from Sales, that’s bad. They can phish the less-protected target and immediately pivot to the big juicy account. Even worse, if our buddies at Contoso got hacked last month, Contoso used the same hash function, and Contoso’s CEO happened to use the same password as ours (maybe it’s “golf”, whatever), the attacker might still have that mapping of hash to password sitting around!
The second mistake is that the set of likely passwords isn’t very large. They’re mostly short—in the bad old days, no more than eight characters—and mostly variations on dictionary words. It’s absolutely feasible to just try all the likely passwords. In previous years you might use rainbow tables—cleverly-compressed dictionaries of hash values to likely passwords—to speed up this search, but GPUs have become fast enough that serious attackers will just plug a bunch of them into each other like they’re trying to mine buttcoin.
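Both mistakes are easy to demonstrate in a few lines. The sketch below uses hypothetical users and a toy wordlist; a real attacker’s dictionary would contain millions of entries, but the principle is identical: against unsalted MD5, one hash computation per candidate cracks every user who chose that candidate.

```python
import hashlib

# A hypothetical leaked "users" table with unsalted MD5 hashes.
leaked = {
    "alice": hashlib.md5(b"potato").hexdigest(),
    "bob":   hashlib.md5(b"golf").hexdigest(),
    "carol": hashlib.md5(b"potato").hexdigest(),
}

# Mistake one: identical passwords produce identical hashes,
# visible to anyone holding the dump.
assert leaked["alice"] == leaked["carol"]

# Mistake two: a tiny dictionary attack. Hash each candidate once,
# then look every user up in the resulting table.
wordlist = [b"letmein", b"golf", b"potato", b"hunter2"]
table = {hashlib.md5(word).hexdigest(): word for word in wordlist}
cracked = {user: table[h] for user, h in leaked.items() if h in table}
```

After this runs, `cracked` names every user whose password appeared in the wordlist, which here is all of them.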
Before we move on to how the industry fixed these problems, I should say that I don’t mean to imply that they are obvious problems. In 2019 the industry has accumulated enough scar tissue around passwords that experienced professionals can immediately say “no, don’t do that” when someone comes up with md5(password)
as a solution, but Microsoft shipped essentially that as late as Windows NT/2000. If Microsoft, a multi-billion-dollar company that can hire as many professional cryptographers as they please, can ship NTHash in the year 2000, then nobody contemplating the problem for the first time should feel bad about falling into the exact same trap.
To solve the first problem, where the password potato
will hash to e561f9248d7563d15dd93457b02ebbb6
every time, we need to introduce something more to the process, something that can make that hash value wildly different even if we’re hashing the same value. Conveniently, high-quality cryptographic hash functions have the property that any change to their inputs, even a small one, will result in a wildly different output. We can generate some random data every time we store a new password, feed it to the hash function ahead of the password, and we’ll have a scheme where it’s as hard to tell whether two stored passwords are the same as it is to recover the original passwords in the first place.
For obscure reasons (read that as: I don’t know what they are), that chunk of random data is referred to as a “salt”. We do need the salt that we used to obtain a hash in order to verify a password against it, but there’s no need for elaborate or clever storage schemes: as long as the salt is unique, it doesn’t have to be secret. Storing it in the same database column as the result of the hash function is completely acceptable, and near-universal practice. Adding a short hash algorithm identifier so we can transparently upgrade users on their next login if we’ve moved to something more secure, we might end up with “crypt-style” hashes, so named because modern versions of the Unix crypt(3)
password-hashing function use them: "potato"
might end up looking like "$1$ytW4UcNn$FWkVBbl6SC6QHID0Igbuc1"
in this scheme.
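The salt-plus-hash scheme can be sketched with the standard library. The `$`-joined storage format and the function names here are my own illustration, not the real crypt(3) format, and SHA-256 on its own is still far too fast for production use, as the next section explains; this shows only how salting and verification fit together.

```python
import hashlib
import hmac
import os

def hash_password(password: str) -> str:
    # 16 random bytes per password. The salt must be unique,
    # but it does not need to be secret.
    salt = os.urandom(16)
    digest = hashlib.sha256(salt + password.encode()).hexdigest()
    # Store salt and hash together in one column.
    return salt.hex() + "$" + digest

def verify_password(password: str, stored: str) -> bool:
    salt_hex, digest = stored.split("$")
    candidate = hashlib.sha256(
        bytes.fromhex(salt_hex) + password.encode()
    ).hexdigest()
    # Constant-time comparison, to avoid leaking via timing.
    return hmac.compare_digest(candidate, digest)
```

Because a fresh salt is drawn every time, hashing the same password twice yields two different stored values, which is exactly the property we wanted.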
As a historical note, salts as used in password hashing first evolved as a defense against rainbow tables. Good rainbow tables required large but feasible amounts of disk space before the advent of salts, but with even eight-character salts the size of a decent table would be an extravagant waste even in the modern era of gigabit networks and terabyte drives.
Some sources discuss the use of “peppers” in password hashing: something similar to a salt, generated per application rather than per password. The concept seems attractive to some people because it doesn’t require updating the format of the stored hash and therefore wouldn’t require widening any VARCHAR
columns. They were always of dubious use. In the past when rainbow tables were the dominant attack strategy they’d have stopped someone using a generic table, but the more popular an application became the more likely someone would go to the trouble of generating a table for that application, and at that point the pepper instantly becomes useless. In the modern era of hardware-accelerated password cracking they aren’t of any use at all.
That leaves resisting brute-force attacks. The central issue here is that cryptographic hash functions are designed to be efficient. They have to be; they have to be fast enough to be useful for verifying the integrity of multi-terabyte database backups. We want another kind of function that’s deliberately inefficient, so that the exercise of trying all reasonable inputs will take too long or cost too much money to be of any use.
As a first attempt, we might take the output of our salted hash and feed it back into the hash function, then feed that back into the hash function again, … until we have spent as much time reapplying the hash as we are willing to spend. This scheme is called PBKDF1 (Password-Based Key Derivation Function version 1), and its useful lifetime was relatively short. Its successor PBKDF2 uses a more sophisticated construction, but the core idea is essentially the same; today PBKDF2 is the algorithm of choice when an application requires NIST-standardized cryptography, and is not recommended otherwise.
Around the time PBKDF2 was being standardized, another group of cryptographers studying the Blowfish symmetric encryption algorithm noticed that its key schedule was strangely inefficient. Aware of the problems of password storage, they extracted and lightly modified that portion of the algorithm and published it at a USENIX conference. Their EKSBlowfish (Expensive Key Schedule Blowfish) became the core of the bcrypt
password hashing routine, which has been the recommended password hash for most of my career.
Attacks, unfortunately, only ever get more effective. PBKDF2 and bcrypt focus entirely on requiring more computation, and cryptographers have watched the march of progress in GPUs and ASICs somewhat nervously. An algorithm that was fundamentally expensive in memory as well as in computation would resist the new massively-parallel attackers quite handily.
The first such algorithm to gain wide prominence was scrypt
in 2009, designed to protect encrypted backups by requiring a large amount of memory to compute the encryption key. Although scrypt is well known, as these things go, it’s not used much for logins. I’m not completely sure why, but at a guess it was slightly ahead of its time, anticipating a problem that wasn’t yet quite a practical concern for common applications. It’s also more complicated to use; PBKDF2 and bcrypt each have a single tunable parameter, the “work factor”, whose effect is easily understood. Scrypt has three parameters, whose effects are slightly more obscure.
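Python’s standard library exposes scrypt too (when built against an OpenSSL that supports it, which is standard in CPython 3.6+), so the three parameters can be seen directly. Roughly, `n` is the CPU/memory cost, `r` the block size, and `p` the parallelism; memory use is about 128 × r × n bytes, so the values below, a commonly suggested interactive-login baseline, cost around 16 MiB per hash. As with PBKDF2, treat them as a starting point to benchmark, not a prescription.

```python
import hashlib
import os

salt = os.urandom(16)
# n must be a power of two; n=2**14, r=8, p=1 costs ~16 MiB of memory.
key = hashlib.scrypt(b"potato", salt=salt, n=2**14, r=8, p=1, dklen=32)
```

Verification, as before, recomputes with the stored salt and the same parameters.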
By 2013 the fundamental primitives for high-quality password storage had been widely available for years, but high profile password storage disasters continued to hit the news. In order to consolidate any recent advancements and raise awareness of the issue, a group of cryptographers organized a Password Hashing Competition modeled after the competitions that resulted in AES and SHA3. In 2015 they announced Argon2 as the winner.
The goals of Argon2 aren’t too different from scrypt. It’s still intended to require lots of time and lots of memory; also like scrypt, it takes three tunable parameters. Unlike scrypt, whose memory requirements are a function of all three parameters, Argon2 simply takes the amount of memory it should use as one of its parameters and is thus much easier for developers to understand.
Today Argon2 is the state of the art and recommended for new applications. It’s the password hash in libsodium since version 1.0.9, and in all applicable cases I can think of, the correct answer to a cryptographic question is “whatever the latest release of libsodium does.” That’s the point of libsodium: after far too many years, developers don’t even really need to care.