MD5 Hashes – Part 2

Padlock

MD5 Encryption?

It’s been a while since I wrote the first part of this MD5 article . Here in Part 2 I’m not (yet) going to cover the subject of hash collisions… That will follow in another future part.
Just now I want to deal with the pervasive, but wrong, belief that MD5 (or any of the other hashes commonly used, e.g. SHA1) are, of themselves, a means of encrypting data. They are not. However they often, at first glance, look like they are being used as such under some circumstances. This is a widely held misunderstanding and needs to be corrected.

What is encryption then?

Before going further it’s worth agreeing on what encryption actually is. There are numerous ways of defining it. However in essence it is the process of obscuring information such that the original meaning cannot be discerned without knowledge of a special means of revealing (decrypting) it. The key (pun intended) point here is that information get hidden and subsequently revealed again. Bear that in mind as we look at one commonly seen example of MD5 “encryption”.

Passwords

Many computers require you to log on to them by means of a user-name and a password. At it’s simplest this can be achieved by holding, in a file on the computer, a list of the user-names permitted along with their corresponding passwords. When someone tries to log on, the user-name and password they type in can simply be compared with the master list and, if it matches, they are allowed in. Nice and easy. However the weakness, even in this trivial example, is that if someone gets access to that file they can see, in clear-text, all the usernames and their passwords. This is generally regarded as a Bad Idea.

It doesn’t take much imagination to see the even greater vulnerability in such a system if the validation is actually not performed locally on the computer, but instead involves connection over a network. At its simplest, imagine I am logging in to a computer over the Internet: my plain-text (i.e. unencrypted) username and password are available for any passing traffic monitor to grab. Not good.

So we come to the use of hashes. Imagine that my username is “fred” and my password is “topsecret”. Rather than store them in plain-text on the computer we instead store the username and only the hash of the password. So if we were using MD5 hashing we store only the value ea847988ba59727dbf4e34ee75726dc3. (You can try this at this link to verify the value). Anyone who gets hold of the file subsequently will only know that the MD5 hash of my password is ea847988ba59727dbf4e34ee75726dc3 – they will not, however, have any easy means of deducing what my password itself is. When I legally log-on the the computer as “fred” and with password “topsecret”, the computer simply performs a hash on the entered password and compares it to the has in the stored file. If it matches, then the password I entered is correct.

Note, however, how this differs fundamentally from encryption: the hashed value of ea847988ba59727dbf4e34ee75726dc3 cannot be decrypted into the word “topsecret”. It’s not a case of “we don’t know the decryption key, so we can’t decrypt it” – there is no decryption key. A hash such as MD5 is, by definition, not reversible.

So why the confusion?

The reason for the confusion over “Is MD5 hashing a form of encryption?” is that, when used for something like passwords, it appears so. First let us take the counter-example: I produce the MD5 hash of a file containing 200 pages of text. As always, the algorithm gives me a 128-bit value as output. Even to the most naive it should be pretty obvious that you cannot take a 128-bit value and, by applying an operation to it, “unlock” it and have it magically explode back into 200 pages of text. However passwords are, generally, quite short, and 128-bits of, say, ASCII codes are often ample to represent them (allowing 16 characters). So the fact that the thing being hashed is “smaller” than the hash value actually output does, very superficially, resemble what happens during real encryption.

Follow that up with the fact that, referring back to my earlier example, a value such as ea847988ba59727dbf4e34ee75726dc3 – Is there any other 9 letter password which would produce that value? Almost certainly not. Hence the tempting logic of saying that if there is a direct and unique one-to-one correlation between my password and its hash value, surely that’s the same thing as saying it’s encrypted? Tempting but wrong.

Password cracking – An aside

Slightly off topic, but worth a short mention: the password file mentioned in the above examples would seem, at first sight, safe to leave lying around for all to see, no? After all, with the passwords so obfuscated by the hashing mechanism no one can make any sense of them? And, we’ve just discussed, the passwords cannot be decrypted so we’re quite safe?

Two words: brute force. Brute force, in computing, applies to any operation where if you can marshal enough computing power and/or time you can overcome the nearly-impossible. Encryption itself is susceptible to brute force attacks (there is one exception to this, but the subject of so-called One Time Pads need not divert us from the general point) and so is the cracking of passwords. If someone keeps trying enough passwords along with the username “fred” they will, eventually (after hours, days, weeks, months, etc.) stumble across “topsecret” and be in. However this will take them, or their automated attack box, many thousands or millions of attempts. We can easily guard against this interactively, for example by allowing 5 logon attempts by a user before locking his account permanently, hence precluding the thousands or millions of further attempts required.

But if the attacker has gained a copy of the password file, with the passwords’ hashes, then he can work off-line. He can “brute force” attack using the hashing algorithm until he finds an input sequence that produces the same hash value as is contained in the password file. In other words, methodically throw input to the MD5 algorithm until ea847988ba59727dbf4e34ee75726dc3 comes out as a result. When it does, likely as not the input value was “topsecret” and off he goes. For this reason the password file on a computer generally requires itself to be protected and unreadable by those who do not have the proper authorisation.

Networks

Which leads us neatly to the final point: if the password file referred to here anyway needs to be protected and unreadable by the unauthorised, then why bother hashing the passwords into it in the first place? Why not simply store them as plain-text anyway, since the file itself must itself be secure? For a standalone computer system this question has some merit. But imagine a slightly more complex (and these days potentially more common) scenario: the computer we are logging on to with our username and password does not, itself, hold the password file. Imagine it is stored centrally on some other computer which is reachable over a network. This scenario is extremely common, for all sorts of practical reasons.

If I enter my password into the local computer system we do not want it transmitting it to the remote system in clear-text. Better would be to hash it first locally and then transmit the hash value across the unprotected network to the remote system. There it could compare the received value to the stored hash value and if they match, signal back to the originator. Better, but not perfect: after all, if our hash itself goes out in clear-text over a network it becomes similarly vulnerable as our password file in the previous example was if everyone could read it. Someone could still capture the hash value and brute-force a password from it.

So let us try a slightly more sophisticated technique. If both the systems (the local system we wish to log on to and the remote system which holds the passwords file) can perform hashing, they can exchange the required intelligence between them without revealing the password or its hash in a readable form. Follow this sequence:

  1. I enter the username and password of “fred” and “topsecret” to the local system (LOCAL).
  2. LOCAL contacts the remote password authentication system (REMOTE) and says “Hi. I’ve got user fred wanting to authenticate.”
  3. REMOTE generates a random string (let’s say “wibble”)
  4. REMOTE sends “wibble” to LOCAL and says “OK, get him to hash that string”.
  5. LOCAL calculates the (e.g. MD5) hash of the entered password (“topsecret”). Using some hashing mechanism again, he applies this password hash to the challenge string of “wibble”.
  6. LOCAL sends the resulting new, one-time, hash back to REMOTE.
  7. REMOTE already knows the user’s password hash (although he does not know, nor need he, that it came from the string “topsecret”). He also knows that the challenge string sent to LOCAL was “wibble” (after all, he sent it). So REMOTE can apply the same hash to those two values and compare the result with that sent back by LOCAL. If they match, the user is allowed in.

The key points to note from this are:

  • The clear-text password is not stored anywhere.
  • The hash of the password is only stored on REMOTE.
  • The only data transmitted over the (vulnerable) network is the random (and always different) string of “wibble” and the resulting hash of “wibble” and the password hash.
  • Since the “wibble” string changes every time, so too does the result returned.

In true hashing tradition even if I know that “wibble” hashed to a known value, I cannot deterministically reverse the algorithm to find what input value (the user’s password hash) was used. If I know the algorithm being used the best I can do is brute-force out the user’s password hash, and then in turn perform a second brute-force on that. Which makes my life as a cracker exponentially more difficult.

In conclusion

This is not in any way a comprehensive coverage of the issues surrounding password security in computer systems! Remember that what we’re doing here is learning about hashing and its characteristics, not encryption nor computer system security. We’re touching upon certain techniques to illustrate points about hashing. The examples presented are limited and, today, insufficient. The last example, covering a simple Challenge and Response mechanism would last about 5 minutes in the modern Internet! In practice we needs to guard against so called replay attacks in such scenarios, along with true encryption of the hashes being passed back and forth. But that need not detract from the, hopefully useful, illustration of what hashes are and how they can be applied.

2 comments to MD5 Hashes – Part 2

  • Izzy

    You said it isnt dycrptable by definition, my friend said “I WILL OFFICIALLY +REP AND THINK WHOEVER DECODES THIS PWNS!

    If you do it, you are pwnage!

    085dc5be23fc96443fae66cb4873e7c5

    Here’s a hint, it’s an MD5 hash. ”
    I need the +rep but is there a way to decode it?
    kthanxbai

  • I’d love to help, but I’m not even clear what’s been asked! I *think* that I’m asked to “decode” an MD5 hash.

    I obviously failed to make one thing clear: hashes, by their very definition, cannot (if well constructed anyway!) be “decoded”. Hashing does not equate to encryption!

    If the hash supplied comes, for example, from a system’s password file then one might be able to “crack” it by a combination of brute-forcing and a dictionary attack. This is not just theoretically possible, but, a couple of years ago, I actually did this on a password file, with considerable success.

    However for all I know, this MD5 hash comes from hashing a 200MB PDF file. Or whatever.

    You cannot decode an MD5 hash. Period. Only if you know that the item that has been hashed is relatively small have you any chance of engineering a reverse.