Last year I made a bunch of posts about cryptography on a non-tech forum so that people would stop posting threads about how they didn't trust their browser enough to send a company their credit card information over an encrypted connection and so they were going to just email their credit card information to the company instead. I doubtless made some of these posts while drunk so feel free to correct them. I'm not a crypto expert by any means - I worked for an obscure branch of the air force research laboratory, not the NSA, but we needed somebody who was conversant on crypto and I volunteered. Any references to Gino/Ginormous, MilHip/Military Hippie, or Warto is a reference to a former co-worker (Ginormous was ginormous, Military Hippie was a hippie who worked for the military, and Warto had a wart).
Private / Secret / Symmetric Key Cryptography
This is the type of cryptography where you encrypt data using a secret key, and then decrypt the data using the same secret key (hence "symmetric key"). These generally break down into two classes of ciphers, stream ciphers (encrypting a bit at a time) and block ciphers (encrypting a block at a time, where a block is usually 64 or 128 bits). You can construct a stream cipher out of a block cipher if you use something like Counter Mode, and block ciphers receive more attention (this is a good thing, more on this later), so forget about stream ciphers.
The two block ciphers you're most likely to encounter are DES (Data Encryption Standard, specifically you'd encounter Triple-DES which is three DES passes over the same data because vanilla DES has a key that's too small and is susceptible to brute-force attack) and AES (Advanced Encryption Standard). DES is old, was developed more-or-less in secret, and is relatively slow. AES is new, was developed in an open competition between a crapton of cryptographers, and is relatively fast; it's also approved by the NSA for encrypting secret data (using 128-bit keys) and top secret data (using 256-bit keys). The NSA would not approve AES if it was aware of a viable attack against it. If you know of an attack against a cipher, then it's only a matter of time before the Bad Guys discover the attack (assuming they have not already done so). The NSA is not going to open up the rest of the government to attack just so they can decrypt your porn folder.
Block Cipher Modes
A block cipher only encrypts a single block of data (in the case of AES this is a 128-bit block), so if you want to encrypt more data you need to use a block cipher mode. The simplest mode is to use an Electronic Codebook - just encrypt each 128-bit block using the key and concatenate them all. This is a horrible idea because it leaks a ****ton of information to an attacker, namely the attacker will know if two or more blocks are repeated. There are better alternatives, one of them being Counter Mode: take a nonce ("number used once", a one-time secret key), concatenate it to a counter (any non-repeating stream of numbers, but a non-negative integer that increments at each step works fine) to generate a huge-ass block of crap, encrypt said crap and xor it to your data, and voila.
Public Key Exchange
Oh boy here comes math. The problem with secret key cryptography is that if I'm trying to send some secret **** to Gino then we need to have the same key, but how the hell do we agree on a key when Military Hippie is spying on us? Public / Asymmetric Key Cryptography to the rescue! Diffie-Hellman is the easiest one for me to understand, so that'll be the example I'm going to use.
Gino and I agree (in the open, where anybody can eavesdrop) on a large prime number p and a logarithm root g. I choose an integer a and Gino chooses an integer b. I send Gino (g^a mod p), she sends me (g^b mod p), I take ((g^b mod p)^a mod p) while she takes ((g^a mod p)^b mod p), and thanks to some mathematical wankery we wind up with the same number (our secret key), meanwhile nobody who's been eavesdropping can figure out what the hell just happened. Yay math! I'd go into more detail but to hell with math.
Attack!!!
The problem with vanilla Diffie-Hellman is that it's subject to a man-in-the-middle attack. I'll skip the math notation and sum it up as: Military Hippie intercepts all communications between Gino and I, and performs a key exchange with each of us. Now all of my data securely goes to Military Hippie, who decrypts and reads it, then encrypts it (after modifying it if that's what floats his boat) and passes it along to Gino. As far as Gino and I know we're performing a secure data exchange, but meanwhile Milhip is reading our mail.
Digital Signatures
A common solution is to use digital signatures to ensure that man-in-the-middle attacks fail. There's more math involved here but **** those tricks, the bottom line is that before performing a key exchange I verify the other party's identity by confirming their digital signature. How do I know what somebody's digital signature is? Some Trusted Authority tells me. That's why when you're doing online commerce you'll see that little lockbox at the bottom of your browser; if you click on said lockbox then it'll say something like "verified by Verisign" or something, meaning that you trust Verisign to properly identify the entity you're doing your online banking with or purchasing porn from or whatever.
Hashing
A hash is a fixed-length obfuscated summary of your data. F'rinstance, if you xor all of the blocks generated by a block cipher, then you're going to produce a block-sized summary of whatever it is you've encrypted - then if somebody mucks around with your data (inserting false blocks or whatever) then the guy on the other end will detect the tampering by comparing his hash with your hash. Note that it's important that the adversary not be allowed to muck around with the hash, otherwise he can just muck with the data, hash it, and replace your hash with his hash. Hence a hash function needs to incorporate a key, so that to muck with your hash the adversary needs to figure out what hash function you're using (easy) and figure out what key you're using (extremely difficult). Note to computer scientists: cryptographic hashing is much more secure and much more expensive than the hashes used in hash tables. In a hash table it doesn't particularly matter if somebody is able to reverse the hash, but the whole point of a cryptographic hash is to prevent anybody from reversing it.
Hash functions come in a variety of flavors intended for different purposes:
Key Storage. In older operating systems, passwords were stored in plaintext files; you'd type in your password and the OS would do a simple string comparison with the password stored under your username. Only the admin/root had access to this file (unless a system vulnerability gave somebody else access to it), but this scheme thus meant that a) the admin had more power than was often appropriate (the chairman of the board probably doesn't want somebody in IT to be able to read all of his email) and b) an attacker could crack a system by removing the hard drive.
The solution is to use a powerful hash function such as SHA-256 or SHA-512 (which produce 256/512-bit hashes respectively) on passwords. When you type in your password, the OS hashes it and compares it to the stored hash. This has the added benefit of reducing the impact of timing attacks (older operating systems doing a string comparison of plaintext passwords would reject the password after finding a mismatched character, so an attacker could deduce the password by timing how long it took the OS to reject it) because even if the hashed password were rejected after a single mismatched machine word this would not reflect on how many characters of the unhashed password were correct.
Message Authentication. This uses a "cheaper" hash function like HMAC, because while with password hashing we want it to take several years/centuries for the attacker to reverse the hash, with message authentication we can usually afford to have hash reversal take only a couple of days. Let's say that I'm doing some online banking, and send the bank some encrypted data on a transaction I'd like to perform. I hash this data so that the bank will know that nobody is mucking with our transaction. Somebody intercepts this message and three days later has reversed the hash, meaning that they can do things like attempt a replay attack - except that now the data are stale, and in all likelihood the bank and I have been encrypting timestamps in with our data. Internet transactions have to be able to tolerate a few minutes' of skew in timestamps as messages are lost and re-sent, but a few days' anomaly in the timestamps are cause for rejecting the message.
Salting
This is a technique for preventing birthday attacks. The Birthday Paradox is that people expect that they'd need to have about 183 people (365/2) in a room together for there to be a 50% chance that two of them have the same birthday, but the function is not linear and in fact for there to be a 50% chance of a "collision" (two people with the same birthday) you only need about 19 people (square-root of 365). In the hashing section I mention SHA-256 and SHA-512, and in the AES section I mention AES-128 and AES-256 -- the hash keys are twice as long as the encryption keys to get the same level of security, because we've got to take the square root of the hash key length to account for the birthday paradox.
What this means for those of us who aren't designing hash functions is that we need to salt our password files. Let's say I'm an administrator for a database that has millions or billions of users (LexisNexus or something). Everybody's got a password hashed with SHA-512, so the system is pretty safe - it's difficult to find a collision with SHA-512. But, remember that passwords usually don't have much entropy, and add to that the fact that there are billions of passwords hashed in my "passwords.dat" file. An attacker only needs one password to do damage, so if he gets his hands on my passwords.dat file he can just start hashing low-entropy passwords and checking each test against all of the passwords - the birthday paradox kicks in, and the odds of the attacker cracking a password go way up. So we append a unique random string of, say, 512 bits to all of our hashed passwords, hash the 1024-bit string with SHA-512, and then store the new hash and the random 512-bit string. Now the attacker can only attack passwords one at a time - he can't hash a password and compare it to a billion other hashed passwords, instead he must hash a password, re-hash the password appended with the password's unique salt, and then compare the hash to the single password using that salt.
Private / Secret / Symmetric Key Cryptography
This is the type of cryptography where you encrypt data using a secret key, and then decrypt the data using the same secret key (hence "symmetric key"). These generally break down into two classes of ciphers, stream ciphers (encrypting a bit at a time) and block ciphers (encrypting a block at a time, where a block is usually 64 or 128 bits). You can construct a stream cipher out of a block cipher if you use something like Counter Mode, and block ciphers receive more attention (this is a good thing, more on this later), so forget about stream ciphers.
The two block ciphers you're most likely to encounter are DES (Data Encryption Standard, specifically you'd encounter Triple-DES which is three DES passes over the same data because vanilla DES has a key that's too small and is susceptible to brute-force attack) and AES (Advanced Encryption Standard). DES is old, was developed more-or-less in secret, and is relatively slow. AES is new, was developed in an open competition between a crapton of cryptographers, and is relatively fast; it's also approved by the NSA for encrypting secret data (using 128-bit keys) and top secret data (using 256-bit keys). The NSA would not approve AES if it was aware of a viable attack against it. If you know of an attack against a cipher, then it's only a matter of time before the Bad Guys discover the attack (assuming they have not already done so). The NSA is not going to open up the rest of the government to attack just so they can decrypt your porn folder.
Block Cipher Modes
A block cipher only encrypts a single block of data (in the case of AES this is a 128-bit block), so if you want to encrypt more data you need to use a block cipher mode. The simplest mode is to use an Electronic Codebook - just encrypt each 128-bit block using the key and concatenate them all. This is a horrible idea because it leaks a ****ton of information to an attacker, namely the attacker will know if two or more blocks are repeated. There are better alternatives, one of them being Counter Mode: take a nonce ("number used once", a one-time secret key), concatenate it to a counter (any non-repeating stream of numbers, but a non-negative integer that increments at each step works fine) to generate a huge-ass block of crap, encrypt said crap and xor it to your data, and voila.
Public Key Exchange
Oh boy here comes math. The problem with secret key cryptography is that if I'm trying to send some secret **** to Gino then we need to have the same key, but how the hell do we agree on a key when Military Hippie is spying on us? Public / Asymmetric Key Cryptography to the rescue! Diffie-Hellman is the easiest one for me to understand, so that'll be the example I'm going to use.
Gino and I agree (in the open, where anybody can eavesdrop) on a large prime number p and a logarithm root g. I choose an integer a and Gino chooses an integer b. I send Gino (g^a mod p), she sends me (g^b mod p), I take ((g^b mod p)^a mod p) while she takes ((g^a mod p)^b mod p), and thanks to some mathematical wankery we wind up with the same number (our secret key), meanwhile nobody who's been eavesdropping can figure out what the hell just happened. Yay math! I'd go into more detail but to hell with math.
Attack!!!
The problem with vanilla Diffie-Hellman is that it's subject to a man-in-the-middle attack. I'll skip the math notation and sum it up as: Military Hippie intercepts all communications between Gino and I, and performs a key exchange with each of us. Now all of my data securely goes to Military Hippie, who decrypts and reads it, then encrypts it (after modifying it if that's what floats his boat) and passes it along to Gino. As far as Gino and I know we're performing a secure data exchange, but meanwhile Milhip is reading our mail.
Digital Signatures
A common solution is to use digital signatures to ensure that man-in-the-middle attacks fail. There's more math involved here but **** those tricks, the bottom line is that before performing a key exchange I verify the other party's identity by confirming their digital signature. How do I know what somebody's digital signature is? Some Trusted Authority tells me. That's why when you're doing online commerce you'll see that little lockbox at the bottom of your browser; if you click on said lockbox then it'll say something like "verified by Verisign" or something, meaning that you trust Verisign to properly identify the entity you're doing your online banking with or purchasing porn from or whatever.
Hashing
A hash is a fixed-length obfuscated summary of your data. F'rinstance, if you xor all of the blocks generated by a block cipher, then you're going to produce a block-sized summary of whatever it is you've encrypted - then if somebody mucks around with your data (inserting false blocks or whatever) then the guy on the other end will detect the tampering by comparing his hash with your hash. Note that it's important that the adversary not be allowed to muck around with the hash, otherwise he can just muck with the data, hash it, and replace your hash with his hash. Hence a hash function needs to incorporate a key, so that to muck with your hash the adversary needs to figure out what hash function you're using (easy) and figure out what key you're using (extremely difficult). Note to computer scientists: cryptographic hashing is much more secure and much more expensive than the hashes used in hash tables. In a hash table it doesn't particularly matter if somebody is able to reverse the hash, but the whole point of a cryptographic hash is to prevent anybody from reversing it.
Hash functions come in a variety of flavors intended for different purposes:
Key Storage. In older operating systems, passwords were stored in plaintext files; you'd type in your password and the OS would do a simple string comparison with the password stored under your username. Only the admin/root had access to this file (unless a system vulnerability gave somebody else access to it), but this scheme thus meant that a) the admin had more power than was often appropriate (the chairman of the board probably doesn't want somebody in IT to be able to read all of his email) and b) an attacker could crack a system by removing the hard drive.
The solution is to use a powerful hash function such as SHA-256 or SHA-512 (which produce 256/512-bit hashes respectively) on passwords. When you type in your password, the OS hashes it and compares it to the stored hash. This has the added benefit of reducing the impact of timing attacks (older operating systems doing a string comparison of plaintext passwords would reject the password after finding a mismatched character, so an attacker could deduce the password by timing how long it took the OS to reject it) because even if the hashed password were rejected after a single mismatched machine word this would not reflect on how many characters of the unhashed password were correct.
Message Authentication. This uses a "cheaper" hash function like HMAC, because while with password hashing we want it to take several years/centuries for the attacker to reverse the hash, with message authentication we can usually afford to have hash reversal take only a couple of days. Let's say that I'm doing some online banking, and send the bank some encrypted data on a transaction I'd like to perform. I hash this data so that the bank will know that nobody is mucking with our transaction. Somebody intercepts this message and three days later has reversed the hash, meaning that they can do things like attempt a replay attack - except that now the data are stale, and in all likelihood the bank and I have been encrypting timestamps in with our data. Internet transactions have to be able to tolerate a few minutes' of skew in timestamps as messages are lost and re-sent, but a few days' anomaly in the timestamps are cause for rejecting the message.
Salting
This is a technique for preventing birthday attacks. The Birthday Paradox is that people expect that they'd need to have about 183 people (365/2) in a room together for there to be a 50% chance that two of them have the same birthday, but the function is not linear and in fact for there to be a 50% chance of a "collision" (two people with the same birthday) you only need about 19 people (square-root of 365). In the hashing section I mention SHA-256 and SHA-512, and in the AES section I mention AES-128 and AES-256 -- the hash keys are twice as long as the encryption keys to get the same level of security, because we've got to take the square root of the hash key length to account for the birthday paradox.
What this means for those of us who aren't designing hash functions is that we need to salt our password files. Let's say I'm an administrator for a database that has millions or billions of users (LexisNexus or something). Everybody's got a password hashed with SHA-512, so the system is pretty safe - it's difficult to find a collision with SHA-512. But, remember that passwords usually don't have much entropy, and add to that the fact that there are billions of passwords hashed in my "passwords.dat" file. An attacker only needs one password to do damage, so if he gets his hands on my passwords.dat file he can just start hashing low-entropy passwords and checking each test against all of the passwords - the birthday paradox kicks in, and the odds of the attacker cracking a password go way up. So we append a unique random string of, say, 512 bits to all of our hashed passwords, hash the 1024-bit string with SHA-512, and then store the new hash and the random 512-bit string. Now the attacker can only attack passwords one at a time - he can't hash a password and compare it to a billion other hashed passwords, instead he must hash a password, re-hash the password appended with the password's unique salt, and then compare the hash to the single password using that salt.
Comment