MD5, the cryptographic hash function that’s often used to verify that files have not been tampered with, has been broken for a couple of years now. A lot of times when you hear about some algorithm being compromised, it’s not something that’s immediately practical to exploit… an encryption algorithm’s effective strength is reduced by a bit or two, or maybe a hash function has been compromised such that a huge amount of computational effort can make a completely bargled file that has an identical checksum to a known source. Not so in the case of MD5, as Peter Selinger describes:
It is now well-known that the crytographic hash function MD5 has been broken. In March 2005, Xiaoyun Wang and Hongbo Yu of Shandong University in China published an article in which they describe an algorithm that can find two different sequences of 128 bytes with the same MD5 hash.
As we will explain below, the algorithm of Wang and Yu can be used to create files of arbitrary length that have identical MD5 hashes, and that differ only in 128 bytes somewhere in the middle of the file.
Selinger’s example exploit will allow you to produce two working executable files with different behaviors, but matching checksums. Presumably, one would be a file with the intended behavior, and the other an “evil” version that could be slipped in as a replacement without anyone knowing. Pretty interesting stuff.
Collisions in the MD5 cryptographic hash function – Link