It appears that the usefulness of MD5 checksums is finally beginning to wane. This security analysis by Dan Kaminsky gives some frightening details of how to create two files that perform different tasks, but have the same MD5 checksum. Bad news. Now this attack relies on the possession of two specially-computed files that have the same checksum, as detailed by Joux and Wang previously:
For MD5 (and actually a number of popular hashing algorithms, SHA-1 not among them), it is possible to compute particular classes of input data for which subtle changes can be silently introduced without causing apparent changes in the final MD5 hash. Capacity is not huge – of the two 128 byte proof-ofconcept files released by Wang, only six bits differ. But many â€doppelganger†sets can be computed, each of which may be swapped out with the other at no effect to the resultant hash. The sets are two MD5 blocks long. Because it’s possible to compute new blocks on demand, a generic â€antivirus†style colliding block detector isn’t possible. It may be possible to generate a custom weak class detector. The ability to generate colliding datasets exposes a fundamentally new mode of operation for MD5.
Now, you don’t have to start worrying about this immediately. Fully-practical attacks are still over the horizon. However, as Dan’s paper shows us, the horizon may be closer than you think. You may want to read up on SHA-1.