The paper describes an attack that would allow an attacker to massage (“herd”) an object to a point where it matches a hash value chosen by the attacker prior. What appears to be an important restriction is that the hash value has to be defined by the attacker prior to attack. This is important because in most uses of hash algorithms the victim would be the one defining the hash, not the attacker. Hence, this attack will not help you construct a message that matches a password hash.
The steps required for the attack are:
- Attacker runs a collision finding attack on a hash algorithm and creates an array of intermediate hash states. In the paper this array is referred to as the “diamond structure”. It is not clear to me how the size of this structure is determined but half of the intermediate hash states in this structure can be used in creating message blocks (next step) to be imposed on the victim object in order to allow the attacker to edit that object and still produce the same hash as the original.
- After having the diamond structure made the attacker then runs an exhaustive search for a string which collides with one of the intermediate hash states in the structure. Once found the attacker can “construct a sequence of message blocks” in order to build the proper suffix which will be added to the original object and the attackers edited version.
Questions I have concerning the above process are:
- After finding a collision how does one build the intermediate hash states. For this I will need to read up more on current collision finding methods: Multicollisions in iterated has functions by Antoine Joux (need to find), Formal aspects of mobile code security by Richard Drews Dean (attached). Learning more about hash states in a few hash algorithms should also help.
- How is the size of the diamond structure determined?
- What is the relation of the message blocks (used to create the final suffix added to the two different objects that collide) and the intermediate hash states?
- Finally, how does the attacker determine which message blocks are to be used with the suffix, and what is the function for creating the suffix.
Perhaps some of the above questions can be answered with another read, if I can find the time. Also would like to find is “How to Swindle Rabin” by Gideon Yuval.
One example application mentioned is abusing trust in a manner similar to social engineering. A malicious programmer writing a piece of code for a project which manages the code trust based on hash values. The attacker first runs a computation for building a diamond like structure/list of hash values that are optimum for collision. They then write some legitimate unsuspecting code which hashes to one of the chosen values. An auditor reviews the code and enters it into the code repository. The attacker can now edit that code and add a small back door.
All in all this paper reminds me in some way of Dan Kaminsky’s exploit of the MD5 collision examples which he describes in his paper MD5 To Be Considered Harmful Someday (attached). He constructed files that included the example collision messages within and continued to produce MD5 collisions. The difference with the Hash herding described here is that the message used can look coherent and unsuspecting. The method differs from Dan’s in that hash herding uses the internal messages produced at different stages of the hash algorithm to give the attack the flexibility required to have greater control on the message.