I found a really interesting challenge in Cryptography’s weekly quiz. Post it here.

## subject

Let be a Merkle-Damgård hash function built out of compression function F. Suppose that you have a black box which can find pre-images of F ; that is, the black box takes inputs IV and y and outputs an x such that F(IV,x) = y. (You may assume that if you give the black box the same value y multiple times, each time you call it you’ll get back a different value x satisfying F(IV,x) = y ) Show how by using the black box at most 2k times you can find a set of 2^k messages that all have the same hash value when input into the full hash function H.

## inspiration

here is the Merkle–Damgård construction

From the description, “2k—>2^k” has been detected. It seems that I should give linear thing exponential attribute. What first came into my mind is that, there are k positions and in each position, we should have 2 values avaliable. In this way, we will own 2^k different values. So then I try to deal with the problem: How to get 2 different values with same IV and same hash that F(IV, message) generate which give the feasibility that these messages can be connected serially? It seems impossible, but, actully, it can be.

## solution

Something about IV in Merkle–Damgård construction from wikipedia:

In the diagram, the one-way compression function is denoted by f, and transforms two fixed length inputs to an output of the same size as one of the inputs. The algorithm starts with an initial value, the initialization vector (IV).

The IV is a fixed value(algorithm or implementation specific). For each message block, the compression (or compacting) function f takes the result so far, combines it with the message block, and produces an intermediate result. The last block is padded with zeros as needed and bits representing the length of the entire message are appended. (See below for a detailed length padding example.)

I define a structure first.

In this way, from the first fixed IV, we generate 2 different random message M0,1 and M0,1’, and then, we compute F(IV, M0,1) and F(IV, M0,1’). **WE HAVENOT CALL THAT BLACK BOX TILL NOW.** After this operation, we generate a random IV2, and call the black box twice with black(IVt, IV2) and black(IVt’, IV2). In this way, we get M0,2=black(IVt, IV2); M0,2’=black(IVt’, IV2).

Let’s see what happened: we call that black box twice, and we obtain 2 message(m0,1||m0,2 and m0,1||m0,2notation) which can generate **same IV2 with same IV1** in Merkle–Damgård construction. It means that they are **interchangeable**. So now I have solved the problem above: *we do have 2 values avaliable in a single position*. In conclusion, this structure need call black box twice and compute F() tiwce, and we can generate 2 different message with same output and IV in F().

So, what the attacker will do, is run this structure k time, set the first IV with the fixed IV in specific hash algorithm and later structure’s IV1 should be the output, namely, the IV2 in the former structure. In each position in these k structures, we have 2 different avaliable value interchangeably. So the previous challenge have been solved:

there are k positions and in each position, we should have 2 values avaliable. In this way, we will own 2^k different values.