Binary diffing

August 21, 2014

Yesterday, a new feature was pushed to radare2: offset-based function diffing. We’d like to take this opportunity to write a bit about radare2’s diffing features before showing the shiny new one.

As examples, we'll use a crackme in its genuine and cracked versions, along with the /bin/true and /bin/false binaries.

Without any options, radiff2 shows which bytes changed and the corresponding offsets.

$ radiff2 genuine cracked
0x000081e0 85c00f94c0 => 9090909090 0x000081e0
0x0007c805 85c00f84c0 => 9090909090 0x0007c805

$ rasm2 -d 85c00f94c0
test eax, eax
sete al

Notice how both checks are NOPed out: the bytes at each offset are replaced with 0x90 (nop).
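To make the default output format concrete, here is a minimal Python sketch of a byte-level comparison. It is not radiff2's implementation: the function names `byte_diff` and `format_diff` are hypothetical, and the sketch assumes both inputs have the same length, as with an in-place patch.

```python
def byte_diff(old: bytes, new: bytes):
    """Yield (offset, old_run, new_run) for each contiguous run of differing bytes.

    Assumption: both inputs have the same length (an in-place patch).
    """
    if len(old) != len(new):
        raise ValueError("this sketch only handles equal-length inputs")
    start = None  # offset where the current run of differences began
    for i, (a, b) in enumerate(zip(old, new)):
        if a != b:
            if start is None:
                start = i
        elif start is not None:
            yield start, old[start:i], new[start:i]
            start = None
    if start is not None:  # a run that extends to the end of the data
        yield start, old[start:], new[start:]


def format_diff(old: bytes, new: bytes):
    """Render each differing run in an 'offset old => new offset' layout."""
    return [
        f"0x{off:08x} {a.hex()} => {b.hex()} 0x{off:08x}"
        for off, a, b in byte_diff(old, new)
    ]


# In-memory stand-ins for the two files (illustrative data only).
genuine = bytes(4) + bytes.fromhex("85c00f94c0") + bytes(4)
cracked = bytes(4) + b"\x90" * 5 + bytes(4)

for line in format_diff(genuine, cracked):
    print(line)  # 0x00000004 85c00f94c0 => 9090909090 0x00000004
```

Grouping adjacent differing bytes into one run is what turns five single-byte changes into a single line per patch.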

For bulk processing, you may want a higher-level overview of the differences. For this, radiff2 can compute the distance between two files and their similarity with the -s option:

$ radiff2 -s /bin/true /bin/false
similarity: 0.97
distance: 743
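As a rough illustration of how a distance and a similarity score can relate, the sketch below uses a plain Levenshtein edit distance and defines similarity as 1 - distance / max(length). Both choices are assumptions made for this example; radiff2's own algorithm and formula may differ, and this quadratic version is only practical for small inputs.

```python
def levenshtein(a: bytes, b: bytes) -> int:
    """Classic dynamic-programming edit distance, keeping one row at a time."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter input as the row to save memory
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # deletion
                current[j - 1] + 1,          # insertion
                previous[j - 1] + (x != y),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a: bytes, b: bytes) -> float:
    """Assumed definition: 1.0 for identical inputs, lower as distance grows."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


print(levenshtein(b"kitten", b"sitting"))  # 3
print(similarity(b"abcd", b"abce"))        # 0.75
```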

If you want more concrete data, it’s also possible to count the differences, with the -c option:

$ radiff2 -c genuine cracked

If you’re unsure whether you’re dealing with similar binaries, you can check whether some functions match with the -C option. The columns are: “First file offset”, “Percentage of matching” and “Second file offset”.

$ radiff2 -C /bin/false /bin/true 
                   entry0  0x4013e8 |   MATCH  (0.904762) | 0x4013e2  entry0
sym.imp.__libc_start_main  0x401190 |   MATCH  (1.000000) | 0x401190  sym.imp.__libc_start_main
             fcn.00401196  0x401196 |   MATCH  (1.000000) | 0x401196  fcn.00401196
             fcn.0040103c  0x40103c |   MATCH  (1.000000) | 0x40103c  fcn.0040103c
             fcn.00401046  0x401046 |   MATCH  (1.000000) | 0x401046  fcn.00401046
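The three columns can be mimicked with a small Python sketch that scores functions pairwise. This is only an illustration: it pairs functions by name and scores their raw bytes with `difflib.SequenceMatcher`, whereas radiff2 relies on its own code analysis. The `match_functions` helper and the sample function bodies are hypothetical.

```python
from difflib import SequenceMatcher


def match_functions(first, second):
    """Score functions that share a name in both binaries.

    first, second: dict mapping function name -> (offset, function bytes).
    Returns a list of (name, offset_in_first, ratio, offset_in_second).
    """
    rows = []
    for name, (off_a, body_a) in first.items():
        if name not in second:
            continue  # this sketch ignores functions missing from one side
        off_b, body_b = second[name]
        ratio = SequenceMatcher(None, body_a, body_b, autojunk=False).ratio()
        rows.append((name, off_a, ratio, off_b))
    return rows


# Made-up function bodies, for illustration only.
file_a = {
    "entry0": (0x4013E8, bytes.fromhex("5589e5c3")),
    "fcn.00401196": (0x401196, bytes.fromhex("31c0c3")),
}
file_b = {
    "entry0": (0x4013E2, bytes.fromhex("5589e590c3")),
    "fcn.00401196": (0x401196, bytes.fromhex("31c0c3")),
}

for name, off_a, ratio, off_b in match_functions(file_a, file_b):
    print(f"{name:>25}  0x{off_a:x} | ({ratio:.6f}) | 0x{off_b:x}  {name}")
```

A score of 1.000000 means the two function bodies are byte-for-byte identical; anything lower points at a function worth inspecting.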

And now the cool feature: radare2 supports graph-diffing, à la DarunGrim, with the -g option. You can either give a symbol name, or specify two offsets in case the function you want to diff doesn’t have the same name in both files.

For example, radiff2 -g main /bin/true /bin/false | xdot - will show the differences between the main function of true and false. You can compare it to radiff2 -g main /bin/false /bin/true (note the order of the arguments) to get the two versions.

This is the result: [Figure: graph diff of /bin/true and /bin/false]

The parts in yellow indicate that some offsets do not match, the grey one is a perfect match, and the red one highlights a strong difference. If you look closely, you’ll see that the left one is mov edi, 0x1; call sym.imp.exit, while the right one is xor edi, edi; call sym.imp.exit.

Binary diffing is an important feature for reverse engineering. It can be used to analyze security updates, infected binaries, firmware changes and more.

We have only shown the code-analysis diffing functionality, but radare2 supports other kinds of diffing between two binaries: byte-level diffing, deltified similarities, and more to come.

We plan to implement more kinds of binary diffing in r2 and, why not, add support for ASCII-art graph diffing and better integration with the rest of the toolkit.