The BIG big endian patch

May 4, 2016

This text comes from radare2’s doc/endian. And aims to explain the reason why radare2 codebase was handling endianness in a problematic way. damo22 cooked a huge patch addressing those issues and making the code endian-independent. This means that at compile time the code doesn’t assumes any local specific endian.

Endian issues

As hackers, we need to be aware of endianness.

Endianness can become a problem when you try to process buffers or streams of bytes and store intermediate values as integers with width larger than a single byte.

It can seem very easy to write the following code:

ut8 opcode[4] = {0x10, 0x20, 0x30, 0x40};
ut32 value = *(ut32*)opcode;

... and then continue to use "value" in the code to represent the opcode.

This needs to be avoided!

Why? What is actually happening?

When you cast the opcode stream to a unsigned int, the compiler uses the endianness of the host to interpret the bytes and stores it in host endianness. This leads to very unportable code, because if you compile on a different endian machine, the value stored in “value” might be 0x40302010 instead of 0x10203040.

In the past, radare devs were not as strict about this issue, and as a result, needed to swap the endian of values regularly in the code.

Solution

Use bitshifts and OR instructions to interpret bytes in a known endian. Instead of casting streams of bytes to larger width integers, do the following:

ut8 opcode[4] = {0x10, 0x20, 0x30, 0x40};
ut32 value = opcode[0] | opcode[1] << 8 | opcode[2] << 16 | opcode[3] << 24;

or if you prefer the other endian:

ut32 value = opcode[3] | opcode[2] << 8 | opcode[1] << 16 | opcode[0] << 24;

This is much better because you actually know which endian your bytes are stored in within the integer value, REGARDLESS of the host endian of the machine.

Endian helper functions

Radare2 now uses helper functions to interpret all byte streams in a known endian.

Please use these at all times, eg:

// read 4 bytes from a stream in BE
val32 = r_read_be32(buffer)
// read 4 bytes from a stream in LE
val32 = r_read_le32(buffer)
// read 4 bytes from a stream:
//   if isbig is true, reads in BE
//   otherwise reads in LE
val32 = r_read_ble32(buffer, isbig)

There are a number of helper functions for 64, 32, 16, and 8 bit reads and writes.

(Note that 8 bit reads are equivalent to casting a single byte of the buffer to a ut8 value, ie endian is irrelevant).

Happy hacking!

  • damo22