The BIG big endian patch
May 4, 2016
This text comes from radare2’s doc/endian and aims to explain why the radare2 codebase was handling endianness in a problematic way. damo22 cooked up a huge patch addressing those issues and making the code endian-independent. This means that the code no longer assumes any particular host endianness at compile time.
Endian issues
As hackers, we need to be aware of endianness.
Endianness can become a problem when you process buffers or streams of bytes and store intermediate values as integers wider than a single byte.
It can be tempting to write the following code:
ut8 opcode[4] = {0x10, 0x20, 0x30, 0x40};
ut32 value = *(ut32*)opcode;
... and then continue to use "value" in the code to represent the opcode.
This needs to be avoided!
Why? What is actually happening?
When you cast the opcode stream to an unsigned int, the compiler interprets the bytes in the endianness of the host. This leads to very unportable code, because the result depends on the machine you compile for: the value stored in “value” is 0x40302010 on a little-endian host, but 0x10203040 on a big-endian one.
In the past, radare devs were not as strict about this issue and, as a result, needed to swap the endianness of values regularly throughout the code.
Solution
Use bit shifts and bitwise OR to interpret bytes in a known endianness. Instead of casting streams of bytes to wider integers, do the following:
ut8 opcode[4] = {0x10, 0x20, 0x30, 0x40};
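// little endian: opcode[0] is the least significant byte
// (widen each byte to ut32 first, so << 24 cannot overflow a signed int)
ut32 value = (ut32)opcode[0] | (ut32)opcode[1] << 8 | (ut32)opcode[2] << 16 | (ut32)opcode[3] << 24;
or, if you need the other endianness (big endian):
// big endian: opcode[0] is the most significant byte
ut32 value = (ut32)opcode[3] | (ut32)opcode[2] << 8 | (ut32)opcode[1] << 16 | (ut32)opcode[0] << 24;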
This is much better because you know exactly how the bytes map into the integer value, REGARDLESS of the endianness of the host machine.
Endian helper functions
Radare2 now uses helper functions to interpret all byte streams in a known endianness.
Please use them at all times, e.g.:
// read 4 bytes from a stream in BE
val32 = r_read_be32(buffer);
// read 4 bytes from a stream in LE
val32 = r_read_le32(buffer);
// read 4 bytes from a stream:
// if isbig is true, reads in BE
// otherwise reads in LE
val32 = r_read_ble32(buffer, isbig);
There are a number of helper functions for 64-, 32-, 16-, and 8-bit reads and writes.
(Note that 8-bit reads are equivalent to casting a single byte of the buffer to a ut8 value, i.e. endianness is irrelevant.)
Happy hacking!
- damo22