Emulating a simple bootloader

August 16, 2016

Introduction

Generally speaking, emulating a bootloader is simpler than emulating a regular binary, because a bootloader has no external libraries and usually has direct access to memory and hardware.

In this case, the bootloader is an x86 binary which runs in 16-bit real mode and uses BIOS calls to perform its loading duties and textual input/output.

The idea here is to emulate the Cropta1 crackme using radare2 ESIL emulation, providing the needed BIOS via a trivial quick & dirty python implementation of just what is needed to run the crackme code.

There are several ways to do it. I tried two of them, and here is the story.

Take one, use r2pipe

Whenever I use r2pipe I feel at home. Moreover, there’s an example (in nodejs) of a similar case - the emulation of syscalls - so that’s the first thing I tried.

My bios looked like this:

import r2pipe, sys, os

r2 = r2pipe.open('#!pipe')

# just the hdd params stolen from bochsrc
cylinders=20
heads=16
spt=63
bps=512

# function to read a key from stdin
def wait_key():
	result = None
	if os.name == 'nt':
		import msvcrt
		result = msvcrt.getch()
	else:
		import termios
		fd = sys.stdin.fileno()
		oldterm = termios.tcgetattr(fd)
		newattr = termios.tcgetattr(fd)
		newattr[3] = newattr[3] & ~termios.ICANON & ~termios.ECHO
		termios.tcsetattr(fd, termios.TCSANOW, newattr)
		try:
			result = sys.stdin.read(1)
		except:
			pass
		finally:
			termios.tcsetattr(fd, termios.TCSAFLUSH, oldterm)
	# return from the function body, so the Windows branch returns its key too
	return result


# this handles the interrupts
def handle_intr(intNum):
	regs = r2.cmdj('arj')

	# helper funcs to read/write hi and low parts of regs
	def xh(regName, setValue = None):
		val = regs[regName]
		if setValue is None:
			return (val & 0xff00)>>8
		else:
			val = (val & 0xff) | ((setValue & 0xff) << 8)
			r2.cmd('ar ' + regName + '=' + hex(val))
			return val

	def xl(regName, setValue = None):
		val = regs[regName]
		if setValue is None:
			return val & 0xff
		else:
			val = (val & 0xff00) | (setValue & 0xff)
			r2.cmd('ar ' + regName + '=' + hex(val))
			return val

	# command is in ah
	command = xh('ax') 

	# read/write disk
	if intNum == 0x13: 
		# read from disk to memory
		if command == 2:
			# al, number of sectors to read
			nSectors = xl('ax')
			
			# ch, cylinder
			cylinder = xh('cx')
			
			# cl, sector
			firstSector = xl('cx')
			
			# dh, head
			head = xh('dx') 
			
			# bx, buffer in memory
			destination = regs['bx'] & 0xffff

			# hdd math
			source = (firstSector - 1 + (head + cylinder * heads) * spt ) * bps
			length = nSectors * bps

			# do the actual writing in r2
			r2.cmd('e io.cache=true')
			r2.cmd('wd ' + hex(source) + ' ' + hex(length) + ' @ ' + hex(destination))
			
			# success -> carry flag = 0
			r2.cmd('ar cf=0')
		
		# geometry query
		elif command == 8: 
			# dl drive number
			driveNum = xl('dx')
			
			if driveNum != 0x80:
				# if not first drive, error -> carry flag = 1
				r2.cmd('ar cf=1')
			else:
				# success, return geometry
				r2.cmd('ar cf=0')
				r2.cmd('ar ax=0')
				r2.cmd('ar dx=' + hex(((heads-1) << 8) | 1))
				r2.cmd('ar cx=' + hex(spt | (cylinders << 8)))

	# keyboard i/o
	elif intNum == 0x16: 
		# read extended key
		if command == 0x10:
			result = ord(wait_key())
			high = 0
			if result == 10:
				high = 0x1c
			elif result == 127:
				high = 0xe
			r2.cmd('ar ax=' + hex(result | (high << 8)))

	# screen output
	elif intNum == 0x10:
		# print char
		if command == 0xe:
			char = chr(xl('ax'))
			sys.stdout.write(char)
			sys.stdout.flush()

# call it, with parameter coming from r2
handle_intr(int(sys.argv[1], 0))

The above code is far from being a complete BIOS implementation, or even a correct subset of one: it’s just what the crackme uses in its interesting part - the initial one.
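The "hdd math" line in the read handler is the usual CHS (cylinder, head, sector) to byte offset conversion. Here is a minimal standalone sketch of it, using the same geometry values as the script above; the function name chs_to_offset is only an illustrative choice and is not part of the original BIOS code.

```python
# Geometry values copied from the BIOS script above.
HEADS = 16
SPT = 63   # sectors per track
BPS = 512  # bytes per sector


def chs_to_offset(cylinder, head, sector, heads=HEADS, spt=SPT, bps=BPS):
    """Return the byte offset in the disk image of a CHS address.

    Sectors are 1-based in the INT 13h convention, hence the "- 1".
    """
    lba = (cylinder * heads + head) * spt + (sector - 1)
    return lba * bps


if __name__ == "__main__":
    # The boot sector itself: cylinder 0, head 0, sector 1.
    print(hex(chs_to_offset(0, 0, 1)))
    # The sector right after it.
    print(hex(chs_to_offset(0, 0, 2)))
```

Keeping the conversion in a pure function like this makes it easy to check against known offsets in the image before wiring it into the interrupt handler.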

Running this in radare2 is as easy as doing:

$ r2 -b 16 HardDisk
 -- Choose your architecture by typing: 'e asm.arch=<arch>'
[0000:0000]> aei
[0000:0000]> aeim 0x2000 0xffff
[0000:0000]> aeip
[0000:0000]> e io.cache=true
[0000:0000]> "e cmd.esil.intr=#!pipe python bios_pipe.py"
[0000:0000]> e esil.gotolimit=0xffff
[0000:0000]> ! (sleep 30 && killall -3 r2)&
[0000:0000]> aec

The following section (Emulation setup) expands the above r2 commands with a lengthy explanation of each. Feel free to skip it if the above r2 session is obvious to you.

Emulation setup

[0000:0000]> aei

aei initializes the ESIL VM state (as stated in the ae? help), which means that any previous ESIL context is destroyed here and a new ESIL stack gets deployed.

[0000:0000]> aeim 0x2000 0xffff

aeim allocates the memory for mem read / write operations, basically needed for the stack pointer to point somewhere harmless.

Here I’m placing the start of it at address 0x2000 with a length of 0xffff bytes. The start address is exactly the size of the binary, so that memory writes will likely not overwrite the code.

At the beginning of the bootloader code the stack pointer is placed at address 0x7c00, so the stack can grow for 23552 bytes (0x7c00 - 0x2000) before potentially overlapping the code. It may or may not be enough; hopefully it is for this simple case.
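The stack headroom figure can be double-checked with a couple of lines of python. This is just the arithmetic from the paragraph above; the constant names are illustrative.

```python
CODE_SIZE = 0x2000   # size of the binary, also the aeim start address
STACK_TOP = 0x7c00   # initial stack pointer set by the bootloader

# The stack grows downwards, from STACK_TOP towards the end of the code.
headroom = STACK_TOP - CODE_SIZE
print(headroom, hex(headroom))
```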

In more complex bootloaders, it may be necessary to keep the memory in one file descriptor and the code in another. This is possible, for example, by using temporary file descriptor seeks in r2 read / write commands.

[0000:0000]> aeip

This will set the ESIL instruction pointer (and the IP alias register of the current architecture, as specified in the register profile of the anal plugin) to the current seek, namely 0.

[0000:0000]> e io.cache=true

This lets us write to the current session’s memory without r2 writing it back to the binary file.

[0000:0000]> "e cmd.esil.intr=#!pipe python bios_pipe.py"

This, in pseudo english, means: “Every time there’s an ESIL interrupt ($ instruction), spawn this python script and pass it the number of the interrupt as argument”. This will load and execute the bios depicted above.

[0000:0000]> e esil.gotolimit=0xffff

This one. It took me a couple of hours to figure out what that ESIL infinite loop detected error message meant.

The failing instruction was: rep movsb byte es:[di], byte ptr [si] which is known to be bounded by the cx value, which was itself conveniently set to the very finite value of 0x1e5 just a few bytes above… so?

It turns out that the gotolimit is the maximum allowed count of single ESIL instructions which can be executed in a statement - and that’s great. In this case, the esil statement for the above failing instruction is:

cx,!,?{,BREAK,},si,[1],di,=[1],df,?{,1,si,-=,1,di,-=,},df,!,?{,1,si,+=,1,di,+=,},cx,--=,cx,?{,5,GOTO,}

which is composed of 35 ESIL instructions. Doing the rough math, ?v 35*0x1e5 = 0x424f, which is clearly greater than the default esil.gotolimit = 0x00001000, even taking into account that the GOTO jumps back to instruction 5 and not to the beginning of the statement, which shaves only a few instructions off each iteration.
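The rough math can be reproduced outside r2 as well. The sketch below simply splits the ESIL statement on commas and multiplies the token count by the cx value; it follows the reading of esil.gotolimit given above and is an upper-bound estimate, not a model of how the ESIL VM actually counts.

```python
ESIL = ("cx,!,?{,BREAK,},si,[1],di,=[1],df,?{,1,si,-=,1,di,-=,},"
        "df,!,?{,1,si,+=,1,di,+=,},cx,--=,cx,?{,5,GOTO,}")

tokens = ESIL.split(",")   # one entry per ESIL instruction
cx = 0x1e5                 # repeat count set by the bootloader
default_limit = 0x1000     # default esil.gotolimit quoted above

# Upper bound: every token executed once per iteration.
upper_bound = len(tokens) * cx
print(len(tokens), hex(upper_bound), upper_bound > default_limit)
```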

[0000:0000]> ! (sleep 30 && killall -3 r2)&

At the end of the emulated code, the bootloader enters an infinite loop. This is a dirty trick to schedule r2 to quit 30 seconds from now, whatever happens (including the case where you have closed this r2 session and opened another one in the meantime…).

This particular one needs a posix shell to work.

[0000:0000]> aec

This runs the emulation until CTRL+C is pressed, assuming you get the chance to press it and that CTRL+C is honored by both radare2 and the spawned python code, which may be running at that time. Basically, in this case, it means running the emulation forever (due to the final infinite loop) or until r2 is killed by the dirty trick above.

Cinema of take one

asciicast

This demonstration shows all the above actually works, but - unless you’re shooting a 1980s sci-fi B-movie - it’s spectacularly slow for any real-world use case.

Take two, using r2lang + python RCore plugin

At that point, and after talking to pancake about this, I could see several possible reasons for it being so slow, sorted by probability (most probable first):

  • spawning the python interpreter at each interrupt is slow
  • my shitty python code is slow
  • python in general is slow
  • ESIL emulation is slow

To address the most probable issue first, an alternative way to do this - while still using my python BIOS - is to define an RCore plugin which accepts a new ‘bios’ command. In this way the python code is loaded only once and then at each interrupt the command itself gets executed, reusing the same python context.
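To get a feeling for how much the first item in the list costs, here is a rough, hedged sketch that compares spawning a fresh interpreter per "interrupt" with calling a function in an already running one. It measures only interpreter startup, not r2pipe or the real handler; handle_intr here is an empty stand-in and the helper names are illustrative.

```python
import subprocess
import sys
import time


def handle_intr(num):
    # Stand-in for the real handler; does nothing on purpose.
    return num


def time_spawned(n):
    """Average seconds per call when a new interpreter is spawned each time."""
    start = time.perf_counter()
    for _ in range(n):
        subprocess.run([sys.executable, "-c", "pass"], check=True)
    return (time.perf_counter() - start) / n


def time_in_process(n):
    """Average seconds per call when the handler lives in the same process."""
    start = time.perf_counter()
    for _ in range(n):
        handle_intr(0x10)
    return (time.perf_counter() - start) / n


if __name__ == "__main__":
    print("spawned:    %.6f s per interrupt" % time_spawned(10))
    print("in-process: %.6f s per interrupt" % time_in_process(10))
```

The absolute numbers depend on the machine, but the gap between the two gives an idea of what is paid at every single int instruction when the script is spawned each time.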

Here are the modifications to the python code above:

1 - instead of importing r2pipe, let’s import r2lang:

import r2lang, sys, os, json
r2 = r2lang

2 - replace missing cmdj with native python json.loads in bios code:

# this handles the interrupts
def handle_intr(intNum):
	regs = json.loads(r2.cmd('arj'))
	...

3 - register the core plugin:

def bioscore(a):
	def _call(s):
		if s == "bios":
			ip = int(r2.cmd("ar ip"),0) - 2
			num = int(r2.cmd("?v $v@" + hex(ip)),0)
			handle_intr(num)
			return 1
		return 0

	return {
		"name" : "BiosCore",
		"license": "WTFPL",
		"desc": "toy bios",
		"call": _call
	}


r2lang.plugin("core", bioscore)

The most evident issue so far is that the custom commands defined in RCore plugins don’t accept parameters, therefore here is another dirty trick.

In order to get the numeric value of the interrupt, I decided to use the $v variable, which returns the immediate value of the instruction at the current seek. The problem here is that during emulation, the instruction pointer has already been incremented by the time the interrupt gets executed. So, assuming that the x86 16-bit encoding of INT XX instructions is always 2 bytes long, I just subtracted 2 from the current ip value in order to get the seek for the immediate value to extract.
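The "ip minus 2" trick relies on the int imm8 encoding, which is the opcode byte 0xCD followed by the interrupt number. Below is a minimal sketch of the same lookup done on a plain bytes buffer instead of through r2; the function name and the buffer are hypothetical and only illustrate the assumption, including the check that the previous instruction really is CD xx.

```python
INT_IMM8_OPCODE = 0xCD  # "int imm8"; int3 (0xCC) and into (0xCE) are 1 byte long


def int_number_before(mem, ip):
    """Return the interrupt number of the 2-byte INT ending right before ip.

    mem is a bytes-like view of the emulated memory (hypothetical, for
    illustration); returns None if the previous instruction is not CD xx.
    """
    if ip < 2 or mem[ip - 2] != INT_IMM8_OPCODE:
        return None
    return mem[ip - 1]


if __name__ == "__main__":
    code = bytes([0xB4, 0x0E, 0xCD, 0x10, 0x90])  # mov ah, 0xe; int 0x10; nop
    # After "int 0x10" has been executed, ip points at the nop (offset 4).
    print(hex(int_number_before(code, 4)))
```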

And again, the execution sequence:

$ r2 -i bios.py -b 16 HardDisk
 -- You are probably using an old version of r2, go checkout the git!
[0000:0000]> aei
[0000:0000]> aeim 0x2000 0xffff
[0000:0000]> aeip
[0000:0000]> e io.cache=true
[0000:0000]> (orpo, bios)
[0000:0000]> "e cmd.esil.intr=` `;.(orpo)"
[0000:0000]> e esil.gotolimit=0xffff
[0000:0000]> ! (sleep 30 && killall -3 r2)&
[0000:0000]> aec

Emulation setup (differences)

[0000:0000]> (orpo, bios)

Again, custom commands do not accept parameters. What’s more, if they’re called with a parameter, they don’t get executed at all.

To overcome this limitation, I just defined a macro named orpo. This way, the extra unusable parameter which r2 passes to the intr handler is just ignored and the custom command is called.

[0000:0000]> "e cmd.esil.intr=` `;.(orpo)"

Here is the modified intr handler which in turn calls our macro, which in turn calls our custom command.

Buried in the above command line there’s also another mystery which cost me an interesting hour to work around: that space in backticks. If it is removed, each char output by my BIOS command gets prepended by what seems to be the output of a printf("0x%x\n", somevalue); buried somewhere along the r2 code path around the interrupt handling / r2lang io piping (I guess, but actually I was unable to find it).

Cinema of take two

asciicast

Hey! This time it is faster. There’s still room for improvement: I guess it’s possible to go down the list of slowdown probabilities above, up to rewriting the BIOS in C, for example. Honestly, though, the python RCore plugin seems pretty fast to me.

(by mrmacete)