Gsoc 2018 Radeco Pseudo C Code Generation

August 12, 2018

GSoC 2018 Radeco Pseudo C Code Generation

Introduction

This summer, I was working on C-like pseudocode generation with radeco. Althrough radeco was able to analyze executables, it could not decompile analyzed executables. My work allows to use radeco for generating C-like pseudocode from analyzed executables.

Usage

Installation

Note: Nightly Rust is required. You can install it using rustup.

$ rustup install nightly
$ rustup default nightly
$ git clone https://github.com/radareorg/radeco
$ cd radeco
$ cargo install

Decompilation

$ echo '#include<stdio.h>\nint main() {printf("Hello, world.\\n"); return 0;}' | gcc -xc -
$ radeco
>> load a.out
Cannot find function here
[*] Fixing Callee Information
>> fn_list
sym._init
sym.imp.puts
entry0
sym.deregister_tm_clones
sym.register_tm_clones
sym.__do_global_dtors_aux
entry1.init
sym.main
sym.__libc_csu_init
sym.__libc_csu_fini
sym._fini
>> analyze sym.main
[+] Analyzing: sym.main @ 0x1139
  [*] Eliminating Dead Code
  [*] Propagating Constants
  [*] Eliminating More DeadCode
  [*] Eliminating Common SubExpressions
  [*] Verifying SSA's Validity
>> decompile sym.main
fn sym.main () {
    unsigned int tmp;
    *((rsp - 8)) = rbp
    tmp = sym.imp.puts("Hello, world.", rsi, rdx, rcx, r8, r9)
}
>>

Radeco can also connect to existing radare2 process.

Radare2 side

$ r2 a.out
[0x00001040]> aaa
[x] Analyze all flags starting with sym. and entry0 (aa)
[x] Analyze function calls (aac)
[x] Analyze len bytes of instructions for references (aar)
[x] Constructing a function name for fcn.* and sym.func.* functions (aan)
[x] Type matching analysis for all functions (afta)
[x] Emulate code to find computed references (aae)
[x] Analyze consecutive function (aat)
[0x00001040]> afn write_str @ sym.imp.puts
[0x00001040]> =h& 9090
Background http server started.
WARNING: Background webserver requires http.sandbox=false to run properly
[0x00001040]>
Starting http server...
open http://localhost:9090/
r2 -C http://localhost:9090/cmd/
[0x00001040]>

Radeco side

$ radeco
>> connect http://localhost:9090
[*] Fixing Callee Information
>> decompile sym.main
fn sym.main () {
    unsigned int tmp;
    *((rsp - 8)) = rbp
    tmp = write_str("Hello, world.", rsi, rdx, rcx, r8, r9)
    return
}

You can see radeco uses the information modified by radare2.

Example

Original source code.

void write_strlen(const char *str) {
    const char *iter = str;
    while (*iter) iter++;
    printf("%d\n", iter - str);
}

Disassembly of the compiled code

|   sym.write_strlen (int arg1);
|           ; var int local_18h @ rbp-0x18
|           ; var int local_8h @ rbp-0x8
|           ; CALL XREF from sym.main (0x1189)
|           0x00001139      55             push rbp
|           0x0000113a      4889e5         mov rbp, rsp
|           0x0000113d      4883ec20       sub rsp, 0x20
|           0x00001141      48897de8       mov qword [local_18h], rdi  ; arg1
|           0x00001145      488b45e8       mov rax, qword [local_18h]
|           0x00001149      488945f8       mov qword [local_8h], rax
|       ,=< 0x0000114d      eb05           jmp 0x1154
|       |   ; CODE XREF from sym.write_strlen (0x115d)
|      .--> 0x0000114f      488345f801     add qword [local_8h], 1
|      :|   ; CODE XREF from sym.write_strlen (0x114d)
|      :`-> 0x00001154      488b45f8       mov rax, qword [local_8h]
|      :    0x00001158      0fb600         movzx eax, byte [rax]
|      :    0x0000115b      84c0           test al, al
|      `==< 0x0000115d      75f0           jne 0x114f
|           0x0000115f      488b45f8       mov rax, qword [local_8h]
|           0x00001163      482b45e8       sub rax, qword [local_18h]
|           0x00001167      4889c6         mov rsi, rax
|           0x0000116a      488d3d930e00.  lea rdi, [0x00002004]       ; "%d\n" ; const char *format
|           0x00001171      b800000000     mov eax, 0
|           0x00001176      e8b5feffff     call sym.imp.printf         ; int printf(const char *format)
|           0x0000117b      90             nop
|           0x0000117c      c9             leave
\           0x0000117d      c3             ret

Decompiled code

fn sym.write_strlen () {
    int local_18h;
    unsigned int tmp;
    int local_8h;
    *((rsp - 8)) = rbp
    local_18h = rdi
    local_8h = local_18h
    while (1) {
        if (!((1 ^ (((((((*(local_8h) as 64) | (local_8h & 18446744069414584320)) & 4294967295) as 8) & ((((*(local_8h) as 64) | (local_8h & 18446744069414584320)) & 4294967295) as 8)) - 0) & 255)) as 1)) {
            tmp = sym.imp.printf(8196, unknown, rdx, rcx, r8, r9)
            return
        } else {
            local_8h = (local_8h + 1)
        }
    }
}

Overview

Radeco Architecture

diagram radare and radeco

Radeco takes information from radare2 through r2pipe interface, then it converts assembly code to internal representation. After some analyse steps (e.g., deadcode elimination, constant propagation, common subexpressions elimination), the C-like expressions are recovered from it, and pseudocode is generated after graph modified to remove goto statements.

My work was to implement recovering expressions and C pseudocode generation from C-like CFG.

Recovering expressions

This part of code located in backend::lang_c::c_cfg_builder. It recovers statements (assignment, if, goto, function call) and expressions from IR, which is assembly-like representation. Then we convert it to C-like representation with goto statements and gotos are structured into while/do-while statements with No More Gotos algorithm implemented by HMPerson1.

Generating pseudo C-code from C-like CFG

The code located in backend::lang_c::c_ast and backend::lang_c::c_cfg. It converts C-like CFG to C-AST with goto statements. Currently, this code used to check whether recovering expressions correct or not, since backend::ctrl_flow_struct does the task.

Further Implementaions

  • Simplify the conditions in “IF” statements
  • Recover return value and its type of decompiled function
  • Write more generic interface so that it would be more extensible
  • Improve r2 integration

Other