January 27, 2016

GreHack 2015 Re150 challenge: as painless as possible

In this article, we analyze a GreHack 2015 challenge: reverseMe (Re150).

This is not the purpose of this post to offer a documented write-up; one is already available here, based on an execution trace.

This is more about how we could have analyze this challenge, with the help of the Miasm framework (in addition with others tools, as IDA / radare2 / …).

Overview:

First thoughts
Emulation
CFG pattern matching
CFG simplification
Taking the right execution path
Password matching
Bonus
Conclusion

This analysis is based on Miasm revision d2588f5.

First thoughts

The target, reverseMe seems to be a 32 bit ELF. Its main function starts at 0x8048e78. One can view it through Miasm using the disasm/full.py example:

$ python miasm2/example/disasm/full.py reverseMe
$ xdot graph_execflow.dot

Results:

../../../_images/re150_graph_execflow_1.svg

CFG of reverseMe main function

Obviously, the binary starts by decrypting its next code.

Emulation

Even if we can manually apply the XOR, we decide to run it in the Miasm sandbox in order to dump the memory after the decryption process.

We will use one of the Sandbox subclasses, from miasm2.analysis.sandbox, which contains common element for emulation, such as memory emulation, stack initialization, mapping binary to memory, common options, etc.

Let’s inspire ourself from the example jitter/sandbox_pe_x86_32.py. Instead of Sandbox_Win_x86_32, we will use Sandbox_Linux_x86_32:

import os
from miasm2.analysis.sandbox import Sandbox_Linux_x86_32

# Insert here user defined methods

# Parse arguments
parser = Sandbox_Linux_x86_32.parser(description="ELF sandboxer")
parser.add_argument("filename", help="ELF Filename")
options = parser.parse_args()

# Create sandbox
sb = Sandbox_Linux_x86_32(options.filename, options, globals())

# Run
sb.run()

Let’s run it on our binary. Note that debuggers are available thanks to --debug (CLI) or --gdbserver (GDBserver, for instance with IDA) command line options.

We will use the -b (“dump blocks”) option, which output each freshly disassembled basic block encountered during emulation. As the loop (at 0x8048e7e) does not introduce new basic blocks, the output will only display this block once. If the code modifies a previously emulated block, the resulting block will be displayed the next time it will be reached.

$ python sandbox_elf_x86_32.py -b reverseMe
loc_0000000008048E78:0x08048e78
PUSH       EDX
MOV        EDX, 0x8048A5C
XOR        BYTE PTR [EDX], 0xBC
DEC        EDX
CMP        EDX, 0x8048A45
JAE        loc_0000000008048E7E:0x08048e7e
->      c_to:loc_0000000008048E7E:0x08048e7e    c_next:loc_0000000008048E8A:0x08048e8a
loc_0000000008048E7E:0x08048e7e
XOR        BYTE PTR [EDX], 0xBC
DEC        EDX
CMP        EDX, 0x8048A45
JAE        loc_0000000008048E7E:0x08048e7e
->      c_next:loc_0000000008048E8A:0x08048e8a  c_to:loc_0000000008048E7E:0x08048e7e
loc_0000000008048E8A:0x08048e8a
POP        EDX
JMP        loc_0000000008048A45:0x08048a45
->      c_to:loc_0000000008048A45:0x08048a45
loc_0000000008048A45:0x08048a45
...
loc_0000000008049961:0x08049961
POP        EDX
JMP        loc_000000000804B1F0:0x0804b1f0
->  c_to:loc_000000000804B1F0:0x0804b1f0
loc_000000000804B1F0:0x0804b1f0
MOV        EBX, 0x1
MOV        EAX, 0x4
INT        0x80
...
AssertionError

Considering the full output, there are 2 things to notice:

the decryption pattern seems to repeat itself several time
emulation ends with a crash: assert(self.get_exception() == 0)

This crash occurs because an exception is raised, through INT 0x80 and is not catched (we did not specify it to Miasm). But it seems that we have reached a new part of the code, different from the decryption steps. Let’s dump the current state of the memory and display the resulting CFG:

# Using -i, we obtain an interactive Python shell at the end of the script
$ python -i sandbox_elf_x86_32.py -b reverseMe
...
AssertionError

# sb is our Sandbox instance
# sb.jitter is the emulator instance
# sb.jitter.vm is the emulator virtual memory
>>> sb.jitter.vm
# Emulated memory. Columns are: address, size, memory permissions
# 0x130000 -> stack
ad 0x130000 size 0x10000 RW_
# 0x8048000 -> mapped binary
ad 0x8048000 size 0x4000 RW_

# Dump the binary (the ELF structure is conserved because in this specific
# binary, memory addresses are directly linked with file offset)
>>> open("dump.bin", "w").write(sb.jitter.vm.get_mem(0x8048000, 0x4000))

$ python miasm2/example/disasm/full.py dump.bin

Results:

../../../_images/re150_graph_execflow_2.svg

CFG of dump.bin main function, with several repetition of the decryption pattern

The resulting CFG is hard to read, due to the repeated pattern of decryption.

CFG pattern matching

We will use Miasm to remove this pattern from the CFG.

Let’s start with a basic disassembling script:

# Container is the wrapper for ELF, PE, ...
from miasm2.analysis.binary import Container
# Machine is the wrapper for managing multiple architecture
from miasm2.analysis.machine import Machine

cont = Container.from_stream(open("dump.bin"))
# Bin stream is a view on the mapped binary
bin_stream = cont.bin_stream

# 'cont.arch' is "x86_32", extracted from the ELF header
machine = Machine(cont.arch)
# Disassembly engine associated with the current binary
mdis = machine.dis_engine(bin_stream)
# Disassemble the main function
blocks = mdis.dis_multibloc(cont.entry_point)
# Get back the CFG as a dot file
open("cfg.dot", "w").write(blocks.dot())

In cfg.dot, we got again our unreadable CFG. For more information on the AsmCFG instance returned by mdis.dis_multibloc, have a look at PR #309.

Let’s define and match our pattern using MatchGraph (see PR #310).

from miasm2.core.graph import MatchGraphJoker
# Pattern we want to match:
#           |
#      +----v----+
#      |  (dad)  |
#      |  PUSH   |
#      |  MOV    |
#      +----+----+
#           |
#      +----v----+
#      | (middle)|<---+
#      +----+--+-+    |
#           |  +------+
#      +----v----+
#      |  (end)  |
#      +----+----+
#           |
#           v

# First block must have 2 assembly lines, a PUSH then a MOV, and an unlimited
# number of incomming edge
dad = MatchGraphJoker(name="dad", restrict_in=False,
              filt=lambda block: (len(block.lines) == 2 and
                      block.lines[0].name == "PUSH" and
                      block.lines[1].name == "MOV"))
middle = MatchGraphJoker(name="middle")
# Last block can have an unlimited number of outcomming edge
end = MatchGraphJoker(name="end", restrict_out=False)
# Build a MatchGraph by specifing edges (A >> B means an edge from A to B)
matcher = dad >> middle >> middle >> end

# Visualize it to be sure
open("to_match.dot", "w").write(matcher.dot())

Results:

Pattern to match, from matcher

Now, we can match corresponding blocks:

for sol in matcher.match(blocks):
    print sol[dad]

loc_00000000080496DE:0x080496de
PUSH       EDX
MOV        EDX, 0x8048AC0
->  c_next:loc_00000000080496E4:0x080496e4
loc_00000000080482F2:0x080482f2
PUSH       EDX
MOV        EDX, 0x804A9CE
->  c_next:loc_00000000080482F8:0x080482f8
loc_00000000080496C5:0x080496c5
PUSH       EDX
MOV        EDX, 0x8049DCB
->  c_next:loc_00000000080496CB:0x080496cb
loc_0000000008049981:0x08049981
PUSH       EDX
MOV        EDX, 0x8048DAE
->  c_next:loc_0000000008049987:0x08049987
loc_000000000804881F:0x0804881f
PUSH       EDX
MOV        EDX, 0x8049614
->  c_next:loc_0000000008048825:0x08048825
loc_0000000008049C88:0x08049c88
PUSH       EDX
MOV        EDX, 0x80480CA
->  c_next:loc_0000000008049C8E:0x08049c8e
loc_0000000008048E78:0x08048e78
PUSH       EDX
MOV        EDX, 0x8048A5C
->  c_next:loc_0000000008048E7E:0x08048e7e
...

CFG simplification

We will now remove this pattern from our CFG, linking its predecessors to its successors in order to keep a correct control flow. For this purpose, we will write a DiGraphSimplifier pass.

def block_merge(dgs, graph):
    """Remove the pattern defined in matcher
    @dgs: instance of DiGraphSimplifier (for recursive call purpose)
    @graph: graph to simplify
    """
    for sol in matcher.match(graph):
        # /!\ we modify the graph during an iteration on it
        # Our results do not overlap, we can avoid breaking the 'for' after each
        # modification

        # Links each successors to our pattern predecessors
        successors = graph.successors(sol[end])
        for pred in graph.predecessors(sol[dad]):
            for succ in successors:
                graph.add_edge(pred, succ,
                               graph.edges2constraint[(pred, sol[dad])])

        # Remove matched blocks (and associated edges) from the graph
        for node in sol.itervalues():
            graph.del_node(node)

Then, we instanciate a DiGraphSimplifier to associate our pass with it and apply it on our blocks.

from miasm2.core.graph import DiGraphSimplifier
dgs = DiGraphSimplifier()
dgs.enable_passes([block_merge])

blocks_after = dgs(blocks)
open("cfg_after.dot", "w").write(blocks_after.dot())

Viewing cfg_after.dot, we can now easily understand what the binary is doing. Keep in mind that only the executed path has been de-xored, so it is normal to obtain incomprehensible blocks on the other paths.

Simplified CFG, ending on password fail

Taking the right execution path

Now, if we analyze what is going on, we can see that the program first ensure that the top stack value (argc because there is no libc init here) equals 2.

We then slightly modify our sandbox_elf_x86_32.py to push our argument on the stack before the run (for more explanation, see PR #284).

from miasm2.core.types import Str, set_allocator
from miasm2.os_dep.common import heap

# Instanciate a heap and set it as the default memory allocator
set_allocator(heap().vm_alloc)
MemStrAnsi = Str().lval
# Allocate a string containing a dummy password
addr_pwd = MemStrAnsi.from_str(sb.jitter.vm, "SomePassword").get_addr()
# The program name, actually unused here
addr_bin = MemStrAnsi.from_str(sb.jitter.vm, "reverseMe").get_addr()

# Push arguments on the stack
sb.jitter.push_uint32_t(addr_pwd)
sb.jitter.push_uint32_t(addr_bin)
sb.jitter.push_uint32_t(0x2)

# Run
sb.run()

Here, manipulated structure are simples; we could use direct memory page allocation instead of the type system wrapper.

addr_pwd = 0x112233
addr_bin = 0x113344
from miasm2.jitter.csts import PAGE_READ, PAGE_WRITE
# Add a memory pages containing dummy arguments
sb.jitter.vm.add_memory_page(addr_pwd, PAGE_READ | PAGE_WRITE, "SomePassword" + "\x00")
sb.jitter.vm.add_memory_page(addr_bin, PAGE_READ, "reverseMe" + "\x00")

Running again our script, we end up with a new error:

loc_000000000804B1DD:0x0804b1dd
INC        ECX
XOR        BYTE PTR [EBX+ECX], 0x33
JMP        loc_0000000008049EAE:0x08049eae
->  c_to:loc_0000000008049EAE:0x08049eae
WARNING: address 0x2000000D is not mapped in virtual memory:

An access has been tried on a non-allocated memory. This access has been attempted by the XOR BYTE PTR [EBX+ECX], 0x33 instruction. As the dumpblock option dumps the full basic block before executing it, we also see the next instruction. Using the -z (--singlestep) argument, we obtain a step by step execution trace (with registers values printed before the instruction execution):

RAX 0000000000000000 RBX 0000000020000000 RCX 000000000000000D RDX 0000000000000000
RSI 0000000000000000 RDI 0000000000000000 RSP 000000000013FFF0 RBP 0000000000000000
zf 0000000000000000 nf 0000000000000000 of 0000000000000000 cf 0000000000000001
RIP 000000000804B1DD
0804B1DE XOR        BYTE PTR [EBX+ECX], 0x33
WARNING: address 0x2000000D is not mapped in virtual memory:

Actually, the program expects a longer input. We can either map a bigger memory page (for instance, 4KB) or use dichotomy to find the exact expected input size, which is left as an exercise to the reader. We obtain an expected input size of 28, so, for instance:

from string import printable
# printable avoid character repetition, useful for debugging
addr_pwd = MemStrAnsi.from_str(sb.jitter.vm, printable[:28]).get_addr()

As before, we now end on a INT 0x80.

Simplified CFG, with a remaining pattern

Using our previous script (with a new dump, dump2.bin), we output the resulting CFG.

Password matching

In the resulting CFG, we observe an instance of our pattern which have not been simplified (at 0x80491b1). Indeed, this is not exactly our pattern; the middle block (at 0x80491b7) has an incomming constraint c_next from 0x80491b5. As the instructions in this last block are still encrypted, they are disassembled as a ADD AL, 0x8. So, after executing this instruction, the next one must be at 0x80491b7 (this is what c_next mean: the block must be next to this predecessor). To remove this artefact, at least two options:

modify the pattern to remove the restriction on the number of incomming edge for middle (restrict_in=False)
tell the disassembler engine to stop on the bad JNZ branch, at 0x8049215

For the purpose of highlighting Miasm API, we choose the second option. We then modify our disassembler to tell it to stop disassembling if this specific address is reached.

# Disassembly engine associated with the current binary
mdis = machine.dis_engine(bin_stream, dont_dis=[0x8049215])

We end with a spaghetti CFG, that is to say a list of basic block with only one successor and only one predecessor.

Simplified, final CFG

If we analyze them and particularly the final loop, we understand that the code is XOR-ing each byte of our input with a given key, then test that the result is full of null byte. In other words, our input must be the key.

As this key computing is very simple to understand, a symbolic execution and SMT-solvers (such as z3) based approach is not necessary. It may be dealt with on a better example in a future post (cliffhanger!).

Instead of manually retrieving the key, let’s use a graph traversal starting from the CFG root and printing each XOR constant:

key = []
# Only one head (i.e. root) in our graph
head = blocks_after.heads()[0]
for block in blocks_after.walk_depth_first_forward(head):
    # Depth first traversal starting from the root
    if (len(block.lines) == 3 and
        block.lines[1].name == "XOR"):
        # Arguments are ([EBX + ECX], CST), in Miasm Intermediate Representation
        cst = block.lines[1].args[1]
        key.append(chr(int(cst.arg)))

# Output the resulting key
print "".join(key)

Which outs: Alph4t3stf0rc3mille!!!!!!!!!.

So, finally:

$ ./reverseMe 'Alph4t3stf0rc3mille!!!!!!!!!'
Well done!

Bonus

As noted before, this is a spaghetti code. Miasm already provides a simplification pass for this kind of code, using:

from miasm2.core.asmbloc import bbl_simplifier
merged = bbl_simplifier(blocks_after)
open("merged.dot", "w").write(merged.dot())

Resulting in a bigger basic block:

Merged basic block

Conclusion

On a Intel(R) Core(TM) i7-4600U CPU @ 2.10GHz single core (these values are the mean on several execution, removing extremes):

the complete execution (with the right path and decryption) takes 21.376s
the graph simplification and the solution extraction takes 1.540s

Finally, analyzing the challenge and writing these small scripts does not take more than 20 minutes (for somebody knowing the Miasm API). A first pass has been done to obtain the relevant code, through emulation. Once the annoying patterns are detected, matching and removing them only takes a couple of minutes.

As this method can be applied on other binaries, like malwares, we may write a second article on how to rebuild a working binary with the pattern and the decryption process removed, leading to a close-to-the-original binary.

Final scripts:

Posted by commial

Tags: disasm, emulation, graph