GreHack 2015 Re150 challenge: as painless as possible
In this article, we analyze a GreHack 2015 challenge: reverseMe (Re150).
This is not the purpose of this post to offer a documented write-up; one is already available here, based on an execution trace.
This is more about how we could have analyze this challenge, with the help of the Miasm framework (in addition with others tools, as IDA / radare2 / ...).
This analysis is based on Miasm revision d2588f5.
The target, reverseMe seems to be a 32 bit ELF. Its main function starts at 0x8048e78. One can view it through Miasm using the disasm/full.py example:
$ python miasm2/example/disasm/full.py reverseMe $ xdot graph_execflow.dot
Obviously, the binary starts by decrypting its next code.
Even if we can manually apply the XOR, we decide to run it in the Miasm sandbox in order to dump the memory after the decryption process.
We will use one of the Sandbox subclasses, from miasm2.analysis.sandbox, which contains common element for emulation, such as memory emulation, stack initialization, mapping binary to memory, common options, etc.
Let’s inspire ourself from the example jitter/sandbox_pe_x86_32.py. Instead of Sandbox_Win_x86_32, we will use Sandbox_Linux_x86_32:
import os from miasm2.analysis.sandbox import Sandbox_Linux_x86_32 # Insert here user defined methods # Parse arguments parser = Sandbox_Linux_x86_32.parser(description="ELF sandboxer") parser.add_argument("filename", help="ELF Filename") options = parser.parse_args() # Create sandbox sb = Sandbox_Linux_x86_32(options.filename, options, globals()) # Run sb.run()
Let’s run it on our binary. Note that debuggers are available thanks to --debug (CLI) or --gdbserver (GDBserver, for instance with IDA) command line options.
We will use the -b (“dump blocks”) option, which output each freshly disassembled basic block encountered during emulation. As the loop (at 0x8048e7e) does not introduce new basic blocks, the output will only display this block once. If the code modifies a previously emulated block, the resulting block will be displayed the next time it will be reached.
$ python sandbox_elf_x86_32.py -b reverseMe loc_0000000008048E78:0x08048e78 PUSH EDX MOV EDX, 0x8048A5C XOR BYTE PTR [EDX], 0xBC DEC EDX CMP EDX, 0x8048A45 JAE loc_0000000008048E7E:0x08048e7e -> c_to:loc_0000000008048E7E:0x08048e7e c_next:loc_0000000008048E8A:0x08048e8a loc_0000000008048E7E:0x08048e7e XOR BYTE PTR [EDX], 0xBC DEC EDX CMP EDX, 0x8048A45 JAE loc_0000000008048E7E:0x08048e7e -> c_next:loc_0000000008048E8A:0x08048e8a c_to:loc_0000000008048E7E:0x08048e7e loc_0000000008048E8A:0x08048e8a POP EDX JMP loc_0000000008048A45:0x08048a45 -> c_to:loc_0000000008048A45:0x08048a45 loc_0000000008048A45:0x08048a45 ... loc_0000000008049961:0x08049961 POP EDX JMP loc_000000000804B1F0:0x0804b1f0 -> c_to:loc_000000000804B1F0:0x0804b1f0 loc_000000000804B1F0:0x0804b1f0 MOV EBX, 0x1 MOV EAX, 0x4 INT 0x80 ... AssertionError
Considering the full output, there are 2 things to notice:
- the decryption pattern seems to repeat itself several time
- emulation ends with a crash: assert(self.get_exception() == 0)
This crash occurs because an exception is raised, through INT 0x80 and is not catched (we did not specify it to Miasm). But it seems that we have reached a new part of the code, different from the decryption steps. Let’s dump the current state of the memory and display the resulting CFG:
# Using -i, we obtain an interactive Python shell at the end of the script $ python -i sandbox_elf_x86_32.py -b reverseMe ... AssertionError
# sb is our Sandbox instance # sb.jitter is the emulator instance # sb.jitter.vm is the emulator virtual memory >>> sb.jitter.vm # Emulated memory. Columns are: address, size, memory permissions # 0x130000 -> stack ad 0x130000 size 0x10000 RW_ # 0x8048000 -> mapped binary ad 0x8048000 size 0x4000 RW_ # Dump the binary (the ELF structure is conserved because in this specific # binary, memory addresses are directly linked with file offset) >>> open("dump.bin", "w").write(sb.jitter.vm.get_mem(0x8048000, 0x4000))
$ python miasm2/example/disasm/full.py dump.bin
The resulting CFG is hard to read, due to the repeated pattern of decryption.
We will use Miasm to remove this pattern from the CFG.
Let’s start with a basic disassembling script:
# Container is the wrapper for ELF, PE, ... from miasm2.analysis.binary import Container # Machine is the wrapper for managing multiple architecture from miasm2.analysis.machine import Machine cont = Container.from_stream(open("dump.bin")) # Bin stream is a view on the mapped binary bin_stream = cont.bin_stream # 'cont.arch' is "x86_32", extracted from the ELF header machine = Machine(cont.arch) # Disassembly engine associated with the current binary mdis = machine.dis_engine(bin_stream) # Disassemble the main function blocks = mdis.dis_multibloc(cont.entry_point) # Get back the CFG as a dot file open("cfg.dot", "w").write(blocks.dot())
In cfg.dot, we got again our unreadable CFG. For more information on the AsmCFG instance returned by mdis.dis_multibloc, have a look at PR #309.
Let’s define and match our pattern using MatchGraph (see PR #310).
from miasm2.core.graph import MatchGraphJoker # Pattern we want to match: # | # +----v----+ # | (dad) | # | PUSH | # | MOV | # +----+----+ # | # +----v----+ # | (middle)|<---+ # +----+--+-+ | # | +------+ # +----v----+ # | (end) | # +----+----+ # | # v # First block must have 2 assembly lines, a PUSH then a MOV, and an unlimited # number of incomming edge dad = MatchGraphJoker(name="dad", restrict_in=False, filt=lambda block: (len(block.lines) == 2 and block.lines.name == "PUSH" and block.lines.name == "MOV")) middle = MatchGraphJoker(name="middle") # Last block can have an unlimited number of outcomming edge end = MatchGraphJoker(name="end", restrict_out=False) # Build a MatchGraph by specifing edges (A >> B means an edge from A to B) matcher = dad >> middle >> middle >> end # Visualize it to be sure open("to_match.dot", "w").write(matcher.dot())
Now, we can match corresponding blocks:
for sol in matcher.match(blocks): print sol[dad]
loc_00000000080496DE:0x080496de PUSH EDX MOV EDX, 0x8048AC0 -> c_next:loc_00000000080496E4:0x080496e4 loc_00000000080482F2:0x080482f2 PUSH EDX MOV EDX, 0x804A9CE -> c_next:loc_00000000080482F8:0x080482f8 loc_00000000080496C5:0x080496c5 PUSH EDX MOV EDX, 0x8049DCB -> c_next:loc_00000000080496CB:0x080496cb loc_0000000008049981:0x08049981 PUSH EDX MOV EDX, 0x8048DAE -> c_next:loc_0000000008049987:0x08049987 loc_000000000804881F:0x0804881f PUSH EDX MOV EDX, 0x8049614 -> c_next:loc_0000000008048825:0x08048825 loc_0000000008049C88:0x08049c88 PUSH EDX MOV EDX, 0x80480CA -> c_next:loc_0000000008049C8E:0x08049c8e loc_0000000008048E78:0x08048e78 PUSH EDX MOV EDX, 0x8048A5C -> c_next:loc_0000000008048E7E:0x08048e7e ...
We will now remove this pattern from our CFG, linking its predecessors to its successors in order to keep a correct control flow. For this purpose, we will write a DiGraphSimplifier pass.
def block_merge(dgs, graph): """Remove the pattern defined in matcher @dgs: instance of DiGraphSimplifier (for recursive call purpose) @graph: graph to simplify """ for sol in matcher.match(graph): # /!\ we modify the graph during an iteration on it # Our results do not overlap, we can avoid breaking the 'for' after each # modification # Links each successors to our pattern predecessors successors = graph.successors(sol[end]) for pred in graph.predecessors(sol[dad]): for succ in successors: graph.add_edge(pred, succ, graph.edges2constraint[(pred, sol[dad])]) # Remove matched blocks (and associated edges) from the graph for node in sol.itervalues(): graph.del_node(node)
Then, we instanciate a DiGraphSimplifier to associate our pass with it and apply it on our blocks.
from miasm2.core.graph import DiGraphSimplifier dgs = DiGraphSimplifier() dgs.enable_passes([block_merge]) blocks_after = dgs(blocks) open("cfg_after.dot", "w").write(blocks_after.dot())
Viewing cfg_after.dot, we can now easily understand what the binary is doing. Keep in mind that only the executed path has been de-xored, so it is normal to obtain incomprehensible blocks on the other paths.
Now, if we analyze what is going on, we can see that the program first ensure that the top stack value (argc because there is no libc init here) equals 2.
We then slightly modify our sandbox_elf_x86_32.py to push our argument on the stack before the run (for more explanation, see PR #284).
from miasm2.core.types import Str, set_allocator from miasm2.os_dep.common import heap # Instanciate a heap and set it as the default memory allocator set_allocator(heap().vm_alloc) MemStrAnsi = Str().lval # Allocate a string containing a dummy password addr_pwd = MemStrAnsi.from_str(sb.jitter.vm, "SomePassword").get_addr() # The program name, actually unused here addr_bin = MemStrAnsi.from_str(sb.jitter.vm, "reverseMe").get_addr() # Push arguments on the stack sb.jitter.push_uint32_t(addr_pwd) sb.jitter.push_uint32_t(addr_bin) sb.jitter.push_uint32_t(0x2) # Run sb.run()
Here, manipulated structure are simples; we could use direct memory page allocation instead of the type system wrapper.
addr_pwd = 0x112233 addr_bin = 0x113344 from miasm2.jitter.csts import PAGE_READ, PAGE_WRITE # Add a memory pages containing dummy arguments sb.jitter.vm.add_memory_page(addr_pwd, PAGE_READ | PAGE_WRITE, "SomePassword" + "\x00") sb.jitter.vm.add_memory_page(addr_bin, PAGE_READ, "reverseMe" + "\x00")
Running again our script, we end up with a new error:
loc_000000000804B1DD:0x0804b1dd INC ECX XOR BYTE PTR [EBX+ECX], 0x33 JMP loc_0000000008049EAE:0x08049eae -> c_to:loc_0000000008049EAE:0x08049eae WARNING: address 0x2000000D is not mapped in virtual memory:
An access has been tried on a non-allocated memory. This access has been attempted by the XOR BYTE PTR [EBX+ECX], 0x33 instruction. As the dumpblock option dumps the full basic block before executing it, we also see the next instruction. Using the -z (--singlestep) argument, we obtain a step by step execution trace (with registers values printed before the instruction execution):
RAX 0000000000000000 RBX 0000000020000000 RCX 000000000000000D RDX 0000000000000000 RSI 0000000000000000 RDI 0000000000000000 RSP 000000000013FFF0 RBP 0000000000000000 zf 0000000000000000 nf 0000000000000000 of 0000000000000000 cf 0000000000000001 RIP 000000000804B1DD 0804B1DE XOR BYTE PTR [EBX+ECX], 0x33 WARNING: address 0x2000000D is not mapped in virtual memory:
Actually, the program expects a longer input. We can either map a bigger memory page (for instance, 4KB) or use dichotomy to find the exact expected input size, which is left as an exercise to the reader. We obtain an expected input size of 28, so, for instance:
from string import printable # printable avoid character repetition, useful for debugging addr_pwd = MemStrAnsi.from_str(sb.jitter.vm, printable[:28]).get_addr()
As before, we now end on a INT 0x80.
Using our previous script (with a new dump, dump2.bin), we output the resulting CFG.
In the resulting CFG, we observe an instance of our pattern which have not been simplified (at 0x80491b1). Indeed, this is not exactly our pattern; the middle block (at 0x80491b7) has an incomming constraint c_next from 0x80491b5. As the instructions in this last block are still encrypted, they are disassembled as a ADD AL, 0x8. So, after executing this instruction, the next one must be at 0x80491b7 (this is what c_next mean: the block must be next to this predecessor). To remove this artefact, at least two options:
- modify the pattern to remove the restriction on the number of incomming edge for middle (restrict_in=False)
- tell the disassembler engine to stop on the bad JNZ branch, at 0x8049215
For the purpose of highlighting Miasm API, we choose the second option. We then modify our disassembler to tell it to stop disassembling if this specific address is reached.
# Disassembly engine associated with the current binary mdis = machine.dis_engine(bin_stream, dont_dis=[0x8049215])
We end with a spaghetti CFG, that is to say a list of basic block with only one successor and only one predecessor.
If we analyze them and particularly the final loop, we understand that the code is XOR-ing each byte of our input with a given key, then test that the result is full of null byte. In other words, our input must be the key.
As this key computing is very simple to understand, a symbolic execution and SMT-solvers (such as z3) based approach is not necessary. It may be dealt with on a better example in a future post (cliffhanger!).
Instead of manually retrieving the key, let’s use a graph traversal starting from the CFG root and printing each XOR constant:
key =  # Only one head (i.e. root) in our graph head = blocks_after.heads() for block in blocks_after.walk_depth_first_forward(head): # Depth first traversal starting from the root if (len(block.lines) == 3 and block.lines.name == "XOR"): # Arguments are ([EBX + ECX], CST), in Miasm Intermediate Representation cst = block.lines.args key.append(chr(int(cst.arg))) # Output the resulting key print "".join(key)
Which outs: Alph4t3stf0rc3mille!!!!!!!!!.
$ ./reverseMe 'Alph4t3stf0rc3mille!!!!!!!!!' Well done!
As noted before, this is a spaghetti code. Miasm already provides a simplification pass for this kind of code, using:
from miasm2.core.asmbloc import bbl_simplifier merged = bbl_simplifier(blocks_after) open("merged.dot", "w").write(merged.dot())
Resulting in a bigger basic block:
On a Intel(R) Core(TM) i7-4600U CPU @ 2.10GHz single core (these values are the mean on several execution, removing extremes):
- the complete execution (with the right path and decryption) takes 21.376s
- the graph simplification and the solution extraction takes 1.540s
Finally, analyzing the challenge and writing these small scripts does not take more than 20 minutes (for somebody knowing the Miasm API). A first pass has been done to obtain the relevant code, through emulation. Once the annoying patterns are detected, matching and removing them only takes a couple of minutes.
As this method can be applied on other binaries, like malwares, we may write a second article on how to rebuild a working binary with the pattern and the decryption process removed, leading to a close-to-the-original binary.