September 03, 2016

ZeusVM analysis

Overview:

First Stages
VM Analysis
First mnemonic
Second mnemonic
Third mnemonic
Fourth mnemonic
Automating the analysis
Results
Fun with symbolic execution
Final words

In this article, we will study an old Zeus sample protected by a virtual machine. We will begin with the analysis of the VM structure, and automatize its reverse engineering using Miasm.

The sample is zeus_sample.zip. All zips in this post are protected using the password “infected”.

This analysis is based on Miasm revision f2a9a35. The corresponding Elfesteem revision is 1ee9171.

First Stages

The binary starts with a decryption loop. We will skip this part to focus on our goal: the VM. The decryption ends at address 0x403C26. The final code is executed using a call to GDI32/EnumFontsA which takes a function callback, in our case, the second part (zeusvm_stage2.zip).

The second stage can be emulated using Miasm (see dynamic shellcode analysis for more information). Miasm only provides a minimalistic windows structures and functions support, so the code will break during module search for rebuilding original import table. In the malware code, instead of stopping at the unicode null byte to stop the module name hash computation, the length is harcoded. So the code will crash here. You can add more padding in the allocation in miasm/os_dep/win_api_x86_32_seh.py, in the create_module_chain function.

jitter.vm.add_memory_page(addr + offset_name, PAGE_READ | PAGE_WRITE,
                          bname + "\x00" * 3 +"\x00"*0x10,
                          "Module name %r" % bname_str)

Some more work later, we extract the final stage zeusvm_stage3.zip.

VM Analysis

The targeted function is located at address 0x414795. The plan is to start with a quick manual analysis to recover the VM structure, and then to write a script to automate the full process.

CFG of the mnemonic handler

The interesting part is the loop inside the function. It has a specificity:

MOV        EAX, DWORD PTR [EBP+0xFFFFFFB0]
MOVZX      EAX, BYTE PTR [EAX]
LEA        ECX, DWORD PTR [EBP+0xFFFFFFB0]
CALL       DWORD PTR [EAX*0x4+0x427018]

The CALL target is [EAX*0x4+0x427018]. This means the address 0x427018 is an array of sub-functions addresses. And here is a dump of this area:

.data:00427018      dd offset sub_40EDE0
.data:0042701C      dd offset sub_40EDFB
.data:00427020      dd offset sub_40EE19
.data:00427024      dd offset sub_40EE3A
.data:00427028      dd offset sub_40EE64
.data:0042702C      dd offset sub_40EE92
.data:00427030      dd offset sub_40EEBE
.data:00427034      dd offset sub_40EEE8
.data:00427038      dd offset sub_40EF16
...

This is called the “mnemonic handler”. In a classical virtual machine, the code fetches the opcode of the current mnemonic at the VM program counter VM_PC, and jumps to the function which will emulate the mnemonic behavior.

This means EAX contains the opcode number. If we track it back to its origin, we land on the MOVZX, which reads the memory from the pointer stored in [EBP+0xFFFFFFB0]. We can guess that it may correspond to VM_PC.

Note that after the mnemonic handler call, register AL is tested to continue or stop the virtual machine execution.

First mnemonic

Note that ECX points to the address of the VM_PC. Let’s look at the first mnemonic at 0x40EDE0.

CFG of the mnemonic opcode 0

Here starts the hard part. The first line reads the content of ECX, which may be the address of the VM_PC, and assigns it to EAX. If we skip the two next lines, VM_PC is incremented, and stored back to its location. This may mean the first side effect of this mnemonic is to increment the VM_PC, implying that the mnemonic length is one.

Back to the second line. It reads a byte of memory at EAX (VM_PC in our case). This can be interpreted as the first byte of the current mnemonic. This byte is then xored with 0xE9 and kept in DL.

The code then checks if the byte at EAX (which is now the original VM_PC plus one) has its most significant bit set. If it’s cleared, the function returns zero, and DL is not used. In the other case, the code reads again the byte at VM_PC+1, xor it using DL, clear the most significant bit and update it in memory. Then, the function returns 1.

Let’s summary the mnemonic:

increment VM_PC by one
mess up the next mnemonic byte at VM_PC+1 under certain conditions

Note that if we look back at the parent function, the fact that the mnemonic returns one means the loop will continue its job.

Second mnemonic

Here is the second mnemonic.

CFG of the mnemonic opcode 2

Sounds familiar? The code is nearly the same as the previous one. The only differences are:

the VM_PC is incremented by two
the messed up byte is at VM_PC+2, and the byte from VM_PC+1 instead of VM_PC is xored with 0x78 instead of 0xE9

Third mnemonic

CFG of the mnemonic opcode 3

Easy one again:

VM_PC += 4
same mess on byte at address VM_PC+4, using different xor values.

Fourth mnemonic

CFG of the mnemonic opcode 4

Ha! A bit different. There is the byte messing up part which is similar to the first three mnemonics. But there is more: the code reads an integer from ECX+4, and xors the byte at this address with the byte at VM_PC+1, which can be interpreted as the second byte of the current instruction. The trick is we don’t know for the moment what is at ECX+4. As the value at ECX is VM_PC, maybe we can infer the value at ECX+4 is another register of the VM (let’s say R0)

The VM_PC is incremented twice, which means the mnemonic length may be 2 (the current opcode, and the value of the xor).

This mnemonic can be written as:

XOR BYTE PTR [R0], u08

with u08 an immediate from the mnemonic bytes.

Another question remains: what is the mess done with the xor at each mnemonic? As we could see, each mnemonic will test the most significant bit of the next byte and xor it with a hardcoded byte. Interpretation: the current mnemonic decrypts the first byte of the next mnemonic if its most significant bit is set. It then clears it, meaning the mnemonic is decrypted.

This means the three first mnemonics are in fact NOPS of three different lengths because they do nothing (except the decryption and the VM_PC update).

As we saw in the parent function, the opcode is coded on a single byte, but this byte is always decrypted so its most significant bit is zero. This means we may have at most 0x7F mnemonics. If we look back at the mnemonics table, we can see it ends a bit before this limit. Our hypothesis seems legit. In our case we have 69 mnemonics. Well, there are a bit too many mnemonics to analyze all of them manually.

Automating the analysis

The goal of this section is to automatically retrieve each mnemonic’s behaviour. To do this, we will symbolically execute them and analyze the resulting side effects. Here is a first draft of the symbolic execution code (other symbolic examples are available in previous post):

from argparse import ArgumentParser
from pdb import pm

from miasm2.analysis.machine import Machine
from miasm2.analysis.binary import Container
from miasm2.ir.symbexec import symbexec
from miasm2.arch.x86.sem import ir_x86_32
from miasm2.arch.x86 import regs
from miasm2.core.utils import *
from miasm2.expression.expression import *
from miasm2.core import asmbloc

# Transform native assembly into IR
def get_block(ir_arch, mdis, ad):
    lbl = ir_arch.get_label(ad)
    if not lbl in ir_arch.blocs:
        b = mdis.dis_bloc(lbl.offset)
        ir_arch.add_bloc(b)
    b = ir_arch.get_bloc(lbl)
    if b is None:
        raise LookupError('No block found at that address: %s' % lbl)
    return b

parser = ArgumentParser("XXXXX TODO")
parser.add_argument('filename', help="File to disassemble")
args = parser.parse_args()

machine = Machine("x86_32")
cont = Container.from_stream(open(args.filename))
bs = cont.bin_stream
mdis = machine.dis_engine(bs, symbol_pool=cont.symbol_pool)
ir_arch = ir_x86_32(mdis.symbol_pool)
sb = symbexec(ir_arch, regs.regs_init)

mnemonic_array_addr = 0x427018
ad = ExprInt32(upck32(bs.getbytes(mnemonic_array_addr, 4)))


print 'Mnemonic addr', ad
while True:
    print 'Block', ad
    get_block(ir_arch, mdis, ad)
    ad = sb.emul_ir_bloc(ir_arch, ad)
    if not isinstance(ad, ExprInt):
        raise ValueError("Unknown destination %s" % ad)

The first part of the code initializes the disassembly engine, the target and the symbolic execution engine. regs.regs_init is a dictionnary that specifies the initial symbolic value of all registers. Note that each register is initialized to a distinct symbol: EAX_init for EAX, EBX_init for EBX and so on. Here is the first run:

Mnemonic addr 0x40EDE0
Block 0x40EDE0
Traceback (most recent call last):
  File "emul_instr_01.py", line 46, in <module>
    raise ValueError("Unknown destination %s" % ad)
ValueError: Unknown destination ((@8[(@32[ECX_init]+0x1)]&0x80)?(0x40EDEF,0x40EDF8))

The first basic block is executed but it stalls on the decision for the conditional jump. The destination is:

((@8[(@32[ECX_init]+0x1)]&0x80)?(0x40EDEF,0x40EDF8))

As everything is symbolic, we can’t resolve this test and we don’t known which path to take. But if we look at the condition of the expression, we can deduce it’s the test on the next (de)ciphered mnemonic. @8[(@32[ECX_init]+0x1)] & 0x80 is in fact @8[VM_PC_init+0x1] & 0x80, which is the MSB of the next mnemonic opcode. We will first give to the engine the information that @32[ECX_init] is the VM_PC_init.

vm_pc_init = ExprId('VM_PC_init')
infos = {}
infos[expr_simp(ExprMem(regs.ECX_init, 32))] = vm_pc_init

The use of expr_simp is here to ensure the expression will have a canonical and simplified form. The symbolic engine always works with expressions which have both of those properties, so if you add information to it, you have to use this form.

Here is the new output:

ValueError: Unknown destination ((@8[(VM_PC_init+0x1)]&0x80)?(0x40EDEF,0x40EDF8))

Our goal here is to add the information that @8[VM_PC] & 0x80 resolves to 0x80 at runtime. The fact is that we don’t known how the VM_PC has been modified during the execution. Here, we will use the engine to resolve the current value of VM_PC and add the information using an expression based on this value.

In our case, the information to add is @32[VM_PC] & 0x80 == 0x80. First we will evaluate VM_PC, which gives us VM_PC_init+1. So the final information is @32[VM_PC_init+1] & 0x80 == 0x80.

The code will then resolve the next destination, which will decrypt the next opcode. We will be able to retrieve its equation later.

if isinstance(ad, ExprCond):
    print "Condition in destination:", ad
    vm_pc = sb.eval_expr(ExprMem(regs.ECX_init))
    print 'Current VM_PC:', vm_pc
    ad = ad.replace_expr({ExprMem(vm_pc, 8) & ExprInt8(0x80):ExprInt8(0x80)})
    ad = expr_simp(ad)
    print 'Final destination:', ad

Here is the result:

...
Condition in destination: ((@8[(VM_PC_init+0x1)]&0x80)?(0x40EDEF,0x40EDF8))
Current VM_PC: (VM_PC_init+0x1)
Final destination: 0x40EDEF

The code is redirected to the correct case! But we are facing another error:

Traceback (most recent call last):
  File "emul_instr_02.py", line 101, in <module>
    raise ValueError("Unknown destination %s" % ad)
ValueError: Unknown destination @32[ESP_init]

The code has resolved the test and has executed the next block. As the function terminates on a RET, the destination is resolved as @32[ESP_init] and it fails. We forgot that this code is called through a CALL. We could stop by detecting a jump to the expression @32[ESP_init], which means we are getting back to the caller. For the exercise, we will add the information that a special symbol ExprId('RET_ADDR') has been pushed to the stack before the execution of the function. If PC is evaluated to this symbol (for example after the RET of the function), we stop the execution.

ret_addr = ExprId('RET_ADDR')
# Push return addr
infos[expr_simp(ExprMem(regs.ESP_init-ExprInt32(4)))] = ret_addr
infos[regs.ESP] = expr_simp(regs.ESP_init-ExprInt32(4))

So after each block execution:

if ad is ret_addr:
    print "Ret addr reached"
    break

Which is validated on our current mnemonic:

...
Ret addr reached

We completely achieved the symbolic execution of the first mnemonic! Let’s display the final state of the symbolic engine:

>>> sb.dump_id()
IRDst RET_ADDR
zf (((@8[VM_PC_init]^@8[(VM_PC_init+0x1)]^0xE9)&0x7F)?(0x0,0x1))
nf 0x0
pf parity(((@8[VM_PC_init]^@8[(VM_PC_init+0x1)]^0xE9)&0x7F))
of 0x0
cf 0x0
af (VM_PC_init^(VM_PC_init+0x1)^0x1)[4:5]
EIP RET_ADDR
EAX {0x1,0,8, (VM_PC_init+0x1)[8:32],8,32}
ECX {((@8[VM_PC_init]^@8[(VM_PC_init+0x1)]^0xE9)&0x7F),0,8, ECX_init[8:32],8,32}
EDX {(@8[VM_PC_init]^0xE9),0,8, EDX_init[8:32],8,32}
>>> sb.dump_mem()
@8[(VM_PC_init+0x1)] ((@8[VM_PC_init]^@8[(VM_PC_init+0x1)]^0xE9)&0x7F)
@32[ECX_init] (VM_PC_init+0x1)

dump_id and dump_mem only display variables whose values are different from their initial state (given at the symbolic engine creation). Note that, in the last line, the information we have added to simplify the output is present as well. We will add our own display function to have a good and predictable output.

def dump_state(sb):
    print '-'*20, "State", '-'*20
    out = {}
    for expr, value in sorted(sb.symbols.items()):
        if (expr, value) in initial_symbols:
            continue
        if (expr, value) in infos:
            continue
        if expr in [regs.zf, regs.cf, regs.nf, regs.of, regs.pf, regs.af,
                    ir_arch.IRDst, regs.EIP]:
            continue
        expr = expr_simp(expr.replace_expr(infos))
        value = expr_simp(value.replace_expr(infos))
        if expr == value:
            continue
        out[expr] = value

    out = sorted(out.iteritems())
    x86_regs = []
    mem = []
    other = []
    for expr, value in out:
        if expr in regs.all_regs_ids:
            x86_regs.append((expr, value))
        elif isinstance(expr, ExprMem):
            mem.append((expr, value))
        else:
            other.append((expr, value))

    print "Regs:"
    for item in other:
        print "\t%s = %s" % item
    print "Mem:"
    for item in mem:
        print "\t%s = %s" % item
    print "x86:"
    for item in x86_regs:
        print "\t%s = %s" % item
    print

Which gives the output:

-------------------- State --------------------
Regs:
        VM_PC_init = (VM_PC_init+0x1)
Mem:
        @8[(VM_PC_init+0x1)] = ((@8[VM_PC_init]^@8[(VM_PC_init+0x1)]^0xE9)&0x7F)
x86:
        EAX = {0x1,0,8, (VM_PC_init+0x1)[8:32],8,32}
        ECX = {((@8[VM_PC_init]^@8[(VM_PC_init+0x1)]^0xE9)&0x7F),0,8, ECX_init[8:32],8,32}
        EDX = {(@8[VM_PC_init]^0xE9),0,8, EDX_init[8:32],8,32}

The result matches our previous findings! Here is the result for the fourth mnemonic:

-------------------- State --------------------
Regs:
        VM_PC_init = (VM_PC_init+0x3)
Mem:
        @32[(ESP_init+0xFFFFFFF8)] = ESI_init
        @8[(VM_PC_init+0x3)] = ((@8[(VM_PC_init+0x1)]^@8[(VM_PC_init+0x3)]^0x47)&0x7F)
        @32[(ECX_init+0x4)] = (@32[(ECX_init+0x4)]+0x2)
        @16[@32[(ECX_init+0x4)]] = (@16[@32[(ECX_init+0x4)]]^@16[(VM_PC_init+0x1)])
x86:
        EAX = {0x1,0,8, @16[(VM_PC_init+0x1)][8:16],8,16, (VM_PC_init+0x1)[16:32],16,32}
        ECX = (VM_PC_init+0x3)
        EDX = {(@8[(VM_PC_init+0x1)]^0x47),0,8, EDX_init[8:32],8,32}

Notice the @32[(ESP_init+0xFFFFFFF8)] = ESI_init. This variable corresponds in the code to:

PUSH    ESI
...
POP     ESI

We can notice as well that ESP is not displayed, so its value at the end of the emulation is its initial state (ESP_init). Here, we can make the assumption that the code doesn’t use variables above the stack, so at each step of the symbolic emulation, we can remove the memory variables based on ESP_init which are above the current ESP.

For example, imagine we have a PUSH EAX which, at runtime, gives @32[ESP_init-0x10] = EAX_init. Later, we pop some values and the runtime value of ESP is ESP_init-0x4. Based on our hypothesis, the previous value is above the stack, so we can remove it. If you wonder “how can we compare two symbolic values?”, here is the answer: subtraction. Here it is:

(ESP_init-0x10) - (ESP_init-0x4) = -0xC

-0xC is a concrete negative integer, so we can say that (ESP_init-0x4) is bigger than (ESP_init-0x10). To delete all elements above the stack, just call sb.del_mem_above_stack(ir_arch.sp) at the end of each symbolic execution. Here is the cleaned up result:

-------------------- State --------------------
Regs:
        VM_PC_init = (VM_PC_init+0x3)
Mem:
        @8[(VM_PC_init+0x3)] = ((@8[(VM_PC_init+0x1)]^@8[(VM_PC_init+0x3)]^0x47)&0x7F)
        @32[(ECX_init+0x4)] = (@32[(ECX_init+0x4)]+0x2)
        @16[@32[(ECX_init+0x4)]] = (@16[@32[(ECX_init+0x4)]]^@16[(VM_PC_init+0x1)])
x86:
        EAX = {0x1,0,8, @16[(VM_PC_init+0x1)][8:16],8,16, (VM_PC_init+0x1)[16:32],16,32}
        ECX = (VM_PC_init+0x3)
        EDX = {(@8[(VM_PC_init+0x1)]^0x47),0,8, EDX_init[8:32],8,32}

The next step is to add the virtual machine registers information. As far as we understand, they are in memory, next to the VM_PC. Let’s add the information (let’s say we have at least 5 registers)

Here is the result:

-------------------- State --------------------
Regs:
        VM_PC_init = (VM_PC_init+0x3)
        REG0 = (REG0+0x2)
Mem:
        @8[(VM_PC_init+0x3)] = ((@8[(VM_PC_init+0x1)]^@8[(VM_PC_init+0x3)]^0x47)&0x7F)
        @16[REG0] = (@16[REG0]^@16[(VM_PC_init+0x1)])
x86:
        EAX = {0x1,0,8, @16[(VM_PC_init+0x1)][8:16],8,16, (VM_PC_init+0x1)[16:32],16,32}
        ECX = (VM_PC_init+0x3)
        EDX = {(@8[(VM_PC_init+0x1)]^0x47),0,8, EDX_init[8:32],8,32}

It seems that this is an equivalent of XOR word ptr [REG0], imm16 in x86. The mnemonic is three bytes long. The key to decrypt the next one is 0x47. Note the REG0 = (REG0+0x2): this can be interpreted as a post increment of the memory base register (here, REG0). So the instruction can be better represented as:

XOR word ptr [REG0]++, imm16

In most classic architectures (like ARM for example), the post increment is the size of the memory access.

Let’s run the fifth mnemonic (without displaying x86 registers):

-------------------- State --------------------
Regs:
        VM_PC_init = (VM_PC_init+0x5)
        REG0 = (REG0+0x4)
Mem:
        @8[(VM_PC_init+0x5)] = ((@8[(VM_PC_init+0x1)]^@8[(VM_PC_init+0x5)]^0xC0)&0x7F)
        @32[REG0] = (@32[REG0]^@32[(VM_PC_init+0x1)])

In this case: XOR dword ptr [REG0]++, imm32, five bytes long, with key 0xC0. Here is another one (we won’t display the next mnemonic deciphering for the sake of clarity):

REG0 = (REG0+0x1)
@8[REG0] = (@8[REG0]+@8[(VM_PC_init+0x1)])

We have ADD byte ptr [REG0]++, imm8

Note that we can add the mnemonics immediates to clarify a bit more the output:

# imm
expr_imm8 = expr_simp(ExprMem(vm_pc_init + ExprInt32(0x1), 8))
infos[expr_imm8] = ExprId("imm8" , 8)

expr_imm16 = expr_simp(ExprMem(vm_pc_init + ExprInt32(0x1), 16))
infos[expr_imm16] = ExprId("imm16" , 16)

expr_imm32 = expr_simp(ExprMem(vm_pc_init + ExprInt32(0x1), 32))
infos[expr_imm32] = ExprId("imm32" , 32)

New output:

REG0 = (REG0+0x1)
@8[REG0] = (imm8+@8[REG0])

Results

REG0 = (REG0+0x2)
@16[REG0] = (imm16+@16[REG0])

Mnemonic: ADD word ptr [REG0]++, imm16

REG0 = (REG0+0x4)
@32[REG0] = (imm32+@32[REG0])

Mnemonic: ADD dword ptr [REG0]++, imm32

REG0 = (REG0+0x1)
@8[REG0] = (@8[REG0]+(- imm8))

Mnemonic: SUB byte ptr [REG0]++, imm8

REG0 = (REG0+0x2)
@16[REG0] = (@16[REG0]+(- imm16))

Mnemonic: SUB word ptr [REG0]++, imm16

REG0 = (REG0+0x4)
@32[REG0] = (@32[REG0]+(- imm32))

Mnemonic: SUB dword ptr [REG0]++, imm32

The next one fails:

Final destination: ((@8[(VM_PC_init+0x1)]&0x7)?(lbl_gen_00000000:None,0x40EFE3))
Traceback (most recent call last):
  File "emul_instr_02.py", line 142, in <module>
    raise ValueError("Unknown destination %s" % ad)
ValueError: Unknown destination ((@8[(VM_PC_init+0x1)]&0x7)?(lbl_gen_00000000:None,0x40EFE3))

The reason seems quite simple: the symbolic execution engine didn’t resolve a condition. But strangely, the condition doesn’t appear in the execution flow:

CFG of the faulty mnemonic

In fact, if we look at the possible destinations, there is a generated label, which means a native mnemonic has generated a jump when translated to intermediate representation. Here it is:

../../../_images/zeusvm_mnemo_013_ir.svg

IR CFG of the faulty mnemonic

The mnemonic responsible for this behaviour is ROL. On x86, the rotation has a different behaviour depending on whether the counter is null or not. In Miasm IR, it generates an IF/THEN (without ELSE) condition. To convince yourself, you can assemble a ROL EAX, CL (which gives d3c0). You can use the disassembly example of Miasm:

python miasm/example/disasm/full.py -m x86_32  file_to_disasm -g

The -g option will generate the intermediate representation of the disassembled assembly and output it to the file graph_irflow.dot. Here it is:

IR CFG of the ROL mnemonic

Note that we could have implemented the ROL EAX, CL without the test, but it has a drawback: in this case, the IR will always say that the register EAX is written by the instruction, even if CL is zero (which is not the case). So we lose some precision in the intermediate representation. But this can be easily modified in the ROL description, in the file miasm2/arch/x86/sem.py. Looking at the current implementation, one may notice that if the mnemonic uses a non zero constant counter (like in ROL EAX, 8), Miasm will not generate the IF/THEN block (the counter can never be zero).

In this case, we should deal with both cases in our symbolic execution. To simplify the problem, we will assume that the value of the counter is never zero. This means that in this case @8[(VM_PC_init+0x1)]&0x7 is always different from zero. The mnemonic symbolic execution now terminates, and here is the result:

REG0 = (REG0+0x1)
@8[REG0] = (@8[REG0] <<< (imm8&0x7))

Mnemonic: ROL byte ptr [REG0]++, imm8

The same error is raised on the next mnemonic:

Final destination: (({(@8[(VM_PC_init+0x1)]&0xF),0,8, 0x0,8,16}&0x1F)?(lbl_gen_00000000:None,0x40F01C))

After injecting the correct information:

REG0 = (REG0+0x2)
@16[REG0] = (@16[REG0] <<< ({(imm8&0xF),0,8, 0x0,8,16}&0x1F))

Mnemonic: ROL word ptr [REG0]++, imm8

Same trick with 32 bit extension:

REG0 = (REG0+0x4)
@32[REG0] = (@32[REG0] <<< ({(imm8&0x1F),0,8, 0x0,8,32}&0x1F))

Mnemonic: ROL dword ptr [REG0]++, imm8

Then:

REG0 = (REG0+0x1)
@8[REG0] = (@8[REG0] >>> (imm8&0x7))

Mnemonic: ROR byte ptr [REG0]++, imm8

REG0 = (REG0+0x2)
@16[REG0] = (@16[REG0] >>> ({(imm8&0xF),0,8, 0x0,8,16}&0x1F))

Mnemonic: ROR word ptr [REG0]++, imm8

REG0 = (REG0+0x4)
@32[REG0] = (@32[REG0] >>> ({(imm8&0x1F),0,8, 0x0,8,32}&0x1F))

Mnemonic: ROR dword ptr [REG0]++, imm8

REG0 = (REG0+0x1)
@8[REG0] = (@8[REG0]^0xFF)

Here, as all bits of the low byte of @8[REG0], this can be interpreted as NOT byte ptr [REG0]++

REG0 = (REG0+0x2)
@16[REG0] = (@16[REG0]^0xFFFF)

Mnemonic: NOT word ptr [REG0]++

REG0 = (REG0+0x4)
@32[REG0] = (@32[REG0]^0xFFFFFFFF)

Mnemonic: NOT dword ptr [REG0]++

The next one is interesting as it executes a loop (which can be symbolically executed in this case). Here is the result:

REG0 = (REG0+0x4)
@8[(REG0+{(imm8&0x3),0,8, 0x0,8,32})] = @8[REG0]
@8[(REG0+{((imm8 >> 0x4)&0x3),0,8, 0x0,8,32})] = @32[REG0][16:24]
@8[(REG0+{((imm8 >> 0x2)&0x3),0,8, 0x0,8,32})] = @32[REG0][8:16]
@8[(REG0+{((imm8 >> 0x6)&0x3),0,8, 0x0,8,32})] = @32[REG0][24:32]

Ok, this is a nice one: we can rewrite the access @32[REG0][8:16] as @8[REG0+1] (we make the hypothesis the code won’t segfault in this case, and that we are working in little endian). By the way, this simplification can be implemented through the Miasm simplification engine.

REG0 = (REG0+0x4)
@8[(REG0+{(imm8&0x3),0,8, 0x0,8,32})] = @8[REG0]
@8[(REG0+{((imm8 >> 0x2)&0x3),0,8, 0x0,8,32})] = @8[REG0+1]
@8[(REG0+{((imm8 >> 0x4)&0x3),0,8, 0x0,8,32})] = @8[REG0+2]
@8[(REG0+{((imm8 >> 0x6)&0x3),0,8, 0x0,8,32})] = @8[REG0+3]

This mnemonic can be seen as a twisted version of the x86 SWAP instruction, which has the resulting bytes order fixed using an immediate argument. So this can be written like: PERMUT dword ptr [REG0]++, imm8

The next one (0x409281) is an interesting mnemonic as well: the execution loops 0x100 times, and fails on an unresolved branch:

Condition in destination: (((- {@8[(VM_PC_init+0x1)],0,8, 0x0,8,16})+0x1)?(0x4092D9,0x4092D5))

Here it is:

CFG of the strange mnemonic

This one is a bit too complex for the symbolic execution, so in this case the ultimate solution is to look at it by hand. This mnemonic takes a key, a memory pointer and a length and decrypts it using RC4.

Mnemonic: DEC_RC4 ARG0, ARG1, length

Next one:

REG1 = {imm8,0,8, 0x0,8,32}

Mnemonic: MOVZX REG1, imm8

REG1 = {imm16,0,16, 0x0,16,32}

Mnemonic: MOVZX REG1, imm16

REG1 = imm32

Mnemonic: MOV REG1, imm32

REG0 = (REG0+{imm16,0,16, (imm16[15:16]?(0xFFFF,0x0)),16,32})

Mnemonic: MOVSX REG0, imm16

The next one (0x40F27F) fails:

CFG of the faulty mnemonic

Fun with symbolic execution

We could analyze it by hand again, but for the exercise we will try another method. As you can see the mnemonic is relatively simple and is composed of two IF/THEN blocks. We will use the symbolic execution engine to execute every possible path, and give us all the possible results (in this case, four). This method has already been detailed in the previous article. We will reuse its steps (without the SAT solver part) to obtain the mnemonic final states.

Here is the output:

Ret addr reached
-------------------- State --------------------
Regs:
        imm8b = ((imm8^imm8b^0xFF)&0x7F)
        VM_PC_init = (VM_PC_init+0x2)
Mem:
x86:
        EAX = {0x1,0,8, (VM_PC_init+0x2)[8:32],8,32}
        EDX = REG1

----------------------------------------
Block 0x40F2A9
Ret addr reached
-------------------- State --------------------
Regs:
        VM_PC_init = (VM_PC_init+0x2)
Mem:
x86:
        EAX = {0x1,0,8, (VM_PC_init+0x2)[8:32],8,32}
        EDX = REG1

----------------------------------------
Block 0x40F2A0
Ret addr reached
-------------------- State --------------------
Regs:
        REG1 = (REG1+0xFFFFFFFF)
        imm8b = ((imm8^imm8b^0xFF)&0x7F)
        VM_PC_init = (VM_PC_init+(- {imm8,0,8, 0x0,8,32})+0x2)
Mem:
x86:
        EAX = {0x1,0,8, (VM_PC_init+(- {imm8,0,8, 0x0,8,32})+0x2)[8:32],8,32}
        EDX = {imm8,0,8, 0x0,8,32}

----------------------------------------
Block 0x40F2A0
Ret addr reached
-------------------- State --------------------
Regs:
        REG1 = (REG1+0xFFFFFFFF)
        VM_PC_init = (VM_PC_init+(- {imm8,0,8, 0x0,8,32})+0x2)
Mem:
x86:
        EAX = {0x1,0,8, (VM_PC_init+(- {imm8,0,8, 0x0,8,32})+0x2)[8:32],8,32}
        EDX = {imm8,0,8, 0x0,8,32}

The first two states only increment VM_PC by two (the difference between them is the next mnemonic which is decrypted in the first case, and not in the second one).

The remaining states have the same difference, but the VM_PC is different, as well as REG0:

REG1 = (REG1+0xFFFFFFFF)
VM_PC_init = (VM_PC_init+(- {imm8,0,8, 0x0,8,32})+0x2)

Moreover, if we look at the condition forced by the code:

Final destination: (REG1?(0x40F2A0,0x40F2A9))

Looks familiar? This can be read like this:

decrement REG1 by one;
if the original REG1 was:
- zero: VM_PC += 2
- otherwise: substract VM_PC by a signed extended imm8

This is exactly what the x86 mnemonic loop is doing with the ECX register.

So: LOOP REG1, label

Fun fact: The mnemonic only decrypts the next mnemonic (in both conditions of the loop). So if the loop is taken, we hope the destination mnemonic has already been executed once in order to be decrypted; if not, the code may crash!

Now, we have a script which can handle more complex mnemonics. This solution has some limitation though, for example in case of complex loops: it may generate too many states.

The next mnemonic is very close to the current one, let’s use the same script:

Ret addr reached
-------------------- State --------------------
Regs:
        REG1 = (REG1+0xFFFFFFFF)
        VM_PC_init = (VM_PC_init+(- {imm16,0,16, 0x0,16,32})+0x3)
Mem:
x86:
        EAX = {0x1,0,8, (VM_PC_init+(- {imm16,0,16, 0x0,16,32})+0x3)[8:32],8,32}
        EDX = {imm16,0,16, 0x0,16,32}

----------------------------------------
Block 0x40F2B7
Condition in destination: (REG1?(0x40F2D1,0x40F2DA))
VM_PC_init
Current VM_PC: (VM_PC_init+0x1)
Final destination: (REG1?(0x40F2D1,0x40F2DA))
----------------------------------------
Block 0x40F2DA
Ret addr reached
-------------------- State --------------------
Regs:
        VM_PC_init = (VM_PC_init+0x3)
Mem:
x86:
        EAX = {0x1,0,8, (VM_PC_init+0x3)[8:32],8,32}
        EDX = REG1

----------------------------------------
Block 0x40F2D1
Ret addr reached
-------------------- State --------------------
Regs:
        REG1 = (REG1+0xFFFFFFFF)
        VM_PC_init = (VM_PC_init+(- {imm16,0,16, 0x0,16,32})+0x3)
Mem:
        @8[(VM_PC_init+0x3)] = ((imm8^@8[(VM_PC_init+0x3)]^0xFB)&0x7F)
x86:
        EAX = {0x1,0,8, (VM_PC_init+(- {imm16,0,16, 0x0,16,32})+0x3)[8:32],8,32}
        EDX = {imm16,0,16, 0x0,16,32}

----------------------------------------
Block 0x40F2DA
Ret addr reached
-------------------- State --------------------
Regs:
        VM_PC_init = (VM_PC_init+0x3)
Mem:
        @8[(VM_PC_init+0x3)] = ((imm8^@8[(VM_PC_init+0x3)]^0xFB)&0x7F)
x86:
        EAX = {0x1,0,8, (VM_PC_init+0x3)[8:32],8,32}
        EDX = REG1

The mnemonic is exatly the same except the fact that it’s three bytes long, and the label is computed using signed extended imm16.

Mnemonic: LOOP REG1, label

The next one:

@32[(ECX_init+(({imm8,0,8, 0x0,8,32}&0xF)*0x4)+0xC)] = {@8[(VM_PC_init+0x2)],0,8, 0x0,8,32}

In this case, we have an uncommon form: we write at @32[ECX_init + XXXX * 4 + 0xC], with XXXX between 0x0 and 0xF (as it’s masked using 0xF) . From our previous analysis, we decided that:

@32[ECX_init + 0x0] is VM_PC
@32[ECX_init + 0x4] is REG0
@32[ECX_init + 0x8] is REG1

So ECX_init + XXXX * 4 + 0xC is just after those registers. Let’s inform the engine that @32[ECX_init + XXXX * 4 + 0xC] is REGX.

#(ECX_init+(({imm8,0,8, 0x0,8,32}&0xF)*0x4)+0xC)
base_regx = expr_simp(regs.ECX_init + (imm8.zeroExtend(32) & ExprInt32(0xF)) * ExprInt32(4) + ExprInt32(0xC))
infos[expr_simp(ExprMem(base_regx, 32))] = ExprId("REGX" , 32)

Here is the result:

REGX = {@8[(VM_PC_init+0x2)],0,8, 0x0,8,32}

Note that in this case, the first immediate is used to inform the register index, so the immediate argument is @8[(VM_PC_init+0x2)] instead of @8[(VM_PC_init+0x1)]. In this case, we will create another alias, called imm8b/imm16b/imm32b.

# immb
expr_imm8 = expr_simp(ExprMem(vm_pc_init + ExprInt32(0x2), 8))
infos[expr_imm8] = ExprId("imm8b" , 8)

expr_imm16 = expr_simp(ExprMem(vm_pc_init + ExprInt32(0x2), 16))
infos[expr_imm16] = ExprId("imm16b" , 16)

expr_imm32 = expr_simp(ExprMem(vm_pc_init + ExprInt32(0x2), 32))
infos[expr_imm32] = ExprId("imm32b" , 32)

Result:

REGX = {imm8b,0,8, 0x0,8,32}

Mnemonic: MOVZX REGX, imm8

Next one:

REGX = {imm16b,0,16, 0x0,16,32}

Mnemonic: MOVZX REGX, imm16

REGX = @32[(VM_PC_init+0x2)]

Mnemonic: MOV REGX, imm32

REGX = {@8[(ECX_init+({imm8[4:8],0,4, 0x0,4,32}*0x4)+0xC)],0,8, 0x0,8,32}

Ok, this time this is exactly the same registers accesses, but instead of using the low bits of the instruction immediate, it uses the high part as read register index. Let’s add the information:

base_regx = expr_simp(regs.ECX_init + (imm8[4:8].zeroExtend(32)) * ExprInt32(4) + ExprInt32(0xC))
infos[expr_simp(ExprMem(base_regx, 32))] = ExprId("REGY" , 32)

Result:

REGX = {REGY[0:8],0,8, 0x0,8,32}

So mnemonic: MOVZX REGX, REGY.u08

Next one:

REGX = {REGY[0:16],0,16, 0x0,16,32}

Mnemonic: MOVZX REGX, REGY.u16

REGX = REGY

Mnemonic: MOV REGX, REGY

REGX = (REGX+{REGY[0:8],0,8, 0x0,8,32})

Mnemonic: ADD REGX, REGY.u08

REGX = (REGX+{REGY[0:16],0,16, 0x0,16,32})

Mnemonic: ADD REGX, REGY.u16

REGX = (REGX+REGY)

Mnemonic: ADD REGX, REGY

REGX = (REGX+(- {REGY[0:8],0,8, 0x0,8,32}))

Mnemonic: SUB REGX, REGY.u08

REGX = (REGX+(- {REGY[0:16],0,16, 0x0,16,32}))

Mnemonic: SUB REGX, REGY.u16

REGX = (REGX+(- REGY))

Mnemonic: SUB REGX, REGY

REGX = (REGX^{REGY[0:8],0,8, 0x0,8,32})

Mnemonic: XOR REGX, REGY.u08

REGX = (REGX^{REGY[0:16],0,16, 0x0,16,32})

Mnemonic: XOR REGX, REGY.u16

REGX = (REGX^REGY)

Mnemonic: XOR REGX, REGY

REGX = (REGX+{imm8b,0,8, 0x0,8,32})

Mnemonic: ADD REGX, imm8

REGX = (REGX+{imm16b,0,16, 0x0,16,32})

Mnemonic: ADD REGX, imm16

REGX = (REGX+imm32b)

Mnemonic: ADD REGX, imm32

REGX = (REGX+(- {imm8b,0,8, 0x0,8,32}))

Mnemonic: SUB REGX, imm08

REGX = (REGX+(- {imm16b,0,16, 0x0,16,32}))

Mnemonic: SUB REGX, imm16

REGX = (REGX+(- imm32b))

Mnemonic: SUB REGX, imm32

REGX = (REGX^{imm8b,0,8, 0x0,8,32})

Mnemonic: XOR REGX, imm08

REGX = (REGX^{imm16b,0,16, 0x0,16,32})

Mnemonic: XOR REGX, imm16

REGX = (REGX^imm32b)

Mnemonic: XOR REGX, imm32

REG0 = (REG0+0x1)
@8[REG0] = (@8[REG0]+REGX[0:8])

Mnemonic: ADD byte ptr [REG0]++, REGX.u08

REG0 = (REG0+0x2)
@16[REG0] = (@16[REG0]+REGX[0:16])

Mnemonic: ADD word ptr [REG0]++, REGX.u16

REG0 = (REG0+0x4)
@32[REG0] = (REGX+@32[REG0])

Mnemonic: ADD dword ptr [REG0]++, REGX

REG0 = (REG0+0x1)
@8[REG0] = (@8[REG0]+(- REGX[0:8]))

Mnemonic: SUB byte ptr [REG0]++, REGX.u08

REG0 = (REG0+0x2)
@16[REG0] = (@16[REG0]+(- REGX[0:16]))

Mnemonic: SUB word ptr [REG0]++, REGX.u16

REG0 = (REG0+0x4)
@32[REG0] = (@32[REG0]+(- REGX))

Mnemonic: SUB dword ptr [REG0]++, REGX

REG0 = (REG0+0x1)
@8[REG0] = (@8[REG0]^REGX[0:8])

Mnemonic: XOR byte ptr [REG0]++, REGX.u08

REG0 = (REG0+0x2)
@16[REG0] = (@16[REG0]^REGX[0:16])

Mnemonic: XOR word ptr [REG0]++, REGX.u16

REG0 = (REG0+0x4)
@32[REG0] = (REGX^@32[REG0])

Mnemonic: XOR dword ptr [REG0]++, REGX

REGX = {@8[REG0],0,8, 0x0,8,32}

Mnemonic: MOVZX REGX, byte ptr [REG0]

REGX = {@16[REG0],0,16, 0x0,16,32}

Mnemonic: MOVZX REGX, word ptr [REG0]

REGX = @32[REG0]

Mnemonic: MOV REGX, dword ptr [REG0]

REG0 = (REG0+0x1)
@8[REG0] = REGX[0:8]

Mnemonic: MOV byte ptr [REG0]++, REGX.u08

REG0 = (REG0+0x2)
@16[REG0] = REGX[0:16]

Mnemonic: MOV word ptr [REG0]++, REGX.u16

REG0 = (REG0+0x4)
@32[REG0] = REGX

Mnemonic: MOV dword ptr [REG0]++, REGX

And the last one:

Block 0x40EE37
-------------------- State --------------------
Regs:
Mem:
x86:
        EAX = {0x0,0,8, EAX_init[8:32],8,32}

Ret addr reached

As AL is null, the mnemonic stops the VM!

Final words

After a quick manual analysis, we built a script which gives the body of each mnemonic of the virtual machine. The script evaluates all the 69 mnemonics in about 5 seconds.

If we encounter this same VM, or a similar one, we will just have to re-run our scripts to match known “mnemonic semantic”. As we use symbolic execution, given the right simplification passes, we should be able to get through some additionnal obfuscation.

What are the next steps? Depending on our needs, we could add a new architecture to disassemble the VM virtual code. We could even implement the semantic to JiT this code, debug it, run static analysis algorithms, SMT solver, …

Did I mention that symbolic execution can be fully parallelized ?

Final script: zeus_get_ir.py

Posted by serpilliere

Tags: disasm, emulation, IR