learn reversing by solving MalwareTech vm using cutter

If you are deep into reverse engineering you most probably already know the challenges from MalwareTech in his blog here . This solution pertains to the Vm1 which is supposed to be the hardest among the lot.

The challenge description is as follows:

Before the analysis of binary you must familiarize yourself with the vms and why they are important for malwares. You can read up more info on vms in the links in reference section.

Now for starting the analysis we first load the file vm1.exe into cutter. and check the functions table.

looking at this table we can see that there are two custom functions implemented ie fcn.00402270 and fcn.004022e0 but as is customary we will check the entry0 function first.

first three instruction are setting up the stack frame and then it starts by calling sym.vm1_easy.exe___0MD5__QAE_XZ now at this point if you have been reversing for a while you know it is the cpp mangled symbol name and it is calling the constructor function and passing it the pointer to var90h which is most possibly a pointer for md5 string data.

The program then pushes 0 to the stack and then 0x1fb which is decimal 507 to the stack and then it calls GetProcessHeap but looking for it in the msdn we find than GetProcessHeap does not take any argument and returns the handle to the current processes heap which is pushed into the stack in very next instruction so its safe to assume that the prior pushes and this one all occur for another function called in the next function which is HeapAlloc so the function call is as

HeapAlloc(GetProcessHeap(),507,0);

so looking up in msdn again this functions allocates 507 bytes in the heap of current function and returns the reference to it into 0x40423c which is probably a global variable since it is not in stack of current function.

next it pushes the values 507 ,0x404040, and the reference to newly allocated heap area into stack and calls memcpy which is equivalent to

memcpy(new_heap_space,0x404040,507);

It is again safe to assume that the 0x404040 is some address at which there is some data of 507 which the process wants to copy to the newly allocated heap space. in the very next instruction it clears up the 12 bytes of stack memory used during function call

next it calls function fcn.004022e0 but its return value is not being used hence it can be assumed that this function does not take any parameter and does not return any value either. so it is safe to assume that this function probably makes some changes to the program state ie global variables or memory .

next the pointer to the newly allocated space is pushed into stack and digeststring is called with the pointer to the md5 stringdata in ecx which is the standard this call convention and the returned value is saved into local variable LPCAPTION and is equivalent to

LPCAPTION = VAR_90H.digestString(new_heap_space);

next it pushes 0x30 , pointer to a string that says “we have been compromised” , then the md5 digested string and 0 to stack and calls the MessageBoxA function looking up on msdn we can see that the 0x30 is flag called MB_ICONEXCLAMATION hence the call must have been as

MessageBoxA(0,LCAPTION,"we have been compromised",MB_ICONEXCLAMATION);

next the program pushes 0 to the stack and called ExitProcess so exiting the process with 0 return value the function then enters into functions epilogue where eax is zeroed out and the previous stack frame is restored .

with this information we can assume the c++ equivalent of this function as

int entry0()
{
MD5 VAR_90H;
new_heap_space = HeapAlloc(GetProcessHeap(),507,0);
memcpy(new_heap_space,0x404040,507);
fcn.004022e0();
LPCAPTION = VAR_90H.digestString(new_heap_space);
MessageBoxA(0,LCAPTION,"we have been compromised",MB_ICONEXCLAMATION);
ExitProcess(0);
}

Till this point we know the flag is the string whose md5 checksum is calculated and shown at the message box. now the job that remains is to find this string in memory but this memory was allocated at runtime so it is much possible that the flag was setup at this position at runtime. since we have examined all instruction except the call to fcn.004022e0 we can assume that it setups the flag. so now we analyze the fcn.004022e0 function.

now this function sets up the stack and then one local variable VAR_1H is initialized with value 0 then value 1 is moved to eax and eax is tested to check if its 0 which is weird but makes sense later on if the eax is 0 it jumps to location 0x404237 which we will see later. next the value of VAR_1H is loaded into ecx register

Now at the very next location is the most important clue that this function indeed makes changes to the newly allocated heap space since its pointer at 0x40423c is loaded into edx register.

In next instruction one byte of data is read from location [edx+ecx+0xff] which means there is some data at 0xff or 255 th byte of the newly allocated heap space which is being read from there and stored in variable VAR_10H in next instruction. next set of instruction move the variable VAR_1H into cl and then cl is incremented by 1 and moved back to the variable which is equivalent to increasing VAR_1H by one . next the similar set of instruction is repeated and VAR_CH is loaded with a byte at 255+1 th offset of the heap space similarly again same set of instruction is used to load the byte at 255+2 th offset of the heap space into VAR_8H

and then again the VAR_1H is incremented by 1 and then the value in the VAR_8H,VAR_CH, and VAR_10H is pushed into the stack and the function fcn.00402270 is called and its return byte is moved into ecx and checked if it is zero .if not the jne makes it jump to another jump instruction which jumps back to the top of the function at 0x4022ea else the jump does not take place and next jump to 0x402367 takes place thus breaking the loop and reaching the function suffix.

if we look at what this function does we can see that it reads 3 consecutive bytes from the newly allocated heap space from offset 255 and then passes it to the function fcn.00402270 till this function returns 0 while reading each byte the value of VAR_1H is increased which suggests that it is some 8 bit virtual machine and VAR_1H is its program counter and it is carrying out the fetching of the instructions.

next we must look at the function fcn.00402270 to understand what does the function do with the bytes sent to it as argument.

in this function the first argument is first loaded into variable VAR_4H and then it is compared against 1 ,2 and 3 and jumps are taken if they happen to be equal . This is standard switch case statement in c.

if the first argument is equal to 1 jump is taken to 0x40228e where the code writes one byte value of third argument at offset equal to second argument and then jumps to the end of the function where it returns 1. similarly if the first argument is equal to 2 the program jumps to 0x40229e where the code read one byte from offset equal to second argument and stores it in the memory location 0x404240 and jumps to end and returns 1 .

and if the first parameter was equal to 3 the function jumps to 0x4022b0 where it loads the value from location 0x404240 into edx and the byte at offset equal to second argument to ecx and xors them and stores it back to the same location then jumps to the end and returns 1 ; if the first argument is not 1 2 or 3 it jumps to the point where the function returns 0;

now the real working of the vm is pretty clear the first argument is the opcode and the next two are the arguments and the
opcode 1 = move immediate to memory
opcode 2 = move memory to register
opcode 3 = xor memory with register

now we know how the vm is implemented and we are also provided the ram.bin file which is the ram of the vm ie the content of newly allocated heap space . so loading that file into cutter and using the hexdump view we can see that there is some instruction starting at byte 255 in the file. we can go to the parsing panel on the side and choose python as parse option and then select all the 507 bytes from the file it gives us the bytes in python list which we can then copy into our python script and implement our vm in python.

the final vm i wrote in python was :

reg = 0
data = # list from the cutter copy it and paste here.
def execute(opcode, b, c):
global reg
if (a == 0x1):
data[b] = c
else:
if (a == 0x02):
reg = data[b]
else:
if (a != 0x03):
return 0
data[b] = data[b] ^ reg
return 1
def fetch_and_execute():
counter = 0
run = 1
while (run):
run = execute(data[255+counter], data[256+counter], data[257+counter])
counter = counter+3
def print_text():
print("".join(map(chr, data[:26])))

fetch_and_execute()
print_text()

this python code is equivalent to what the vm does but not in exactly the same order .and the print_text() function prints first 26 bytes as memory which seems enough but if not it can be adjusted.

now executing the vm we can obtain the flag for the challenge . FLAG{VMS-ARE-FOR-MALWARE}

References and links

4 thoughts on “learn reversing by solving MalwareTech vm using cutter

Leave a comment