Beginner RE and Cryptanalysis with cutter

This time Around we will be solving the MalwareTech’s Ransomware Challenge it is one of the easiest challenge but however it will be an exercise on reverse engineering and cryptanalysis .We will be using opensource tools cutter as far as possible for reverse engineering. I have previously done a detailed tutorial on solving the VM challenge by MalwareTech check it out HERE.

so the challenge description for this challenge is :

Seems like we have a windows ransomware sample on our hand but written by skid?? we will see why the author says that further down the line on cryptanalysis section.

Reverse Engineering the binary.

now lets start our analysis by loading the binary in cutter and doing the normal analysis and check out the functions listings.

we can see here that there are only two actual functions and two are imports from ntdll. now as always it is always good idea to start from entry0 if no function is labeled as main. lets check the assembly listing of the entry0 function.

so looking at the assembly listing we can see in first two instructions it is setting up the stackframe the next instruction does not subtract anything from esp that means no space is reserved on the stack for local variables that means that this function does not use any local variables. The next 3 instructions are interesting it is pushing the value 0 twice and then calling section..text ??
why would the section be called?? actually its not the section being called but a function whose start coincides with the start of the section.we can double click on the symbol section..text to verify that . it indeed takes us to the function fcn.00401000 . These three lines in reality just called function fcn.00401000 as fcn.00401000(0,0);
and then in very next instruction it is adding 8 to esp, this tells us that the caller is clearing up argument from the stack which indicates the calling convention used here is cdecl. In next two instructions it is again pushing 0 and calling the ExitProcess which just exits the program returning 0. The next instruction pops ebp back from the stack to restore the previous stack frame and then the ret is executed to return. however these last two instructions are dead code since the execution never reaches here.
So the entry 0 function can be decompiled as:

entry0()
{
    fcn.00401000(0,0);
    ExitProcess(0);
}

since we already know what exit process does the only piece of puzzle remaining to be solved is the function fcn.00401000 lets jump into it and take a look at its assembly listing.

The first two instructions as usual set up the stack frame and the next two instruction is used to call chkstk with the value 4380(0x111c) . checking msdn documentation for chkstk we see it is used by the compiler to save a stack size greater than 4KB. now in next instruction it moves the pointer to the filename into eax and pushes it . it then pushes the pointer to the format string and then pushes the value 260(0x104) and also pushes the pointer var_108h into the stack and calls the function snprintf which is used to make a string according to the format string and instead of displaying it like printf it saves it into the var_108 buffer. so this whole code takes the filename and prepares a new filename by appending “_encrypted” to the end of it.The next string clears the 16 bytes from stack according to cdecl calling convention.

The code then goes on to push a few constants (flags) into the stacks and also pushes the name of the original file and call create file which creates a handle to the file with given filename/path. and again the same is repeated for the file with the new name which was prepared earlier.so hObject is handle to the old file and var_114h is the handle to the new file.

Then it goes on to push some constants and pointer to buffer lpbuffer and calls the Readfile on original file with handle h0bject which then returns the no of bytes read into the lpNumberOfBytesWritten variable and the file content to lpBuffer.The value 4096(0x1000) is used to set the limit of data read to 4kb.for more info check documentation for ReadFile. The code than compares the return code with 1(boolean TRUE) to check if the read was successful if it wasnot it jumps to 0x401131 which will be discussed later.

it then goes on to load 0 into the counter var_11ch and then jumps to location which compares the counter with the total number of bytes read from the file and if the counter is greater or equal it breaks out of the loop .

this is typical for loop

for(int var_111ch=0;var_111ch<lpNumberOfBytesWritten;var_111ch++)

The code then loads the value of counter into eax and zeros out the edx and loads 32(0x20) into ecx and calls for divide that divides edx:eax and then the remainder in edx is added with the base address in argc which is then used retrieve a key byte. this is equivalent to

argc[var_111ch%32]

it then xors this key with the byte indexed by the count thus encrypting the content , which is equivalent to

data[var_111ch]=data[var111ch]^argc[var_111ch%32]

this is the main encryption logic in the routine the code then loops doing this for all the bytes read from the file and then goes on to push diffrent arguments to the stack and write this encrypted data to the new file by calling the WriteFile function.

the code then checks if the total number of bytes read was equal to 4096 bytes ie 4kb if it was it jumps back to read the remaining data in iteration till all data is read thus encrypting all the data in the file.

the code then pushes the handles of opened files and calls CloseHandle to close those file. it then restores the previous stack frame and then returns .

the c++ equivalent of this routine would be

int encrypt(char * filename, char * key)
{
char databuffer[4096];
char newname[206];
int nobytesread;
snprintf(newname,206,"%s_encrypted",filename); 
handle old=CreateFileA(filename,WRITE_OWNER,0,0,OPEN_EXISTING,0x80,0);
handle new=CreateFileA(newname,WRITE_DAC,0,0,CREATE_ALWAYS,0x80,0);
do{
ReadFile(old,data,4096,&nobytesread,NULL);
for(int count=0;count<=nobytesread;++count)
{
data[count]=data[count]^key[count%32];
}
WriteFile(new,data,&nobyteread,&nobytesread,NULL);
}while(nobytesread==4096);
CloseHandle(new);
CloseHandle(old);
return;
}

at this point we know all about how the given binary works but we still cannot decrypt the files provided because we do not have the key. but from above code we can see that the key index loops back after 32 bytes due to the modular division but we donot have that 32 bytes long key sequence.

Lets do some cryptanalysis to find the key:

Cryptanalysis

looking at the files provided we can see that we have one encrypted text file flag.txt_encrypted and also few encrypted sample pictures.but still we do not have the key but we know that the xor encryption scheme is perfectly secure if the key is as long as the data to be encrypted but in our case it is not . looking at the sample pictures these are the default sample pictures that come with win7 hence we know what the data was before encryption.

so we know both cipher text and plaintext so this encryption can be cryptanalysed using Known PlainText Attack since we know

cyphertext=plaintext^key

which also means that

key=plaintext^ciphertext

so we can write a simple python script to automate above logic and find us the keystring.

Note: if you are not on win7 or do not have the chrysanthemum.jpg you can download from the internet archive here

Also you can use any file here i choose chrysanthemum.

Once we get the key array we can decrypt the files using the same logic as the malware used to encrypt the files. we can automate this by writing simple python script as:

running this script we can successfuly decrypt the flag and the flag turns out to be

FLAG{XOR-MAKES-KNOWN-PLAINTEXT-AND-FREQUENCY-ANALYSIS-EASY}

follow me on twitter at @daring_joker for updates on new tutorials.

References and Links

learn reversing by solving MalwareTech vm using cutter

If you are deep into reverse engineering you most probably already know the challenges from MalwareTech in his blog here . This solution pertains to the Vm1 which is supposed to be the hardest among the lot.

The challenge description is as follows:

Before the analysis of binary you must familiarize yourself with the vms and why they are important for malwares. You can read up more info on vms in the links in reference section.

Now for starting the analysis we first load the file vm1.exe into cutter. and check the functions table.

looking at this table we can see that there are two custom functions implemented ie fcn.00402270 and fcn.004022e0 but as is customary we will check the entry0 function first.

first three instruction are setting up the stack frame and then it starts by calling sym.vm1_easy.exe___0MD5__QAE_XZ now at this point if you have been reversing for a while you know it is the cpp mangled symbol name and it is calling the constructor function and passing it the pointer to var90h which is most possibly a pointer for md5 string data.

The program then pushes 0 to the stack and then 0x1fb which is decimal 507 to the stack and then it calls GetProcessHeap but looking for it in the msdn we find than GetProcessHeap does not take any argument and returns the handle to the current processes heap which is pushed into the stack in very next instruction so its safe to assume that the prior pushes and this one all occur for another function called in the next function which is HeapAlloc so the function call is as

HeapAlloc(GetProcessHeap(),507,0);

so looking up in msdn again this functions allocates 507 bytes in the heap of current function and returns the reference to it into 0x40423c which is probably a global variable since it is not in stack of current function.

next it pushes the values 507 ,0x404040, and the reference to newly allocated heap area into stack and calls memcpy which is equivalent to

memcpy(new_heap_space,0x404040,507);

It is again safe to assume that the 0x404040 is some address at which there is some data of 507 which the process wants to copy to the newly allocated heap space. in the very next instruction it clears up the 12 bytes of stack memory used during function call

next it calls function fcn.004022e0 but its return value is not being used hence it can be assumed that this function does not take any parameter and does not return any value either. so it is safe to assume that this function probably makes some changes to the program state ie global variables or memory .

next the pointer to the newly allocated space is pushed into stack and digeststring is called with the pointer to the md5 stringdata in ecx which is the standard this call convention and the returned value is saved into local variable LPCAPTION and is equivalent to

LPCAPTION = VAR_90H.digestString(new_heap_space);

next it pushes 0x30 , pointer to a string that says “we have been compromised” , then the md5 digested string and 0 to stack and calls the MessageBoxA function looking up on msdn we can see that the 0x30 is flag called MB_ICONEXCLAMATION hence the call must have been as

MessageBoxA(0,LCAPTION,"we have been compromised",MB_ICONEXCLAMATION);

next the program pushes 0 to the stack and called ExitProcess so exiting the process with 0 return value the function then enters into functions epilogue where eax is zeroed out and the previous stack frame is restored .

with this information we can assume the c++ equivalent of this function as

int entry0()
{
MD5 VAR_90H;
new_heap_space = HeapAlloc(GetProcessHeap(),507,0);
memcpy(new_heap_space,0x404040,507);
fcn.004022e0();
LPCAPTION = VAR_90H.digestString(new_heap_space);
MessageBoxA(0,LCAPTION,"we have been compromised",MB_ICONEXCLAMATION);
ExitProcess(0);
}

Till this point we know the flag is the string whose md5 checksum is calculated and shown at the message box. now the job that remains is to find this string in memory but this memory was allocated at runtime so it is much possible that the flag was setup at this position at runtime. since we have examined all instruction except the call to fcn.004022e0 we can assume that it setups the flag. so now we analyze the fcn.004022e0 function.

now this function sets up the stack and then one local variable VAR_1H is initialized with value 0 then value 1 is moved to eax and eax is tested to check if its 0 which is weird but makes sense later on if the eax is 0 it jumps to location 0x404237 which we will see later. next the value of VAR_1H is loaded into ecx register

Now at the very next location is the most important clue that this function indeed makes changes to the newly allocated heap space since its pointer at 0x40423c is loaded into edx register.

In next instruction one byte of data is read from location [edx+ecx+0xff] which means there is some data at 0xff or 255 th byte of the newly allocated heap space which is being read from there and stored in variable VAR_10H in next instruction. next set of instruction move the variable VAR_1H into cl and then cl is incremented by 1 and moved back to the variable which is equivalent to increasing VAR_1H by one . next the similar set of instruction is repeated and VAR_CH is loaded with a byte at 255+1 th offset of the heap space similarly again same set of instruction is used to load the byte at 255+2 th offset of the heap space into VAR_8H

and then again the VAR_1H is incremented by 1 and then the value in the VAR_8H,VAR_CH, and VAR_10H is pushed into the stack and the function fcn.00402270 is called and its return byte is moved into ecx and checked if it is zero .if not the jne makes it jump to another jump instruction which jumps back to the top of the function at 0x4022ea else the jump does not take place and next jump to 0x402367 takes place thus breaking the loop and reaching the function suffix.

if we look at what this function does we can see that it reads 3 consecutive bytes from the newly allocated heap space from offset 255 and then passes it to the function fcn.00402270 till this function returns 0 while reading each byte the value of VAR_1H is increased which suggests that it is some 8 bit virtual machine and VAR_1H is its program counter and it is carrying out the fetching of the instructions.

next we must look at the function fcn.00402270 to understand what does the function do with the bytes sent to it as argument.

in this function the first argument is first loaded into variable VAR_4H and then it is compared against 1 ,2 and 3 and jumps are taken if they happen to be equal . This is standard switch case statement in c.

if the first argument is equal to 1 jump is taken to 0x40228e where the code writes one byte value of third argument at offset equal to second argument and then jumps to the end of the function where it returns 1. similarly if the first argument is equal to 2 the program jumps to 0x40229e where the code read one byte from offset equal to second argument and stores it in the memory location 0x404240 and jumps to end and returns 1 .

and if the first parameter was equal to 3 the function jumps to 0x4022b0 where it loads the value from location 0x404240 into edx and the byte at offset equal to second argument to ecx and xors them and stores it back to the same location then jumps to the end and returns 1 ; if the first argument is not 1 2 or 3 it jumps to the point where the function returns 0;

now the real working of the vm is pretty clear the first argument is the opcode and the next two are the arguments and the
opcode 1 = move immediate to memory
opcode 2 = move memory to register
opcode 3 = xor memory with register

now we know how the vm is implemented and we are also provided the ram.bin file which is the ram of the vm ie the content of newly allocated heap space . so loading that file into cutter and using the hexdump view we can see that there is some instruction starting at byte 255 in the file. we can go to the parsing panel on the side and choose python as parse option and then select all the 507 bytes from the file it gives us the bytes in python list which we can then copy into our python script and implement our vm in python.

the final vm i wrote in python was :

reg = 0
data = # list from the cutter copy it and paste here.
def execute(opcode, b, c):
global reg
if (a == 0x1):
data[b] = c
else:
if (a == 0x02):
reg = data[b]
else:
if (a != 0x03):
return 0
data[b] = data[b] ^ reg
return 1
def fetch_and_execute():
counter = 0
run = 1
while (run):
run = execute(data[255+counter], data[256+counter], data[257+counter])
counter = counter+3
def print_text():
print("".join(map(chr, data[:26])))

fetch_and_execute()
print_text()

this python code is equivalent to what the vm does but not in exactly the same order .and the print_text() function prints first 26 bytes as memory which seems enough but if not it can be adjusted.

now executing the vm we can obtain the flag for the challenge . FLAG{VMS-ARE-FOR-MALWARE}

References and links