GrepPINg through binaries

tl;dr: Oh, no! Antidebugging! Fuck it, we will use DBI.

Intro

So let’s say that you find a marketing stunt on the net, coming from an online security academy. It consists of a binary that consumes a “license file”. Your mission, should you decide to accept it, is to craft that license file and get the “good boy” message.
Problem: almost every antidebugging, obfuscation and general annoyance trick has been used in the binary to hinder its analysis.
We could now of course spend countless hours trying to defeat these antidebugging mechanisms and deobfuscate the assembly. But let’s be honest here, who in his right mind likes to deobfuscate binaries? (answer: Jurriaan Bremer a.k.a. @skier_t does)

Another approach would be to sidestep the whole drama and use a combination of static reversing and dynamic binary instrumentation (DBI). More precisely, we will use Intel’s PIN.

First stop, IDA

We start, of course, by opening the file in IDA Pro for a first contact with the application.
It is apparently a simple C++ application. IDA does a good job of identifying most of the 500 functions and renaming them with their standard mangled C++ names. A bunch of functions remain unidentified (sub_xxx); those should implement the actual program logic, so we will concentrate on them.

Functions

Since we cannot use a debugger, it would at least be useful to know which functions were executed and in which order, that is, to trace the binary’s execution. For that purpose we will use a slightly modified version of one of the PIN examples (named “PinTool_tracer.dll”; you can find an old version attached at the end of this post).

With a bit of IDAPython it is easy to export a dictionary mapping function addresses to their names and merge it into our trace file, ending up with something like this:

WithFuncInfo

Is it just me, or do you see a pattern as well? What is this? (More on that later.)
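
The name-merging step is simple enough to sketch in plain Python. This is only an illustration, not the script used for the post: it assumes the trace contains one hexadecimal address per line and that the name map has already been exported from IDA as a dictionary. The function annotate_trace and the sample names are hypothetical.

```python
import re

def annotate_trace(trace_lines, names):
    """Append the function name to every trace line that starts with a known address."""
    annotated = []
    for line in trace_lines:
        line = line.rstrip()
        match = re.match(r"\s*(?:0x)?([0-9a-fA-F]+)", line)
        addr = int(match.group(1), 16) if match else None
        name = names.get(addr)
        annotated.append("%s  %s" % (line, name) if name else line)
    return annotated

# Hypothetical name map (address -> name), as it might be exported from IDA
names = {0x401000: "hypothetical_main", 0x401450: "sub_401450"}

# Hypothetical trace: one executed function address per line
trace = ["0x401000", "0x401450", "0x77001234"]

annotated = annotate_trace(trace, names)
print("\n".join(annotated))
```

Addresses without an entry in the map (for example, code outside the main module) are left untouched, which makes them easy to filter out later.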

Of course we haven’t solved the challenge yet, since we just improvised a “license file” with random contents. But somewhere in that trace, the function that checks whether the file contents are right had to be executed.

There are several ways to proceed at this point, and I decided to give a tiny script a shot. It is an IDAPython PoC showing how to “identify” cryptographic functions just by counting the number of “strange” instructions. Functions with a high percentage of these instructions (xor, neg, shl, shr, etc.) are flagged for further analysis. It is a long shot, but it takes five minutes.

This is the output:

[*] Analysing...

[!] Potential Crypto: sub_401450: 11.764706 %
[!] Potential Crypto: ___security_init_cookie: 11.111111 %
[!] Potential Crypto: __SEH_prolog4: 9.523810 %
[!] Potential Crypto: _findenv: 8.823529 %
[...]
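
The heuristic behind that listing can be sketched outside of IDA. The snippet below is not the original PoC: it assumes each function has already been reduced to a list of instruction mnemonics, and both the extra mnemonics in SUSPICIOUS (beyond xor, neg, shl and shr) and the 8 % threshold are my own guesses. The toy functions are made up.

```python
# Mnemonics treated as "strange"; rol, ror and not are assumptions added here
SUSPICIOUS = {"xor", "neg", "shl", "shr", "rol", "ror", "not"}

def suspicious_ratio(mnemonics):
    """Percentage of instructions in a function that look crypto-related."""
    if not mnemonics:
        return 0.0
    hits = sum(1 for m in mnemonics if m.lower() in SUSPICIOUS)
    return 100.0 * hits / len(mnemonics)

def rank_functions(functions, threshold=8.0):
    """Return (name, percentage) pairs at or above the threshold, highest first."""
    scored = [(name, suspicious_ratio(m)) for name, m in functions.items()]
    flagged = [entry for entry in scored if entry[1] >= threshold]
    return sorted(flagged, key=lambda entry: -entry[1])

# Toy input: function name -> list of mnemonics (hypothetical)
functions = {
    "toy_xor_heavy": ["mov", "xor", "xor", "shl", "ret"],
    "toy_plain": ["push", "mov", "call", "pop", "ret"],
}

ranking = rank_functions(functions)
for name, pct in ranking:
    print("[!] Potential Crypto: %s: %f %%" % (name, pct))
```

A plain count like this produces false positives, for example on the common `xor reg, reg` zeroing idiom, which is one reason compiler-generated helpers show up in the listing.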

The function

Not bad. Most of the functions are more or less standard, compiler-generated ones, but the first one on the list is custom… let’s check it out :)

This function does the following:

  • Gets a handle to the file using CreateFileA
  • Checks its size using GetFileSize (it must be 16 bytes)
  • Reads the file contents into memory using ReadFile [1]
  • Does some arithmetic with its 4 dwords and compares the result to a constant value

And this is exactly the interesting part. Depending on the result of this comparison, the function returns either 1 or 0. Sounds like a boolean to me, so I put my money on the branch returning 1. (To be sure, I patched the binary to always return 1 and verified that this was it.)

Let’s focus on the license check, here is the code:

xor_stuff:
.text:004014AF mov ecx, [esi+0Ch]
.text:004014B2 xor ecx, [esi+8]
.text:004014B5 push esi
.text:004014B6 xor ecx, [esi+4]
.text:004014B9 xor ecx, [esi]
.text:004014BB xor ecx, xor_key_0
.text:004014C1 xor ecx, xor_key_1
.text:004014C7 xor ecx, xor_key_2
.text:004014CD xor ecx, xor_key_3
.text:004014D3 xor ecx, dw_unk_41B2C4
.text:004014D9 cmp ecx, 4C833425h
.text:004014DF jnz short wrong_stuff

This is very simple. If the file contains four dwords (32-bit values) A, B, C, D, then the check can be expressed as:

D ^ C ^ B ^ A ^ xor_key_0 ^ xor_key_1 ^ xor_key_2 ^ xor_key_3 ^ dw_unk_41B2C4 == 0x4C833425
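
For readers who prefer code to assembly, here is a rough Python model of that check. It is a sketch, not a decompilation: the function name license_ok is made up, the dwords are assumed to be little-endian (this is x86), and the five key values passed in the example are dummy zeros, since the real runtime values are not known at this point.

```python
import struct

TARGET = 0x4C833425  # constant from the cmp instruction

def license_ok(data, key_words):
    """Model of the check: XOR the four file dwords with all key dwords."""
    if len(data) != 16:          # the binary insists on a 16-byte file
        return False
    a, b, c, d = struct.unpack("<IIII", data)
    acc = d ^ c ^ b ^ a
    for key in key_words:        # xor_key_0..3 and dw_unk_41B2C4
        acc ^= key
    return acc == TARGET

# Dummy key values (all zero), for illustration only
dummy_keys = [0, 0, 0, 0, 0]

good = license_ok(struct.pack("<IIII", TARGET, 0, 0, 0), dummy_keys)
bad = license_ok(b"\x00" * 16, dummy_keys)
print(good, bad)
```

The order of the XOR operands does not matter, which is what makes the shortcut later in this post possible.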

The xor_key_N variables were initialized at compile time, so their (initial) values can be seen in the binary. However, these are not the values they hold when this function is executed, since there is (at least) one other function modifying these exact variables. Guess which one?

Modify_xor_vars

Yes, it’s the function we saw executing a bazillion times in our trace. I haven’t checked it, but my guess is that the program raises exceptions in order to execute code “hidden” in previously registered exception handlers and the like.

As for dw_unk_41B2C4, this variable wasn’t initialized, so we have to get its value at runtime. Yes, that means another small PinTool tracking memory writes, since we know the address where it is located (0x41B2C4).

Game of registers

So you see, it is pretty messy. We have to get the value of five different variables at exactly the time this calculation is executed… Do we?

Not really. Since XOR is such a simple operation, we just need the result of XORing all these variables together. This way I can write a PinTool that reads only one value at a certain execution point, namely the value of the ECX register at the point where the comparison is done. If you are interested in how this is done, see the online PIN manual and search for “PIN_GetContextReg” for several examples.

For the people who think graphically:

D ^ C ^ B ^ A ^ xor_key_0 ^ xor_key_1 ^ xor_key_2 ^ xor_key_3 ^ dw_unk_41B2C4 =

D ^ C ^ B ^ A ^ (xor_key_0 ^ xor_key_1 ^ xor_key_2 ^ xor_key_3 ^ dw_unk_41B2C4) =

D ^ C ^ B ^ A ^ monster_xor = value_read_by_pin_tool

If I craft a file containing only zero bytes then I’ll be able to read that combined xor value, named “monster_xor” :)

0x00 ^ monster_xor = monster_xor = value_read_by_pin_tool = 0xCF6FC79D

(Actually, any combination of values that XORs to zero would do; it could just as well be a file filled with “A”s.)
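
A quick way to convince yourself of this is to XOR the four dwords of a few candidate probe files. The helper name xor_of_dwords is made up for this sketch, and the final XOR only simulates what ECX would hold at the comparison, using the value read with the PinTool.

```python
import struct

MONSTER_XOR = 0xCF6FC79D  # value read from ECX with the PinTool

def xor_of_dwords(data):
    """XOR together the four little-endian dwords of a 16-byte file."""
    a, b, c, d = struct.unpack("<IIII", data)
    return a ^ b ^ c ^ d

# Both probes contain four identical dwords, so they cancel out
for probe in (b"\x00" * 16, b"A" * 16):
    simulated_ecx = xor_of_dwords(probe) ^ MONSTER_XOR
    print(hex(simulated_ecx))
```

Both probes print 0xcf6fc79d, since in each case the file contributes nothing to the result.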

Solving the equation

The rest is pretty easy: just find four values that satisfy the following equation:

D ^ C ^ B ^ A ^ 0xCF6FC79D = 0x4C833425
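
Because XOR is its own inverse, the equation can be solved by hand: XOR both sides with 0xCF6FC79D and D ^ C ^ B ^ A must equal 0xCF6FC79D ^ 0x4C833425. A minimal sketch that puts the whole value into one dword (an arbitrary choice among the many valid solutions):

```python
MONSTER_XOR = 0xCF6FC79D
TARGET = 0x4C833425

# XOR both sides of the equation with MONSTER_XOR
needed = MONSTER_XOR ^ TARGET

# Arbitrary choice: three zero dwords, the fourth carries the whole value
a = b = c = 0
d = needed

assert (d ^ c ^ b ^ a ^ MONSTER_XOR) == TARGET
print(hex(needed))  # 0x83ecf3b8
```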

Although this is rather straightforward I couldn’t resist the temptation of using Z3 (Python bindings, yay!)

This is the code that solves the equation for A, B, C, D:

# solution must be a file containing 16 bytes (4 dwords: a, b, c, d)
# d^c^b^a^key_0^key_1^key_2^key_3^dw_unk == 0x4C833425
import z3
import struct

# The combined XOR of key_N and dw_unk at check time was read with the PinTool, so we can fold it in advance ;)
monster_xor = 0xcf6fc79d
solution = 0x4C833425

s = z3.Solver()

a = z3.BitVec('a', 32)
b = z3.BitVec('b', 32)
c = z3.BitVec('c', 32)
d = z3.BitVec('d', 32)

calc = d ^ c ^ b ^ a ^ monster_xor

s.add(calc == solution)

if s.check() == z3.sat:
    m = s.model()
    # Use a loop variable that does not shadow the BitVec named d
    for decl in m.decls():
        print("%s: %s" % (decl.name(), m[decl]))

The solution (one of them) is:

a = b = 0
c = 2147483648 # 0x80000000
d = 65860536     # 0x3ECF3B8

Now we just need two lines of Python to generate the binary license file:

with open("eLexxxxxity.dat", 'wb') as f:
    f.write(struct.pack("<IIII", a, b, c, d))

Check that everything has been written correctly…

eLearnSecurity
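
The same sanity check can be scripted instead of eyeballed. The sketch below writes the solution to a temporary demo file (the file name license_demo.dat and the function verify_license are made up) and replays the check with the combined XOR value read earlier.

```python
import os
import struct
import tempfile

MONSTER_XOR = 0xCF6FC79D
TARGET = 0x4C833425

def verify_license(path):
    """Re-read a license file and replay the XOR check on its four dwords."""
    with open(path, "rb") as f:
        data = f.read()
    if len(data) != 16:
        return False
    a, b, c, d = struct.unpack("<IIII", data)
    return (a ^ b ^ c ^ d ^ MONSTER_XOR) == TARGET

# Write the Z3 solution to a throwaway file and check it
path = os.path.join(tempfile.mkdtemp(), "license_demo.dat")
with open(path, "wb") as f:
    f.write(struct.pack("<IIII", 0, 0, 0x80000000, 0x3ECF3B8))

result = verify_license(path)
print(result)
```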

… and success ;)

Success

[1] I could have started looking for ReadFile and tracing forward the operations on the read contents… but I wasn’t expecting the ReadFile and the complete logic of the challenge to be all in one place! ;)
