A story of binaries and snipers

As part of some ongoing research I’m doing for a project, I’ve recently been playing with INTEL Pin.

“WTF is Pin?” I hear you say… Long story short, Pin is a DBI (Dynamic Binary Instrumentation) framework.

Quoting the guys from uninformed.org:

“Dynamic Binary Instrumentation (DBI) is a method of analyzing the behavior of a binary application at runtime through the injection of instrumentation code. This instrumentation code executes as part of the normal instruction stream after being injected. In most cases, the instrumentation code will be entirely transparent to the application that it’s been injected to. Analyzing an application at runtime makes it possible to gain insight into the behavior and state of an application at various points in execution.”

I’m not embarrassed to admit that such a level of control over the binary gives me a NERDGASM.

But getting to the point: I wanted to give this a first try on a real binary to see what it was capable of.

Coming from the guys behind the design of our processors, I wasn’t expecting anything less than epic shit, and INTEL didn’t disappoint :)

The test binary I chose was a Reverse Engineering challenge from the Nuit Du Hack 2011 CTF. It’s a small windowed application consisting of a single executable. Once run, it displays what appears to be a text box and two buttons, as well as the LOIC text and the Anonymous logo (LOL).

So, it’s not that complex, is it? Find the correct password and you are in.

Now, how would you approach this problem? If you happen to know the Win32 API very well, you could try to find the handle to the class related to the button and set a (conditional) breakpoint on the DispatchMessage() function. Once you find the callback that processes the “Login button pressed” event, you could follow the execution path and eventually locate the code that does something with your input.

THAT WAS A MOUTHFUL! Don’t worry about it, it’s not what we are going to do. We wouldn’t need DBI for that :)

“Why not? It looks like a good idea” you say.

Well, basically for three reasons:

  • it’s a great pain in the ass
  • as I mentioned before, those only appear to be buttons, etc.; Immunity Debugger has a different opinion on that matter.
  • the binary is packed and full of junk instructions, so forget about any static analysis with IDA Pro.
IDA FAIL

What was I going to do? Give up? Not an option! (say it loud in a “Game of Thrones” voice)

Another approach was necessary. So I slightly modified one of the examples distributed with Pin and came up with the following code, which instruments every basic block at runtime and, if the block ends in some kind of CALL, logs the call’s destination to a file.

/* This program will log every function ever hit. 			*/
/* Takes advantage of INTEL Pin's intelligence.   			*/
/* It doesn't need to know in advance the functions but		*/
/* it detects when a CALL is executed instead :)			*/
/* TODO: I'm not interested in calls to system DLLs... 		*/

#include <stdio.h>
#include <sstream>
#include "pin.H"

FILE* LogFile;

/* Somewhat arbitrary, improve! */
UINT32 MAX_USER_MEM = 0x70000000;

void Fini(INT32 code, void *v)
{
	fprintf(LogFile, "# EOF\n");
	fclose(LogFile);
}

/* Callbacks implementing the actual logging */
void LogCall(ADDRINT ip)
{
	/* This can be extended to fancier logging capabilities */
	fprintf(LogFile, "$ %p\n", (void *)ip);  // $ has no meaning, just a random token
}

void LogIndirectCall(ADDRINT target, BOOL taken)
{
	if(!taken) return;
	LogCall(target);
}

/* Called every time Pin compiles a new trace (a chain of basic blocks), not once per executed instruction */
void Trace(TRACE trace, void *v)
{
	/* Iterate through basic blocks */
	for(BBL bbl = TRACE_BblHead(trace); BBL_Valid(bbl); bbl = BBL_Next(bbl))
	{
		/* Since a BB is single entry, single exit a possible call can only be at the end */
		INS tail = BBL_InsTail(bbl);

		if(INS_IsCall(tail))
		{
			if(INS_IsDirectBranchOrCall(tail))
			{
				/* For direct branches or calls, returns the target address */
				const ADDRINT target = INS_DirectBranchOrCallTargetAddress(tail);

				if(target >= MAX_USER_MEM) continue;

				INS_InsertPredicatedCall(
							tail,
							IPOINT_BEFORE,
							AFUNPTR(LogCall),		// Fn to jmp to
							IARG_ADDRINT,			// "target"'s type
							target,				// The XXX in "CALL XXX" :)
							IARG_END			// No more args
										);

			}
			else
			{
				/* Indirect call (e.g. CALL EAX or CALL [mem]): the target is only known at runtime */
				INS_InsertCall(
						tail,
						IPOINT_BEFORE,
						AFUNPTR(LogIndirectCall),	// Fn to jmp to
						IARG_BRANCH_TARGET_ADDR,	// Well... target address? :)
						IARG_BRANCH_TAKEN,		// Non zero if branch is taken
						IARG_END			// No more args
						);
			}
		} // end "if INS_IsCall..."
		else
		{
			/* For the case where the code is not in the main image but in a DLL or alike */
			RTN rtn = TRACE_Rtn(trace);

			// Trace indirect jmps into DLLs (.idata section, that is, imports).
			// IARG_BRANCH_TARGET_ADDR is only valid on branch instructions,
			// hence the INS_IsBranchOrCall() guard.
			if(RTN_Valid(rtn) && INS_IsBranchOrCall(tail) && !INS_IsDirectBranchOrCall(tail)
				&& SEC_Name(RTN_Sec(rtn)) == ".idata")
			{
				INS_InsertCall(
						tail,
						IPOINT_BEFORE,
						AFUNPTR(LogIndirectCall),
						IARG_BRANCH_TARGET_ADDR,
						IARG_BRANCH_TAKEN,
						IARG_END
						);
			}
		}
	} // end "for bbl..."
} // end "void Trace..."

/* Help message */
INT32 Usage()
{
	PIN_ERROR("Log addresses of every call ever made. Used in differential debugging.\n"
			+ KNOB_BASE::StringKnobSummary() + "\n");

	return -1;
}

/* Main function - initialize and set instrumentation callbacks */
int main(int argc, char *argv[])
{
	/* Initialize Pin with symbol capabilities */
	PIN_InitSymbols();
	if(PIN_Init(argc, argv)) return Usage();

	LogFile = fopen("functions_log.txt", "w");
	if(LogFile == NULL) return -1;	// nowhere to log, bail out early

	/* Set callbacks */
	TRACE_AddInstrumentFunction(Trace, 0);
	PIN_AddFiniFunction(Fini, 0);

	/* It never returns, sad :) */
	PIN_StartProgram();

	return 0;
}

Then I ran this twice. The first time, I clicked on the window, moved it, typed some random password, etc., but I DIDN’T click the Login button. The idea is to exercise as much code as possible but NOT the code I’m interested in, that is, the password check. After this I had a huge log file full of “noise” functions.

Note that this file was so large because I didn’t keep track of the functions already visited, so many of them were logged thousands of times :) It’s on my TODO list…
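That TODO (log each function only once) could be handled by a small “already seen” set that LogCall consults before writing. Below is a minimal C sketch, assuming a fixed-size open-addressing table; `seen_insert`, `SEEN_BITS` and the table layout are my own hypothetical choices, not part of Pin or of the tool above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "already logged" set of call targets: open addressing with
   linear probing in a fixed table. Slot value 0 means "empty". */
#define SEEN_BITS     16
#define SEEN_CAPACITY ((size_t)1 << SEEN_BITS)

static uint64_t seen_table[SEEN_CAPACITY];
static size_t   seen_count;

/* Returns true if `addr` has not been seen before (so it should be logged)
   and remembers it; returns false for repeats. Address 0, and every address
   once the table is nearly full, is reported as new, so the log degrades
   to duplicates instead of silently losing calls. */
bool seen_insert(uint64_t addr)
{
    if (addr == 0 || seen_count >= SEEN_CAPACITY - 1)
        return true;

    /* Fibonacci hashing: nearby addresses land in distant slots. */
    size_t i = (size_t)((addr * 0x9E3779B97F4A7C15ull) >> (64 - SEEN_BITS));

    while (seen_table[i] != 0) {
        if (seen_table[i] == addr)
            return false;                        /* already logged */
        i = (i + 1) & (SEEN_CAPACITY - 1);       /* linear probing */
    }
    seen_table[i] = addr;
    seen_count++;
    return true;
}
```

In the Pin tool, LogCall would then only write when `seen_insert(ip)` returns true. The table is not thread-safe, so for a multi-threaded target the check and the write would need to sit under a lock.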

You can imagine what comes next. The second time I ran this I wrote a password and clicked the login “button”. Surprise, it was wrong.

At this point I had a log file containing the interesting functions and a lot of “noise” (GUI code, unpacking code, uninteresting shit…)

You can use the first file to filter out all this noise and keep only the addresses of the interesting functions. This can be done with 10 lines of Python, but in my case I used a feature of a small tool I wrote. Check this out…

Filtered function addresses. Just three!

That whole mess reduced to three functions! Great, isn’t it? 3… 2… 1… NERDGASM!
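The filtering step is just a set difference: keep every address from the “Login clicked” run that never shows up in the baseline run. Here is a C sketch of that idea (not the author’s tool); `diff_addresses` is a hypothetical name, and parsing the `$ <address>` lines of functions_log.txt into the two arrays is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Writes to `out` every address of `run` (the log with the Login click)
   that never appears in `baseline` (the "noise" log), once each and in
   ascending order. Sorts both input arrays in place; `out` must have room
   for `n_run` entries. Returns the number of addresses written. */
size_t diff_addresses(uint64_t *baseline, size_t n_base,
                      uint64_t *run, size_t n_run, uint64_t *out)
{
    size_t n_out = 0;

    qsort(baseline, n_base, sizeof *baseline, cmp_u64);
    qsort(run, n_run, sizeof *run, cmp_u64);

    for (size_t i = 0; i < n_run; i++) {
        if (i > 0 && run[i] == run[i - 1])
            continue;                              /* repeated in the same log */
        if (bsearch(&run[i], baseline, n_base, sizeof *baseline, cmp_u64) == NULL)
            out[n_out++] = run[i];                 /* only seen when Login was clicked */
    }
    return n_out;
}
```

Sorting plus binary search keeps this at O(n log n), which matters when, as described above, the unfiltered logs contain the same addresses thousands of times.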

A quick inspection of those functions inside a debugger was enough to find the code processing our password string.

RCE100 Encoder

which roughly translates to this code:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int
main(int argc, char *argv[])
{
	/* RCE100 encoding loop implementation */

	if(argc < 2)
	{
		fprintf(stderr, "usage: %s <password>\n", argv[0]);
		return 1;
	}

	// substr initialized to the whole string
	char *substr = argv[1];
	char *c = substr;
	/* 32-bit unsigned arithmetic wraps around like IMUL/SUB on x86;
	   signed int overflow would be undefined behaviour in C */
	uint32_t ext_char = 0;
	uint32_t funny = 0xDEADBEEF;
	const uint32_t c1 = 0x38271606;
	const uint32_t c2 = 0x5B86AFFE;
	size_t idx = 0;
	size_t len_substr = strlen(substr);

	while(idx < len_substr)
	{
		ext_char = (uint32_t)(int32_t)(signed char)*c;  // MOVSX (sign-extend the byte)
		funny = funny * c1;                             // IMUL ESI, ...
		ext_char = ext_char * c2;                       // IMUL EAX, ...
		ext_char = ext_char - funny;                    // SUB EAX, ESI
		funny = ext_char;                               // MOV ESI, EAX
		c++;                                            // MOV ECX, EDI & ADD ECX, 1
		idx++;                                          // ADD EDX, 1
	}

	printf("[x] encoded stuff: %s -> 0x%08x\n", substr, (unsigned int)funny);

	return 0;
}

After this encoding function there is a comparison:

CMP EAX, C4B1801C

If this is true, we will be rewarded with the coveted “Login granted” banner. So the only question remaining is: “what string, after the transformation, results in that number?”

I had to reverse the algorithm.
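Unrolling the loop of the C translation above makes the “polynomial” concrete. This is my own derivation from that translation, not something read out of the binary. With $x_i$ the sign-extended character codes and $n$ the password length:

```latex
\begin{aligned}
f_0 &= \texttt{0xDEADBEEF}, \qquad c_1 = \texttt{0x38271606}, \qquad c_2 = \texttt{0x5B86AFFE} \\
f_i &\equiv c_2\,x_i - c_1\,f_{i-1} \pmod{2^{32}}, \qquad i = 1,\dots,n \\
f_n &\equiv (-c_1)^{n} f_0 + c_2 \sum_{i=1}^{n} (-c_1)^{n-i}\,x_i \pmod{2^{32}} \\
\texttt{0xC4B1801C} &\equiv f_n \pmod{2^{32}}
\end{aligned}
```

The coefficients are powers of $-c_1$ reduced modulo $2^{32}$, which is where the overflow comes in. On top of that, $n$ is unknown and every $x_i$ must be a typeable character, so the congruence alone does not hand you the password.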

After doing some basic math and realising this was going to be a bitch (it’s a polynomial equation in several variables, I have no idea of its order, and the coefficients have possibly overflowed in the course of the calculation), I decided to do something less smart…

…bruteforce it :)

To do this, I coupled the source of the string generator crunch.c (from BackTrack) with my encoding function and let it run…

It took less than 10 minutes to get the right password:
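The crunch.c glue itself isn’t shown, so here is a self-contained C sketch of the same idea: enumerate candidate strings shortest first and feed each one to the encoding loop until the result matches the target. `rce100_encode` and `brute_force` are hypothetical names, and the enumeration is a plain odometer over the charset, not necessarily crunch’s exact order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same loop as the RCE100 translation, with wrap-around 32-bit math. */
static uint32_t rce100_encode(const char *s, size_t len)
{
    uint32_t funny = 0xDEADBEEFu;
    for (size_t i = 0; i < len; i++) {
        uint32_t ext = (uint32_t)(int32_t)(signed char)s[i];   /* MOVSX */
        funny = ext * 0x5B86AFFEu - funny * 0x38271606u;
    }
    return funny;
}

/* Tries every string over `charset` with length 1..max_len (shortest first)
   and stops at the first one whose encoding equals `target`.
   `out` needs max_len + 1 bytes. Returns the length found, or 0. */
size_t brute_force(uint32_t target, const char *charset, size_t max_len, char *out)
{
    size_t n_chars = strlen(charset);
    size_t idx[32];                        /* per-position index into charset */

    if (n_chars == 0 || max_len > 32)
        return 0;

    for (size_t len = 1; len <= max_len; len++) {
        memset(idx, 0, sizeof idx);
        for (;;) {
            for (size_t p = 0; p < len; p++)
                out[p] = charset[idx[p]];
            out[len] = '\0';
            if (rce100_encode(out, len) == target)
                return len;

            /* Advance the odometer: the last position spins fastest. */
            size_t p = len;
            while (p > 0 && ++idx[p - 1] == n_chars)
                idx[--p] = 0;
            if (p == 0)
                break;                     /* every string of this length tried */
        }
    }
    return 0;
}
```

For the real challenge the call would be `brute_force(0xC4B1801Cu, charset, 10, buf)` with the characters from the author’s charset.lst (not reproduced here). Since the binary only compares the final 32-bit value, any colliding string is accepted, so a different charset or order may surface a different password than the one below.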

root@yomama[/pentest/passwords/crunch]
[23:43]:./crunch_the_pass 1 10 -f charset.lst lalpha-numeric-symbol14
Crunch will now generate 1094107923781757568 bytes of data
Crunch will now generate 1043422626287 MB of data
Crunch will now generate 1018967408 GB of data
a
aa
aaa
aaaa
aaaaa
aaaaaa
[x] encoded stuff: gp_gdv -> 0xc4b1801c
^C
RCE100 says: “ACCESS GRANTED” :)

7 thoughts on “A story of binaries and snipers”

  1. Reading about PIN somewhere else, I found this post. I went with brute force too, by patching the binary in memory: I didn’t see a length constraint, the value of each char is linked to the value of the previous one, and only the end value (C4B1801C) is known, which is indeed calculated. Tricky :)

    0041ECB0 81EC 08020000 SUB ESP,208
    0041ECB6 A1 B4284200 MOV EAX,DWORD PTR DS:[4228B4]
    0041ECBB 33C4 XOR EAX,ESP
    0041ECBD 898424 04020000 MOV DWORD PTR SS:[ESP+204],EAX
    0041ECC4 57 PUSH EDI
    0041ECC5 68 01020000 PUSH 201
    0041ECCA 8D4424 08 LEA EAX,[ESP+8]
    0041ECCE B8 04000000 MOV EAX,4
    0041ECD3 FF04E4 INC DWORD PTR SS:[ESP]
    0041ECD6 90 NOP
    0041ECD7 90 NOP
    0041ECD8 90 NOP
    0041ECD9 8D3CE4 LEA EDI,[ESP]
    0041ECDC 90 NOP
    0041ECDD E8 6EFFFFFF CALL 0041EC50
    0041ECE2 3D 1C80B1C4 CMP EAX,C4B1801C
    0041ECE7 90 NOP
    0041ECE8 ^ 75 E4 JNE SHORT 0041ECCE
    0041ECEA 6A 03 PUSH 3

    Got a hit too in less than 10 seconds (starting from 00 00 00 01):

    CPU Dump
    Address Hex dump ASCII
    0012FA04 F0 D4 6C 22| ðÔl"

    Alt+0240 = ð
    Alt+0212 = Ô
    Alt+0108 = l
    Alt+0034 = "

  2. Hi Carlos..
    cool article. I also have to start learning DBI using Pintools..
    btw..just had a question..Can you tell me how you decompiled that assembly code? It seems cryptic to me and doesn’t look like function code that a compiler would generate..
    It’s really impressive to see how you turned that assembly code back into C.
    So do you have any references I can use to get strong at decompiling assembly code manually?
    Thanks

    1. Hi Vivek!
      Thanks, but it’s actually not that difficult. OK, any assembly snippet is overwhelming the first time you see it, but after some time it starts to make sense.
      It’s just a matter of spending time and having patience, nothing else :)
      If you want to get your feet wet I would recommend “The Binary Auditor”. It has a downloadable package with TONS of exercises showing the mapping from C/C++ code to x86 ASM. They are always short and focus on one topic, for example “this is how a do-while loop looks in assembly”, etc.

      I think we’ll meet at BruCON this year. Will you stay after your course is finished?

      1. Hi Carlos,
        Thanks a lot..I just started going thru that package.
        Well, I don’t think I’ll be at BruCON..
        I’ll be starting to practice now..
        Cheers!

  3. Hi Carlos
    I just went through The Binary Auditor and found that there are exercises, but no hints or solutions to tell me whether I did everything right. I mean, I only know C and not C++ (no OOP). How should I go about this?
    And one more thing..
    Have you ever tried to analyze binaries of software like Radmin?
    I was just trying to analyze it to get some hands-on experience. I found out that it is somehow packed or encrypted: very few imports, GetProcAddress, LoadLibrary and VirtualAlloc being some of them..IDA does not recognize much..Process Explorer shows the process in violet..but all the sections are there..Can you suggest how I should go about it?
    Thanks

    1. Hi Vivek,
      you’re right, there are neither hints nor solutions… What I usually do is write a short snippet in C/C++, compile it and see if it matches the original ASM. It will never be exactly the same due to compiler options and/or a different compiler, but the essential part should be quite close.

      Radmin is probably not the best thing to start since, as you mentioned, it’s packed. You could try to dump it from memory and fix the IAT, etc. but this is usually a pain in the ass. Otherwise you will have to analyse it dynamically inside a debugger. In any case, there is not a single, well known procedure here.
      If you want to get hands-on with IDA Pro, etc., I would recommend using another (non-packed) binary.

      Hope this helps!

      1. Thanks Carlos :) you have been extremely helpful..
        Btw, there were also some other things I saw while debugging the radmin process, like:
        crashing olly, generating exceptions and getting terminated, getting terminated in the debugger, and when I close the attached debugger, the process stays alive and works as it should..totally weird..
        I guess you are right its a pain in the ass ;) I am better off starting with something else.
        Anyway, thanks !! hope to see more articles from you on pintools real soon !!
        Cheers!!
