Reverse engineering has always been an obsession of mine. As a child, I used to go around garage sales, looking for old electronics for the sole purpose of opening them up to mess around with the insides. There is just something gratifying about opening up a closed system to see how it works, where it cuts corners, how it could be modified to work better. Software is no different. This is the main reason I love open source: the code is available, and for someone like me, who has an affinity for finding (read: causing) bugs, available source makes it easy to find areas which need fixing.
But nothing easy is ever fun.
I have developed a strong interest in reverse engineering projects such as: wine, OpenMW, Freeablo, OpenTTD, OpenRCT2, Mono, Monogame, limetext and many others. It’s the sort of thrill that says: Anything you can do I can do better and make it free.
Personally, I’ve been interested in reverse engineering 3DO’s 1999 classic Heroes of Might and Magic 3, even though it is set for re-release by Ubisoft in 2015. The plan is to use LÖVE and take advantage of its high portability and simple 2D game API.
The first step I took was reversing the map file format. This is a tedious process, as I could not find any documentation about it, unlike the abundance of documentation you will find for the Elder Scrolls series. Luckily, the game comes with a map editor, and with the help of a hex editor and a tool I wrote to document the data structures, I was able to identify many parts of the data structures used. That, however, is the subject of another post.
The subject of this post is another strategy I am working on: reverse engineering the map editor’s code in order to extract the map loading logic. This means diving into the compiled code and changing the Portable Executable (PE) structure and inserting our code before the call to the main function.
The tools I used in this guide are Ollydbg, a hex editor, the mingw x86 compiler and CFF Explorer, the only non-open source software.
Adding a dll to the import table
The strategy I am using is exactly the one used in OpenRCT2. It involves adding a custom dll, whose functions and source code we fully control, to the compiled executable’s dll import table.
First, we create a dll. Let’s call it divertedmain.dll:
__declspec(dllexport) int __cdecl DivertedMain() { return 42; }
We only want something dead simple. A function which does nothing but return 42 adds no more than it needs to, and the 42 exit code lets us know that it has indeed worked.
To compile and link the dll:
i686-w64-mingw32-gcc -c -o divertedmain.o divertedmain.c
i686-w64-mingw32-gcc -o divertedmain.dll -s -shared divertedmain.o -Wl,--subsystem,windows
Now let’s fire up CFF Explorer and add our new dll with the import adder, making sure it’s in the same directory as the executable and importing the DivertedMain function by name.
Rebuild the import table and save the executable. To make sure it’s importing the dll’s exported function you can check the Import Directory in CFF Explorer. Another way would be to simply remove the divertedmain.dll file and there should be an error message when trying to load the executable.
$ wine executable-with-imported-dll.exe
err:module:import_dll Library divertedmain.dll (which is needed by L"executable-with-imported-dll.exe") not found
err:module:LdrInitializeThunk Main exe initialization for L"executable-with-imported-dll.exe" failed, status c0000135
Overriding the main() function
This part is a little more tricky and requires some trial and error.
We return 42 in the DivertedMain function. Once compiled, this number can be easily found. To see what the function will look like, we can compile the dll source to assembly. Note that gcc emits AT&T style assembly by default while Ollydbg uses Intel style assembly; the -masm=intel flag fixes this:
// i686-w64-mingw32-gcc -S -masm=intel -c -o divertedmain.s divertedmain.c
	.file	"divertedmain.c"
	.intel_syntax noprefix
	.text
	.globl	_DivertedMain
	.def	_DivertedMain;	.scl	2;	.type	32;	.endef
_DivertedMain:
	push	ebp
	mov	ebp, esp
	mov	eax, 42
	pop	ebp
	ret
	.ident	"GCC: (GNU) 4.9.2"
	.section .drectve
	.ascii " -export:\"DivertedMain\""
This snippet will be important to spot in the disassembly.
Finding the exported function
After loading the modified executable in Ollydbg, we go to the divertedmain entry in the Executable modules window (Alt+E).
This entry tells us that the divertedmain code is located in the 0x66B00000 - 0x66B0B000 range. Also note that the executable address space starts at 0x00400000 and that all addresses will be offset by that much. In the divertedmain range, we should find the assembly representation of our DivertedMain() function.
Indeed, in this case it is at the address 0x66B014B0:
It is easily spotted thanks to the assembly listing from earlier and the byte 2A, which is 42 in hexadecimal. Again, this is why it helps to have the listing in Intel syntax rather than AT&T syntax: it matches what Ollydbg displays.
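The same search can be done outside of a debugger. The following is only a rough sketch: it spells out the bytes I would expect for the listing above and scans a buffer for them. The mov eax, 42 encoding (B8 2A 00 00 00) is fixed, but the encoding of mov ebp, esp is an assumption (89 E5 is what the GNU assembler usually picks), and the names DIVERTED_MAIN_BYTES and find_bytes are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Expected 32-bit x86 machine code for the DivertedMain listing.
   Assumption: the assembler encodes "mov ebp, esp" as 89 E5. */
static const unsigned char DIVERTED_MAIN_BYTES[] = {
    0x55,                         /* push ebp          */
    0x89, 0xE5,                   /* mov  ebp, esp     */
    0xB8, 0x2A, 0x00, 0x00, 0x00, /* mov  eax, 42 (2A) */
    0x5D,                         /* pop  ebp          */
    0xC3                          /* ret               */
};

/* Hypothetical helper: returns the offset of the first occurrence of
   needle inside haystack, or -1 if it does not occur. */
long find_bytes(const unsigned char *haystack, size_t haystack_len,
                const unsigned char *needle, size_t needle_len)
{
    assert(haystack != NULL && needle != NULL);
    if (needle_len == 0 || needle_len > haystack_len)
        return -1;
    for (size_t i = 0; i + needle_len <= haystack_len; i++) {
        if (memcmp(haystack + i, needle, needle_len) == 0)
            return (long)i;
    }
    return -1;
}
```

Running find_bytes over the contents of divertedmain.dll would give a file offset rather than the in-memory address Ollydbg shows, so it is mostly useful as a sanity check that the function made it into the dll.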
Finding the main() function
We’re looking for the main function, which usually would take the command line parameters as arguments; it is also the function which does most of the logic, so a call which starts the whole program execution is likely to be the main function.
Using Ollydbg, we will step through the functions one instruction at a time. Setting arguments to the call can make it easy to find the function which uses them. In File|Set new arguments…, set “Look for me” as the new arguments.
Using the Step over button (F8), we will go through each call, looking at the registers for clues.
At about 0x004E7F6A we find a call to KERNEL32.GetCommandLineA, which is important to note, since this is where the argument list (i.e. argv) comes from. Indeed, at the next instruction, Ollydbg shows us the executable name and the arguments “Look for me” in EAX. This is very important since it means one of the next instructions will call main(). Some of the following calls strip the executable name from the arguments, which is actually the default behaviour for a call to WinMain().
Around 0x004E7FAF, we start seeing a lot of PUSH instructions and a call to KERNEL32.GetModuleHandleA, leading up to a call to executable-with-imported-dll.004FBD57. This is a clue that a large number of parameters are being passed to a function. Since our DivertedMain() function doesn’t take any parameters yet, these will be lost. This, however, should not be a problem for us yet, since the parameters can be added later.
The call to executable-with-imported-dll.004FBD57 starts the application, so we know that this is the call which needs to be edited. For me, it was at address 0x004E7FBF.
Replacing the address of the call to main()
At address 0x004E7FBF, note the current binary form of the call. We will need it later in order to call the original function.
At that line in Ollydbg, assemble some new code by right clicking on the line and choosing “Assemble…”, or by pressing space. Replace the call address with the address of our DivertedMain() function, which was found earlier to be 0x66B014B0. The disassembled view substitutes the address with CALL DivertedMain, which is a good sign.
Note the binary form: E8 EC946166
When stepping into it (F7), execution enters the DivertedMain() assembly and returns 42.
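The binary form is not the target address itself. E8 is a near call whose 4-byte operand is the distance from the end of the call instruction to the target, stored little-endian. As a small sketch of the arithmetic Ollydbg does for us (encode_call_rel32 is a name made up for this example):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode the 5-byte instruction "CALL target"
   (opcode E8 followed by a 32-bit relative displacement) as it would
   appear at address call_addr. */
void encode_call_rel32(uint32_t call_addr, uint32_t target, uint8_t out[5])
{
    assert(out != NULL);
    /* The displacement is relative to the instruction after the call. */
    uint32_t next_instruction = call_addr + 5;
    uint32_t displacement = target - next_instruction; /* wraps mod 2^32 */

    out[0] = 0xE8;
    out[1] = (uint8_t)(displacement & 0xFF);         /* lowest byte first */
    out[2] = (uint8_t)((displacement >> 8) & 0xFF);
    out[3] = (uint8_t)((displacement >> 16) & 0xFF);
    out[4] = (uint8_t)((displacement >> 24) & 0xFF);
}
```

With call_addr 0x004E7FBF and target 0x66B014B0, the displacement is 0x66B014B0 - 0x004E7FC4 = 0x666194EC, which gives the bytes E8 EC 94 61 66, the same form Ollydbg reports. It also means the bytes are only valid while the dll is loaded at the same base address.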
Saving the changes to the call to DivertedMain()
We now have the address of the call to main() and the binary form of the call to DivertedMain() given by Ollydbg: 0x004E7FBF and E8 EC946166. As we noted earlier, the addresses of the executable are offset by 0x00400000; therefore, in a hex editor, the call is actually at the address 0x000E7FBF.
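Subtracting the base address works here, but in general a PE file maps each section from a file offset to a possibly different virtual address, so the safer conversion goes through the section table. A minimal sketch, assuming the section fields have already been read from the file (CFF Explorer shows them under Section Headers); the Section struct and va_to_file_offset are names invented for this example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down view of one PE section header. */
typedef struct {
    uint32_t virtual_address;     /* RVA where the section is mapped  */
    uint32_t virtual_size;        /* size of the section in memory    */
    uint32_t pointer_to_raw_data; /* offset of the section in the file */
} Section;

/* Convert a virtual address to a file offset. Returns 0 and stores the
   result in *offset on success, or -1 if no section contains the address. */
int va_to_file_offset(uint32_t va, uint32_t image_base,
                      const Section *sections, size_t count, uint32_t *offset)
{
    assert(sections != NULL && offset != NULL);
    if (va < image_base)
        return -1;
    uint32_t rva = va - image_base; /* address relative to the image base */
    for (size_t i = 0; i < count; i++) {
        const Section *s = &sections[i];
        if (rva >= s->virtual_address &&
            rva - s->virtual_address < s->virtual_size) {
            *offset = rva - s->virtual_address + s->pointer_to_raw_data;
            return 0;
        }
    }
    return -1;
}
```

For a section whose file offset equals its virtual address, 0x004E7FBF with a base of 0x00400000 comes out as 0x000E7FBF, which is the shortcut used above.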
Opening the executable in a hex editor, the byte at 0x000E7FBF is E8, which is the first byte of our new instruction and the x86 opcode for Call Procedure. Looking at the following 4 bytes, we can confirm that it is indeed the same call we had in Ollydbg before we edited it. Using the hex editor, we replace the bytes in 0x000E7FBF - 0x000E7FC3 with E8 EC946166. Be careful to replace and not insert.
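If you would rather script the patch than click through a hex editor, something along these lines should do. It is a sketch rather than a tool: patch_file is a hypothetical helper, and it opens the file in "r+b" mode so that the bytes are overwritten in place and the file keeps its size.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: overwrite len bytes at the given offset of an
   existing file. Returns 0 on success, -1 on any failure. */
int patch_file(const char *path, long offset,
               const unsigned char *bytes, size_t len)
{
    assert(path != NULL && bytes != NULL);

    /* "r+b": read/write, binary, no truncation. The file must exist. */
    FILE *f = fopen(path, "r+b");
    if (f == NULL)
        return -1;

    int ok = fseek(f, offset, SEEK_SET) == 0 &&
             fwrite(bytes, 1, len, f) == len;

    /* fclose flushes the write, so its result matters too. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

For the patch above, the call would be patch_file on a copy of the executable with offset 0x000E7FBF and the five bytes E8 EC 94 61 66.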
We save it as executable-with-divertedmain.exe and run it:
$ wine executable-with-divertedmain.exe
$ echo $?
42
The program didn’t start, but instead quit with the return code 42. This is perfect: it means the main function was diverted into our dll, which returns 42. There is no more need to edit the executable, and we can now rely on our own code as long as DivertedMain() is the first defined function in our dll, so that its address stays the same.
To demonstrate this, we can get the DivertedMain() function to print out “DIVERTED!” to the console.
#include <stdio.h>

__declspec(dllexport) int __cdecl DivertedMain()
{
    printf("DIVERTED!\n");
    return 42;
}
Recompiling the dll and running the executable should give us:
$ i686-w64-mingw32-gcc -g -c -o divertedmain.o divertedmain.c
$ i686-w64-mingw32-gcc -g -o divertedmain.dll -s -shared divertedmain.o -Wl,--subsystem,windows
$ wine executable-with-divertedmain.exe
DIVERTED!
$ echo $?
42
Calling the original main function from inside our divertedmain
Our entry point now sits where the call to the main function used to be, so the program exits prematurely instead of running. In order to reverse engineer, we need to place ourselves at that point without actually stopping the execution. To do that, our first step will be to call the actual main function from the diverted main.
Earlier, we spotted command line parameters being pushed in the disassembled view. These parameters are those of the WinMain function which we replaced.
Before we can call the original main function, we need to get its parameters.
Getting the command line parameters
We simply add the parameters to the function signature and to make sure it works, print out the arguments:
#include <windows.h>
#include <stdio.h>

__declspec(dllexport) int __cdecl DivertedMain(
    HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR lpCmdLine, int nCmdShow)
{
    // Print the arguments to confirm that they reach our function
    printf("DIVERTED: %s\n", lpCmdLine);
    return 42;
}
Recompiling the dll and running the executable should give us:
$ i686-w64-mingw32-gcc -g -c -o divertedmain.o divertedmain.c
$ i686-w64-mingw32-gcc -g -o divertedmain.dll -s -shared divertedmain.o -Wl,--subsystem,windows
$ wine executable-with-divertedmain.exe some arguments
DIVERTED: some arguments
$ echo $?
42
The fact that the executable name was missing seemed odd to me, but that’s just the way WinMain’s lpCmdLine parameter works.
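To make that behaviour concrete, here is a simplified approximation of what the startup code does between GetCommandLineA and WinMain: skip the (possibly quoted) program name, then the whitespace after it. This is my own sketch, not the runtime's actual code, and skip_program_name is a made-up name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return a pointer to the first argument in a full
   command line, skipping the program name and the whitespace after it. */
const char *skip_program_name(const char *cmdline)
{
    assert(cmdline != NULL);
    const char *p = cmdline;

    if (*p == '"') {
        /* Quoted program name: runs until the closing quote. */
        p++;
        while (*p != '\0' && *p != '"')
            p++;
        if (*p == '"')
            p++;
    } else {
        /* Unquoted program name: runs until the first space or tab. */
        while (*p != '\0' && *p != ' ' && *p != '\t')
            p++;
    }

    /* Skip the separator between the name and the arguments. */
    while (*p == ' ' || *p == '\t')
        p++;
    return p;
}
```

Given the string that GetCommandLineA returned in Ollydbg, this would leave only “Look for me”, which is the shape of what lpCmdLine receives.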
Fetching and calling the original main function
Earlier, when we replaced the call to main, we overwrote an address with the address of our diverted main. That address is the original address of WinMain, and we will use it to get a pointer to the function. We then call that function.
Replace 0x00000000 with the original address:
#include <windows.h>
#include <stdio.h>

// Address of original call to WinMain
#define WINMAINADDR 0x00000000

__declspec(dllexport) int __cdecl DivertedMain(
    HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR lpCmdLine, int nCmdShow)
{
    printf("Diverted: %s\n", lpCmdLine);
    // WinMain is declared as int WINAPI (stdcall), so the pointer type says so too
    int (WINAPI *WinMain)(HINSTANCE, HINSTANCE, LPSTR, int) = (void*)WINMAINADDR;
    WinMain(hInstance, hPrevInstance, lpCmdLine, nCmdShow);
    return 42;
}
And there we go. The original WinMain function is called, and we can use this technique to call any other function in the original executable, provided we know the address.
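The general pattern, stripped of the Windows types, looks roughly like the sketch below. Everything in it is a stand-in: add_one plays the role of a function living somewhere in the original executable, and call_at_address is a hypothetical helper. The one real constraint is that the pointer type has to match the target's actual parameters, return type and calling convention, which for a closed binary has to be worked out from the disassembly.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed signature of the function we want to reach by address. */
typedef int (*int_fn)(int);

/* Stand-in for a function inside the original executable. */
int add_one(int x)
{
    return x + 1;
}

/* Hypothetical helper: turn a raw address into a typed function pointer
   and call it. Converting an integer to a function pointer is
   implementation-defined, but it is what this technique relies on. */
int call_at_address(uintptr_t address, int argument)
{
    assert(address != 0);
    int_fn fn = (int_fn)address;
    return fn(argument);
}
```

In this toy version the address comes from (uintptr_t)&add_one; in the diverted dll it would be a constant read off the disassembly, as with WINMAINADDR.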
Acknowledgments
I couldn’t have figured this out without the helpful tips of IntelOrca and the detailed articles of Ashkbiz Danehkar.
I would also like to thank my friend Sophy for proofreading.