Introduction
The first part of this two-part write-up will explain the Portable Executable (PE) file format, its data structures, and how to programmatically retrieve them using C. In the second part, we will use the NSA’s software reverse engineering tool “Ghidra” to decompile and analyse the “PEparser” tool, which can be found here on GitHub. All code snippets in this article are based on this tool, which is in turn based on the example code provided by Maldevacademy. Having an understanding of the PE format will enable analysts to dissect malware contained within PE files. Ideally, you should already have an understanding of data structures in C before reading.
The Portable Executable file format
From a high level, the Portable Executable format is a collection of data structures that provides information about an executable. Microsoft defines the PE format as:
- “...describes the structure of executable (image) files and object files under the Windows family of operating systems…”
For example, the DOS header located at the start of a PE file is a 64-byte data structure:
The e_magic member of this structure is the so-called “magic number”, which should always be “MZ”, or 4D 5A in hexadecimal. Using hasherezade’s pe-bear tool, we can observe both the overall Portable Executable structure (outlined in red) and the “magic number” (outlined in green):
You can think of this format as a filing cabinet that is used to store and categorize documents:
- Open the DOS header drawer (IMAGE_DOS_HEADER)
- Find the e_magic folder and read its contents (4D 5A)
Instead of using drawers and folders – Windows reads data structures and uses Relative Virtual Addresses (RVAs) to interpret the file. Relative Virtual Addresses will be explained in more detail later in the article.
Just as filing cabinets are used to keep documents organized, the PE file format provides a standardized structure that the operating system understands. It contains all the information that is needed for code to run successfully.
To learn more about the PE structure itself, read 0xRick’s series on the subject here.
For highly detailed graphics that dissect the PE format and its structures, check out this GitHub repo.
To retrieve the IMAGE_DOS_HEADER:
A variable (pDosHeader) of the type PIMAGE_DOS_HEADER is created to point to the DOS header in memory. Because the DOS header is located at the beginning of a PE file, simply typecasting the base address (pPEbaseAddress) to PIMAGE_DOS_HEADER is sufficient.
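A minimal sketch of that cast is shown below. It assumes the whole file has already been read into a suitably aligned buffer. DOS_HEADER is a hand-written copy of the IMAGE_DOS_HEADER layout, declared locally so that the example builds without windows.h, and get_dos_header is a hypothetical helper name rather than PEparser's own code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copy of the IMAGE_DOS_HEADER layout. On Windows you would
   include <windows.h> and use IMAGE_DOS_HEADER / PIMAGE_DOS_HEADER. */
typedef struct {
    uint16_t e_magic;     /* bytes 4D 5A ("MZ"), 0x5A4D as a little-endian word */
    uint16_t e_cblp, e_cp, e_crlc, e_cparhdr, e_minalloc, e_maxalloc;
    uint16_t e_ss, e_sp, e_csum, e_ip, e_cs, e_lfarlc, e_ovno;
    uint16_t e_res[4];
    uint16_t e_oemid, e_oeminfo;
    uint16_t e_res2[10];
    int32_t  e_lfanew;    /* offset of the NT headers from the start of the file */
} DOS_HEADER;

static_assert(sizeof(DOS_HEADER) == 64, "the DOS header should be 64 bytes");

#define DOS_SIGNATURE 0x5A4D

/* Hypothetical helper: returns the DOS header, or NULL if the buffer is
   too small or does not start with "MZ". */
static const DOS_HEADER *get_dos_header(const uint8_t *pPEbaseAddress, size_t size)
{
    if (pPEbaseAddress == NULL || size < sizeof(DOS_HEADER))
        return NULL;

    /* The DOS header is the very first thing in the file, so a cast is enough. */
    const DOS_HEADER *pDosHeader = (const DOS_HEADER *)pPEbaseAddress;
    return pDosHeader->e_magic == DOS_SIGNATURE ? pDosHeader : NULL;
}
```

Checking e_magic before using any other member is a cheap way to reject files that are not PE files at all.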
Relative Virtual Addresses
Because PE files won’t always be loaded at the same place in memory, Relative Virtual Addresses are used to store offset values from the base address of the file to other locations. Using RVAs means that locations of data structures are still accurate regardless of where the base address of the PE is loaded. To calculate a Virtual Address, simply take the base address of the executable and add the Relative Virtual Address.
e_lfanew, which is the last member of the IMAGE_DOS_HEADER structure, stores an RVA to the IMAGE_NT_HEADERS struct (the NT header). This can be used to navigate from the DOS header at the start of the file to the NT header.
Accessing IMAGE_NT_HEADERS via e_lfanew
The value of e_lfanew can be observed using pe-bear:
Accessing the IMAGE_NT_HEADERS struct requires that we take the base address of the PE we are analysing and add the value of e_lfanew:
The variable pNTheaders that is created is the result of taking the PE’s base address (pPEbaseAddress) and adding the value of the e_lfanew member of the DOS header (from the pDosHeader variable previously created).
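As a rough sketch of this step, the helper below reads e_lfanew from its fixed position in the DOS header (offset 0x3C), adds it to the base address, and checks for the “PE\0\0” signature that starts the NT headers. The helper name get_nt_headers and the bounds checks are additions for illustration, and the code assumes a little-endian host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define E_LFANEW_OFFSET 0x3C         /* position of e_lfanew inside the DOS header */
#define NT_SIGNATURE    0x00004550u  /* "PE\0\0" read as a little-endian DWORD */

/* Hypothetical helper: returns a pointer to the NT headers inside the
   buffer, or NULL if e_lfanew is out of range or the signature is wrong. */
static const uint8_t *get_nt_headers(const uint8_t *pPEbaseAddress, size_t size)
{
    int32_t  e_lfanew;
    uint32_t signature;

    assert(pPEbaseAddress != NULL);
    if (size < 64)                   /* too small to hold a DOS header */
        return NULL;

    memcpy(&e_lfanew, pPEbaseAddress + E_LFANEW_OFFSET, sizeof e_lfanew);
    if (e_lfanew < 0 || (size_t)e_lfanew > size - sizeof signature)
        return NULL;

    /* base address + e_lfanew = start of the NT headers */
    const uint8_t *pNTheaders = pPEbaseAddress + e_lfanew;
    memcpy(&signature, pNTheaders, sizeof signature);
    return signature == NT_SIGNATURE ? pNTheaders : NULL;
}
```

On Windows, the returned pointer would be cast to PIMAGE_NT_HEADERS. The sketch returns raw bytes only to stay independent of windows.h.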
Inside the IMAGE_NT_HEADERS struct (either IMAGE_NT_HEADERS32 or IMAGE_NT_HEADERS64, depending on the architecture of the file) are two members that will allow us to navigate to the File Header and the Optional Header.
IMAGE_FILE_HEADER
With access to the IMAGE_NT_HEADERS structure, we can use its second member FileHeader to retrieve the IMAGE_FILE_HEADER struct:
This data structure contains many important members:
IMAGE_OPTIONAL_HEADER
The Optional Header is accessed via IMAGE_NT_HEADERS with the OptionalHeader member:
It contains many members that store important information. The most important of these are listed below:
The DataDirectory member is an array of IMAGE_DATA_DIRECTORY structures, one entry for each of the directories within the PE file.
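To illustrate a few of these members, the sketch below pulls Magic, AddressOfEntryPoint, ImageBase and SizeOfImage out of the optional header by their byte offsets, and reports where DataDirectory starts. Reading by offset is a substitute for the struct access used in the article, chosen here so that one function handles both the 32-bit and 64-bit layouts. OPTIONAL_SUMMARY and summarise_optional_header are hypothetical names, and a little-endian host is assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define OPT_MAGIC_PE32      0x010B   /* 32-bit image */
#define OPT_MAGIC_PE32_PLUS 0x020B   /* 64-bit image */

/* Hypothetical summary of a few optional header members. */
typedef struct {
    uint16_t magic;
    uint32_t addressOfEntryPoint;    /* RVA of the first instruction to run */
    uint64_t imageBase;              /* preferred load address */
    uint32_t sizeOfImage;            /* size of the image once mapped */
    uint32_t dataDirectoryOffset;    /* offset of DataDirectory in the optional header */
} OPTIONAL_SUMMARY;

/* Returns 0 on success, -1 if Magic is neither PE32 nor PE32+. */
static int summarise_optional_header(const uint8_t *pOptionalHeader, OPTIONAL_SUMMARY *out)
{
    assert(pOptionalHeader != NULL && out != NULL);

    memcpy(&out->magic, pOptionalHeader, 2);
    memcpy(&out->addressOfEntryPoint, pOptionalHeader + 16, 4);
    memcpy(&out->sizeOfImage, pOptionalHeader + 56, 4);

    if (out->magic == OPT_MAGIC_PE32) {
        uint32_t base32;             /* ImageBase is 4 bytes, after BaseOfData */
        memcpy(&base32, pOptionalHeader + 28, 4);
        out->imageBase = base32;
        out->dataDirectoryOffset = 96;
    } else if (out->magic == OPT_MAGIC_PE32_PLUS) {
        memcpy(&out->imageBase, pOptionalHeader + 24, 8);   /* 8 bytes, no BaseOfData */
        out->dataDirectoryOffset = 112;
    } else {
        return -1;
    }
    return 0;
}
```

Checking Magic first matters because the two layouts diverge from ImageBase onwards, so the same offset means different things in a 32-bit and a 64-bit file.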
IMAGE_DATA_DIRECTORY
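Each IMAGE_DATA_DIRECTORY structure holds the RVA and size of one data directory, and the DataDirectory member stores these structures in an array. Every element in the array corresponds to a specific data directory. Important data directories include: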
- Export Directory (IMAGE_DIRECTORY_ENTRY_EXPORT) – Contains information about exported functions and data.
- Import Directory (IMAGE_DIRECTORY_ENTRY_IMPORT) – Contains information about imported functions and data.
- Resource Directory (IMAGE_DIRECTORY_ENTRY_RESOURCE) – Holds information about resources like icons and bitmaps.
- Exception Directory (IMAGE_DIRECTORY_ENTRY_EXCEPTION) – Information about exception handling.
Data directories are accessed by retrieving the DataDirectory member from the Optional Header and indexing into the array of IMAGE_DATA_DIRECTORY structures.
For example, to access the export directory:
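A small sketch of that indexing follows. DATA_DIRECTORY is a local copy of the IMAGE_DATA_DIRECTORY layout, the DIR_ENTRY_* constants mirror the values of the IMAGE_DIRECTORY_ENTRY_* constants listed above, and get_directory is a hypothetical helper that also treats an empty entry as absent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copy of the IMAGE_DATA_DIRECTORY layout. */
typedef struct {
    uint32_t VirtualAddress;   /* RVA of the directory's data */
    uint32_t Size;             /* size of that data in bytes */
} DATA_DIRECTORY;

/* Index values, matching IMAGE_DIRECTORY_ENTRY_EXPORT and friends. */
enum {
    DIR_ENTRY_EXPORT    = 0,
    DIR_ENTRY_IMPORT    = 1,
    DIR_ENTRY_RESOURCE  = 2,
    DIR_ENTRY_EXCEPTION = 3,
    DIR_ENTRY_COUNT     = 16
};

/* Hypothetical helper: returns the entry at the given index, or NULL if the
   index is out of range or the directory is not present in this file. */
static const DATA_DIRECTORY *get_directory(const DATA_DIRECTORY *dataDirectory,
                                           uint32_t numberOfRvaAndSizes,
                                           unsigned index)
{
    assert(dataDirectory != NULL);
    if (index >= DIR_ENTRY_COUNT || index >= numberOfRvaAndSizes)
        return NULL;

    const DATA_DIRECTORY *entry = &dataDirectory[index];
    return (entry->VirtualAddress != 0 && entry->Size != 0) ? entry : NULL;
}
```

With windows.h, the equivalent access is OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT]. The numberOfRvaAndSizes argument corresponds to the NumberOfRvaAndSizes member of the Optional Header.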
IMAGE_EXPORT_DIRECTORY
The export table (IMAGE_EXPORT_DIRECTORY) is not documented by Microsoft, so third-party documentation has to be used.
To retrieve the export table:
Members of the IMAGE_EXPORT_DIRECTORY struct that are of interest to us include:
- NumberOfFunctions – Stores the number of functions that are exported from the PE file
- NumberOfNames – Stores the number of function names exported from the PE file
- AddressOfFunctions – Stores the RVA of an array of RVAs of the exported functions
- AddressOfNames – Similarly to AddressOfFunctions, stores the RVA of an array of RVAs that point to the exported function names
- AddressOfNameOrdinals – Stores the RVA of an array of 16-bit ordinal values, one for each exported name; each value is an index into the AddressOfFunctions array (ordinal numbers are used to efficiently reference exported functions)
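To show how these members work together, here is a sketch that resolves an exported name to the RVA of its function. EXPORT_DIRECTORY is a local copy of the IMAGE_EXPORT_DIRECTORY layout, and find_export_rva is a hypothetical helper. It assumes an image that is mapped at base and does no bounds checking on the RVAs it follows, which a real parser handling untrusted files would need.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local copy of the IMAGE_EXPORT_DIRECTORY layout. */
typedef struct {
    uint32_t Characteristics;
    uint32_t TimeDateStamp;
    uint16_t MajorVersion;
    uint16_t MinorVersion;
    uint32_t Name;                   /* RVA of the DLL name */
    uint32_t Base;                   /* starting ordinal number */
    uint32_t NumberOfFunctions;
    uint32_t NumberOfNames;
    uint32_t AddressOfFunctions;     /* RVA of uint32_t[NumberOfFunctions] */
    uint32_t AddressOfNames;         /* RVA of uint32_t[NumberOfNames] */
    uint32_t AddressOfNameOrdinals;  /* RVA of uint16_t[NumberOfNames] */
} EXPORT_DIRECTORY;

static_assert(sizeof(EXPORT_DIRECTORY) == 40, "unexpected padding in EXPORT_DIRECTORY");

/* Hypothetical helper: returns the RVA of the named export, or 0 if absent. */
static uint32_t find_export_rva(const uint8_t *base, uint32_t exportDirRva, const char *wanted)
{
    const EXPORT_DIRECTORY *pExportDir = (const EXPORT_DIRECTORY *)(base + exportDirRva);
    const uint32_t *functions = (const uint32_t *)(base + pExportDir->AddressOfFunctions);
    const uint32_t *names     = (const uint32_t *)(base + pExportDir->AddressOfNames);
    const uint16_t *ordinals  = (const uint16_t *)(base + pExportDir->AddressOfNameOrdinals);

    for (uint32_t i = 0; i < pExportDir->NumberOfNames; i++) {
        /* names[i] is itself an RVA to a NUL-terminated string */
        const char *name = (const char *)(base + names[i]);
        if (strcmp(name, wanted) == 0) {
            /* the ordinal at the same position indexes the functions array */
            uint16_t index = ordinals[i];
            return index < pExportDir->NumberOfFunctions ? functions[index] : 0;
        }
    }
    return 0;
}
```

The names and ordinals arrays are walked in parallel: position i in one corresponds to position i in the other, while the functions array is only ever reached through an ordinal.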
IMAGE_IMPORT_DESCRIPTOR
For each DLL that is used to import functions, an IMAGE_IMPORT_DESCRIPTOR structure is created. Together, these structures make up the import directory table, and each descriptor in turn points to that DLL's entries in the import address table.
To retrieve the import descriptors from IMAGE_OPTIONAL_HEADER:
Here we are calculating the virtual address of the import directory by adding its RVA to the base address (highlighted in red). The result is cast to the PIMAGE_IMPORT_DESCRIPTOR type (highlighted in green) – this ensures memory at that location is interpreted as an array of IMAGE_IMPORT_DESCRIPTOR structs. The result is stored in pImportDescriptor.
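Once pImportDescriptor is available, the array can be walked one DLL at a time. The sketch below is one way to do that, assuming an image mapped at base. IMPORT_DESCRIPTOR is a local copy of the IMAGE_IMPORT_DESCRIPTOR layout and list_imported_dlls is a hypothetical helper. The array ends with an all-zero descriptor, which the loop detects through a zero Name field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copy of the IMAGE_IMPORT_DESCRIPTOR layout. */
typedef struct {
    uint32_t OriginalFirstThunk;  /* RVA of the import lookup (name) table */
    uint32_t TimeDateStamp;
    uint32_t ForwarderChain;
    uint32_t Name;                /* RVA of the DLL name string */
    uint32_t FirstThunk;          /* RVA of this DLL's import address table entries */
} IMPORT_DESCRIPTOR;

static_assert(sizeof(IMPORT_DESCRIPTOR) == 20, "unexpected padding in IMPORT_DESCRIPTOR");

/* Hypothetical helper: stores up to maxNames DLL name pointers in names[]
   and returns how many were stored. */
static size_t list_imported_dlls(const uint8_t *base, uint32_t importDirRva,
                                 const char **names, size_t maxNames)
{
    assert(base != NULL && names != NULL);

    /* base address + RVA, interpreted as an array of descriptors */
    const IMPORT_DESCRIPTOR *pImportDescriptor =
        (const IMPORT_DESCRIPTOR *)(base + importDirRva);

    size_t count = 0;
    while (count < maxNames && pImportDescriptor->Name != 0) {
        names[count++] = (const char *)(base + pImportDescriptor->Name);
        pImportDescriptor++;      /* advance to the next DLL's descriptor */
    }
    return count;
}
```

The importDirRva argument would come from the VirtualAddress of the import entry in the DataDirectory array.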
IMAGE_SECTION_HEADER
Portable Executables have different sections, each responsible for storing a different type of data. For example:
- .text (code section) – Holds the entry point of the program and the executable code. This is usually the first section
- .data – Stores initialized data (e.g. strings, global and static variables)
- .rsrc (resource section) – Stores resources used for UI such as icons and images
- .rdata (read-only data section) – Contains read-only data such as constant strings
These sections can be analysed using pe-bear:
Each of the sections has an IMAGE_SECTION_HEADER structure that contains information about it:
Important members of the IMAGE_SECTION_HEADER are explained below:
- Name – The name of the section (e.g. .text, .data etc.)
- VirtualAddress – RVA to the section in memory
- SizeOfRawData – Size in bytes of the section's initialized data on disk
- PointerToRelocations – File offset to the relocation entries for the section
- NumberOfRelocations – Specifies the number of relocations for the section
- Characteristics – Stores flags regarding the characteristics of the section. For example “IMAGE_SCN_MEM_EXECUTE” means the section can be executed as code.
The section table can be described as a collection of these IMAGE_SECTION_HEADER data structures. It is an array where each element is an IMAGE_SECTION_HEADER structure – one for each of the sections. Because the section headers start directly after the NT headers, simply skipping over the NT headers by using sizeof provides a convenient way to navigate to the section table:
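The sketch below shows one way to reach and walk the section table. Note that it swaps in a slightly different calculation from the sizeof approach described above: it skips the 4-byte signature, the 20-byte File Header, and then SizeOfOptionalHeader bytes, which lands on the same address whenever the optional header has its standard size. SECTION_HEADER is a local copy of the IMAGE_SECTION_HEADER layout, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local copy of the IMAGE_SECTION_HEADER layout. */
typedef struct {
    uint8_t  Name[8];              /* not NUL-terminated if all 8 bytes are used */
    uint32_t VirtualSize;
    uint32_t VirtualAddress;       /* RVA of the section in memory */
    uint32_t SizeOfRawData;
    uint32_t PointerToRawData;     /* file offset of the section's data */
    uint32_t PointerToRelocations;
    uint32_t PointerToLinenumbers;
    uint16_t NumberOfRelocations;
    uint16_t NumberOfLinenumbers;
    uint32_t Characteristics;
} SECTION_HEADER;

static_assert(sizeof(SECTION_HEADER) == 40, "unexpected padding in SECTION_HEADER");

#define SCN_MEM_EXECUTE 0x20000000u   /* value of IMAGE_SCN_MEM_EXECUTE */

/* Skips signature (4) + file header (20) + optional header. */
static const SECTION_HEADER *first_section(const uint8_t *pNTheaders)
{
    uint16_t sizeOfOptionalHeader;
    memcpy(&sizeOfOptionalHeader, pNTheaders + 4 + 16, sizeof sizeOfOptionalHeader);
    return (const SECTION_HEADER *)(pNTheaders + 4 + 20 + sizeOfOptionalHeader);
}

/* Counts the sections whose Characteristics allow execution. */
static unsigned count_executable_sections(const uint8_t *pNTheaders)
{
    uint16_t numberOfSections;     /* second member of the file header */
    memcpy(&numberOfSections, pNTheaders + 4 + 2, sizeof numberOfSections);

    const SECTION_HEADER *pSection = first_section(pNTheaders);
    unsigned count = 0;
    for (uint16_t i = 0; i < numberOfSections; i++)
        if (pSection[i].Characteristics & SCN_MEM_EXECUTE)
            count++;
    return count;
}
```

windows.h provides the IMAGE_FIRST_SECTION macro for the same navigation, so on Windows the manual arithmetic in first_section is not needed.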
Undocumented Structures
Similarly to IMAGE_EXPORT_DIRECTORY and IMAGE_IMPORT_DESCRIPTOR mentioned earlier, there are various other undocumented structures that can be used to further analyse the contents of a Portable Executable. Although not documented on MSDN, these structures can be found in the Winnt.h header file. Such structures include, but are not limited to:
- IMAGE_TLS_DIRECTORY – Contains information regarding thread local storage
- IMAGE_BASE_RELOCATION – Describes a block of addresses within the image that must be adjusted (relocated) when the image is loaded at an address other than its preferred base address
- IMAGE_RUNTIME_FUNCTION_ENTRY – A structure used for exception handling
Conclusion
Combining all of the above techniques, we can create a tool capable of extracting data structures from a Portable Executable. On my GitHub, you can find the full code for the “PEparser” tool. In this post, you have learned what the Portable Executable file format is, what its data structures are, and how to extract them using C. By continuing to learn more about this subject, you will develop skills that will aid you in the development and analysis of executable files on Windows. The first part of this series is geared more towards development than analysis, but in the next instalment you will be introduced to Ghidra. Using Ghidra, we will look at a decompilation of the “PEparser” tool and learn how to interpret and annotate it.
Thanks for reading the first part of this series!