Creating and Reversing a PE Parser

Blog by afx_IDE | linkedin.com/in/oliver-ide | twitter.com/afx_IDE

Introduction

The first part of this two-part write-up explains the Portable Executable (PE) file format, its data structures, and how to retrieve them programmatically in C. In the second part, we will use the NSA’s software reverse engineering tool “Ghidra” to decompile and analyse the “PEparser” tool, which can be found here on GitHub. All code snippets in this article are based on this tool, which in turn is based on the example code provided by Maldevacademy. Understanding the PE format enables analysts to dissect malware contained within PE files. Ideally, you should have an understanding of data structures in C before reading.

The Portable Executable file format

From a high level, the Portable Executable format is a collection of data structures that provides information about an executable. Microsoft defines the PE format as:

  • “...describes the structure of executable (image) files and object files under the Windows family of operating systems…”

For example, the DOS header located at the start of a PE file is a 64-byte data structure:

The e_magic member of this structure is the so-called “Magic number”, which should always be “MZ”, or 4D 5A in hexadecimal. Using hasherezade’s pe-bear tool, we can observe both the overall Portable Executable structure (outlined in red) and the “Magic number” (outlined in green):

You can think of this format as a filing cabinet that is used to store and categorize documents:

 In this analogy, if we wanted to read the “Magic number” from the DOS header:

  1. Open the DOS header drawer (IMAGE_DOS_HEADER)
  2. Find the e_magic folder and read its contents (4D 5A)

Instead of drawers and folders, Windows reads data structures and follows Relative Virtual Addresses (RVAs) to interpret the file. Relative Virtual Addresses will be explained in more detail later in the article.

Just like how filing cabinets are used to keep documents organized – the PE file format provides a standardized structure that the operating system understands. It contains all the information that is needed for code to run successfully.

To learn more about the PE structure itself, read 0xRick’s series on the subject here.

For highly detailed graphics that dissect the PE format and its structures, check out this GitHub repo.

To retrieve the IMAGE_DOS_HEADER:

A variable is created (pDosHeader) of the type PIMAGE_DOS_HEADER that is used to point to the DOS header in memory. Because the DOS header is located at the beginning of a PE file, simply typecasting the base address (pPEbaseAddress) to PIMAGE_DOS_HEADER is sufficient.
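A minimal sketch of that cast follows. So that it compiles without the Windows SDK, it declares a simplified stand-in for IMAGE_DOS_HEADER instead of including windows.h. The stand-in, the GetDosHeader helper name, and the added magic-number check are assumptions for illustration and are not taken from the PEparser tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IMAGE_DOS_SIGNATURE 0x5A4D /* "MZ" read as a little-endian 16-bit value */

/* Simplified stand-in for winnt.h's IMAGE_DOS_HEADER (64 bytes).
   Only the first and last members are named; the rest are collapsed. */
typedef struct {
    uint16_t e_magic;      /* offset 0x00: magic number, "MZ" */
    uint16_t e_unused[29]; /* offsets 0x02..0x3B: remaining DOS fields */
    int32_t  e_lfanew;     /* offset 0x3C: offset of the NT headers */
} IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;

/* Hypothetical helper: returns the DOS header if the buffer starts
   with "MZ", otherwise NULL. */
PIMAGE_DOS_HEADER GetDosHeader(void *pPEbaseAddress)
{
    assert(pPEbaseAddress != NULL);

    /* The DOS header sits at offset 0, so a cast of the base is enough. */
    PIMAGE_DOS_HEADER pDosHeader = (PIMAGE_DOS_HEADER)pPEbaseAddress;

    return (pDosHeader->e_magic == IMAGE_DOS_SIGNATURE) ? pDosHeader : NULL;
}
```

On Windows, the stand-in typedef would be dropped in favour of the definition that windows.h provides.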

Before moving on to manually extracting other data structures from Portable Executables, you should first understand what Relative Virtual Addresses are and how they are calculated.

Relative Virtual Addresses

Because PE files won’t always be loaded at the same place in memory, Relative Virtual Addresses are used to store offsets from the base address of the image to other locations within it. Using RVAs means that the locations of data structures remain accurate regardless of where the PE is loaded. To calculate a Virtual Address, simply take the base address of the executable and add the Relative Virtual Address.
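The calculation can be sketched as a one-line helper. RvaToVa is a hypothetical name chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: converts an RVA into a pointer for an image
   whose base address is pBase. */
void *RvaToVa(void *pBase, uint32_t rva)
{
    assert(pBase != NULL);

    /* Virtual Address = base address + Relative Virtual Address */
    return (uint8_t *)pBase + rva;
}
```

For example, with a base address of 0x140000000 and an RVA of 0x1000, the Virtual Address is 0x140001000. The sketch assumes the image is laid out as the loader maps it; in a raw file read from disk, section data sits at file offsets that can differ from the RVAs.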

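e_lfanew, which is the last member of the IMAGE_DOS_HEADER structure, stores the offset from the start of the file to the IMAGE_NT_HEADERS struct (the NT headers). Because the headers are mapped at the base address, this offset can be used in the same way as an RVA to navigate from the DOS header at the start of the file to the NT headers.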

Accessing IMAGE_NT_HEADERS via e_lfanew

The value of e_lfanew can be observed using pe-bear:

Accessing the IMAGE_NT_HEADERS struct requires that we take the base address of the PE we are analysing and add the value of e_lfanew:

The variable pNTheaders that is created is the result of taking the PE’s base address (pPEbaseAddress) and adding the value of the e_lfanew member of the DOS header (from the pDosHeader variable previously created).
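A hedged sketch of this step is shown here. It uses a simplified stand-in for IMAGE_DOS_HEADER so that it builds without windows.h, and it adds two checks that the text does not describe: a bounds check against a caller-supplied imageSize and a check for the “PE\0\0” signature. The GetNtHeaders name and the imageSize parameter are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IMAGE_NT_SIGNATURE 0x00004550u /* "PE\0\0" as a little-endian 32-bit value */

/* Simplified stand-in for winnt.h's IMAGE_DOS_HEADER (64 bytes). */
typedef struct {
    uint16_t e_magic;
    uint16_t e_unused[29];
    int32_t  e_lfanew; /* offset of the NT headers */
} IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;

/* Hypothetical helper: returns the address of the NT headers, or NULL if
   e_lfanew points outside the buffer or no "PE\0\0" signature is there.
   With windows.h, the result would be cast to PIMAGE_NT_HEADERS. */
void *GetNtHeaders(void *pPEbaseAddress, size_t imageSize)
{
    assert(pPEbaseAddress != NULL);
    if (imageSize < sizeof(IMAGE_DOS_HEADER)) return NULL;

    PIMAGE_DOS_HEADER pDosHeader = (PIMAGE_DOS_HEADER)pPEbaseAddress;
    if (pDosHeader->e_lfanew < 0 ||
        (size_t)pDosHeader->e_lfanew > imageSize - sizeof(uint32_t)) {
        return NULL;
    }

    /* base address + e_lfanew = start of the NT headers */
    uint8_t *pNTheaders = (uint8_t *)pPEbaseAddress + pDosHeader->e_lfanew;

    /* The first member of the NT headers is the 4-byte signature. */
    uint32_t signature;
    memcpy(&signature, pNTheaders, sizeof signature);

    return (signature == IMAGE_NT_SIGNATURE) ? (void *)pNTheaders : NULL;
}
```

The bounds check is optional for well-formed files, but it is cheap insurance when the input may be malformed, as is often the case with malware samples.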

Inside the IMAGE_NT_HEADERS struct (either IMAGE_NT_HEADERS32 or IMAGE_NT_HEADERS64, depending on the architecture of the file) are two members that allow us to navigate to the File Header and the Optional Header.

IMAGE_FILE_HEADER

With access to the IMAGE_NT_HEADERS structure, we can use its second member FileHeader to retrieve the IMAGE_FILE_HEADER struct:
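As a rough sketch of that access, the example below declares stand-ins for IMAGE_FILE_HEADER and for the first two members of IMAGE_NT_HEADERS so that it compiles without windows.h. The NT_HEADERS_PREFIX type and the GetFileHeader and MachineName helpers are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IMAGE_FILE_MACHINE_I386  0x014c
#define IMAGE_FILE_MACHINE_AMD64 0x8664

/* Stand-in for winnt.h's IMAGE_FILE_HEADER (20 bytes). */
typedef struct {
    uint16_t Machine;              /* target architecture */
    uint16_t NumberOfSections;     /* entries in the section table */
    uint32_t TimeDateStamp;
    uint32_t PointerToSymbolTable;
    uint32_t NumberOfSymbols;
    uint16_t SizeOfOptionalHeader; /* bytes occupied by the Optional Header */
    uint16_t Characteristics;      /* flags such as "is a DLL" */
} IMAGE_FILE_HEADER, *PIMAGE_FILE_HEADER;

/* First two members of IMAGE_NT_HEADERS; the OptionalHeader member that
   follows them in winnt.h is omitted from this stand-in. */
typedef struct {
    uint32_t          Signature;
    IMAGE_FILE_HEADER FileHeader;
} NT_HEADERS_PREFIX;

/* Hypothetical helper: FileHeader is embedded in the NT headers, so
   retrieving it is a member access rather than an RVA calculation. */
PIMAGE_FILE_HEADER GetFileHeader(NT_HEADERS_PREFIX *pNTheaders)
{
    assert(pNTheaders != NULL);
    return &pNTheaders->FileHeader;
}

/* Hypothetical helper: maps two common Machine values to short labels. */
const char *MachineName(uint16_t machine)
{
    switch (machine) {
    case IMAGE_FILE_MACHINE_I386:  return "x86";
    case IMAGE_FILE_MACHINE_AMD64: return "x64";
    default:                       return "other";
    }
}
```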

This data structure contains many important members:


IMAGE_OPTIONAL_HEADER

The Optional Header is accessed via IMAGE_NT_HEADERS with the OptionalHeader member:

It contains many members that store important information, the most important of which are listed below:

The DataDirectory member is an array of IMAGE_DATA_DIRECTORY structures, with one entry for each of the directories within the PE file.
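For reference, the 64-bit layout of the Optional Header can be sketched as follows. The struct is a stand-in that mirrors IMAGE_OPTIONAL_HEADER64 from winnt.h so the example builds without the Windows SDK; the OptionalHeaderBitness and PreferredEntryPoint helpers are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IMAGE_NT_OPTIONAL_HDR32_MAGIC    0x10b /* PE32  (32-bit) */
#define IMAGE_NT_OPTIONAL_HDR64_MAGIC    0x20b /* PE32+ (64-bit) */
#define IMAGE_NUMBEROF_DIRECTORY_ENTRIES 16

typedef struct {
    uint32_t VirtualAddress; /* RVA of the directory's data */
    uint32_t Size;           /* size of that data in bytes */
} IMAGE_DATA_DIRECTORY;

/* Stand-in for winnt.h's IMAGE_OPTIONAL_HEADER64 (240 bytes). */
typedef struct {
    uint16_t Magic;
    uint8_t  MajorLinkerVersion;
    uint8_t  MinorLinkerVersion;
    uint32_t SizeOfCode;
    uint32_t SizeOfInitializedData;
    uint32_t SizeOfUninitializedData;
    uint32_t AddressOfEntryPoint; /* RVA of the first instruction executed */
    uint32_t BaseOfCode;
    uint64_t ImageBase;           /* preferred load address */
    uint32_t SectionAlignment;
    uint32_t FileAlignment;
    uint16_t MajorOperatingSystemVersion;
    uint16_t MinorOperatingSystemVersion;
    uint16_t MajorImageVersion;
    uint16_t MinorImageVersion;
    uint16_t MajorSubsystemVersion;
    uint16_t MinorSubsystemVersion;
    uint32_t Win32VersionValue;
    uint32_t SizeOfImage;
    uint32_t SizeOfHeaders;
    uint32_t CheckSum;
    uint16_t Subsystem;
    uint16_t DllCharacteristics;
    uint64_t SizeOfStackReserve;
    uint64_t SizeOfStackCommit;
    uint64_t SizeOfHeapReserve;
    uint64_t SizeOfHeapCommit;
    uint32_t LoaderFlags;
    uint32_t NumberOfRvaAndSizes; /* number of valid DataDirectory entries */
    IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
} IMAGE_OPTIONAL_HEADER64;

/* Hypothetical helper: returns 64 for PE32+, 32 for PE32, 0 otherwise.
   Magic is the first member in both the 32-bit and 64-bit variants. */
int OptionalHeaderBitness(const IMAGE_OPTIONAL_HEADER64 *pOptionalHeader)
{
    assert(pOptionalHeader != NULL);
    if (pOptionalHeader->Magic == IMAGE_NT_OPTIONAL_HDR64_MAGIC) return 64;
    if (pOptionalHeader->Magic == IMAGE_NT_OPTIONAL_HDR32_MAGIC) return 32;
    return 0;
}

/* Hypothetical helper: entry point as a Virtual Address, assuming the
   image is loaded at its preferred ImageBase. */
uint64_t PreferredEntryPoint(const IMAGE_OPTIONAL_HEADER64 *pOptionalHeader)
{
    assert(pOptionalHeader != NULL);
    return pOptionalHeader->ImageBase + pOptionalHeader->AddressOfEntryPoint;
}
```

Checking Magic before reading any further members is a sensible first step, because the 32-bit variant of the structure has a different layout after BaseOfCode.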

IMAGE_DATA_DIRECTORY

Each IMAGE_DATA_DIRECTORY structure stores the RVA (VirtualAddress) and size (Size) of one data directory, and each index in the DataDirectory array corresponds to a specific directory. Important data directories include:

  • Export Directory (IMAGE_DIRECTORY_ENTRY_EXPORT) – Contains information about exported functions and data.
  • Import Directory (IMAGE_DIRECTORY_ENTRY_IMPORT) – Contains information about imported functions and data.
  • Resource Directory (IMAGE_DIRECTORY_ENTRY_RESOURCE) – Holds information about resources like icons and bitmaps.
  • Exception Directory (IMAGE_DIRECTORY_ENTRY_EXCEPTION) – Information about exception handling.

Data directories are accessed by retrieving the DataDirectory member from the Optional Header and using indexes within the array of IMAGE_DATA_DIRECTORY:

For example, to access the export directory:
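A minimal sketch of indexing into the array is given here. The index constants match the values in winnt.h, while the GetDataDirectory helper and its range and emptiness checks are assumptions added for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Index values as defined in winnt.h. */
#define IMAGE_DIRECTORY_ENTRY_EXPORT    0
#define IMAGE_DIRECTORY_ENTRY_IMPORT    1
#define IMAGE_DIRECTORY_ENTRY_RESOURCE  2
#define IMAGE_DIRECTORY_ENTRY_EXCEPTION 3

typedef struct {
    uint32_t VirtualAddress; /* RVA of the directory's data */
    uint32_t Size;           /* size of that data in bytes */
} IMAGE_DATA_DIRECTORY;

/* Hypothetical helper: returns the requested entry, or NULL when the
   index is out of range or the directory is absent (RVA or size is 0). */
const IMAGE_DATA_DIRECTORY *GetDataDirectory(
    const IMAGE_DATA_DIRECTORY *pDataDirectory,
    uint32_t numberOfRvaAndSizes,
    uint32_t index)
{
    assert(pDataDirectory != NULL);
    if (index >= numberOfRvaAndSizes) return NULL;

    const IMAGE_DATA_DIRECTORY *pEntry = &pDataDirectory[index];
    return (pEntry->VirtualAddress != 0 && pEntry->Size != 0) ? pEntry : NULL;
}
```

With windows.h, the first two arguments would come from the Optional Header's DataDirectory and NumberOfRvaAndSizes members, and IMAGE_DIRECTORY_ENTRY_EXPORT would be passed as the index to reach the export directory.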

IMAGE_EXPORT_DIRECTORY

The export table struct (IMAGE_EXPORT_DIRECTORY) does not have its own reference page in Microsoft’s documentation, so its definition has to be taken from the Winnt.h header file or from third-party documentation.

To retrieve the export table:

Members of the IMAGE_EXPORT_DIRECTORY struct that are of interest to us include:

  • NumberOfFunctions – Stores the number of functions that are exported from the PE file
  • NumberOfNames – Stores the number of function names exported from the PE file
  • AddressOfFunctions – Stores the RVA of an array containing the RVAs of the exported functions
  • AddressOfNames – Similarly to AddressOfFunctions, stores the RVA of an array containing the RVAs of the exported function names
  • AddressOfNameOrdinals – Stores the RVA of an array of 16-bit ordinals, one per exported name. Each ordinal is an index into the AddressOfFunctions array, which is how a name is matched to its function
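How these members fit together can be sketched with a name lookup. The struct is a stand-in for the definition in winnt.h, and FindExportRva is a hypothetical helper. The sketch assumes the image is laid out as in memory, so that every RVA can simply be added to the base address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for winnt.h's IMAGE_EXPORT_DIRECTORY (40 bytes). */
typedef struct {
    uint32_t Characteristics;
    uint32_t TimeDateStamp;
    uint16_t MajorVersion;
    uint16_t MinorVersion;
    uint32_t Name;
    uint32_t Base;
    uint32_t NumberOfFunctions;
    uint32_t NumberOfNames;
    uint32_t AddressOfFunctions;    /* RVA of uint32_t[NumberOfFunctions] */
    uint32_t AddressOfNames;        /* RVA of uint32_t[NumberOfNames] */
    uint32_t AddressOfNameOrdinals; /* RVA of uint16_t[NumberOfNames] */
} IMAGE_EXPORT_DIRECTORY, *PIMAGE_EXPORT_DIRECTORY;

/* Hypothetical helper: returns the RVA of the export called `name`,
   or 0 if there is no export directory or no such name. */
uint32_t FindExportRva(uint8_t *pPEbaseAddress, uint32_t exportDirRva,
                       const char *name)
{
    assert(pPEbaseAddress != NULL && name != NULL);
    if (exportDirRva == 0) return 0;

    /* base address + RVA from DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT] */
    PIMAGE_EXPORT_DIRECTORY pExportDir =
        (PIMAGE_EXPORT_DIRECTORY)(pPEbaseAddress + exportDirRva);

    const uint32_t *functions =
        (const uint32_t *)(pPEbaseAddress + pExportDir->AddressOfFunctions);
    const uint32_t *names =
        (const uint32_t *)(pPEbaseAddress + pExportDir->AddressOfNames);
    const uint16_t *ordinals =
        (const uint16_t *)(pPEbaseAddress + pExportDir->AddressOfNameOrdinals);

    for (uint32_t i = 0; i < pExportDir->NumberOfNames; i++) {
        const char *exportName = (const char *)(pPEbaseAddress + names[i]);
        if (strcmp(exportName, name) == 0) {
            /* names[i] pairs with ordinals[i], which indexes functions[]. */
            uint16_t index = ordinals[i];
            return (index < pExportDir->NumberOfFunctions) ? functions[index] : 0;
        }
    }
    return 0;
}
```

The indirection through the ordinal array is what allows NumberOfNames to be smaller than NumberOfFunctions: functions exported by ordinal only have an entry in the functions array but none in the names array.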

IMAGE_IMPORT_DESCRIPTOR

For each DLL that functions are imported from, an IMAGE_IMPORT_DESCRIPTOR structure is present. Together, these structures make up the import table. (The import address table, or IAT, is a separate array that each descriptor points to through its FirstThunk member.)

To retrieve the import table from IMAGE_OPTIONAL_HEADER:

Here we are calculating the virtual address of the import table by adding its RVA to the base address (highlighted in red). The result is cast to the PIMAGE_IMPORT_DESCRIPTOR type (highlighted in green), which ensures that memory at that location is interpreted as an array of IMAGE_IMPORT_DESCRIPTOR structs. The result is stored in pImportDescriptor.
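The same calculation, followed by a walk over the descriptors, might look like the sketch below. The struct is a stand-in for the winnt.h definition, ListImportedDlls is a hypothetical helper, and the image is assumed to be laid out as in memory. The array of descriptors ends with a zero-filled entry; testing only the Name member for zero is a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for winnt.h's IMAGE_IMPORT_DESCRIPTOR (20 bytes). */
typedef struct {
    uint32_t OriginalFirstThunk; /* RVA of the import lookup table
                                    (a union with Characteristics in winnt.h) */
    uint32_t TimeDateStamp;
    uint32_t ForwarderChain;
    uint32_t Name;               /* RVA of the DLL name string */
    uint32_t FirstThunk;         /* RVA of the import address table */
} IMAGE_IMPORT_DESCRIPTOR, *PIMAGE_IMPORT_DESCRIPTOR;

/* Hypothetical helper: stores up to maxNames DLL-name pointers in
   dllNames and returns how many were stored. */
size_t ListImportedDlls(uint8_t *pPEbaseAddress, uint32_t importDirRva,
                        const char **dllNames, size_t maxNames)
{
    assert(pPEbaseAddress != NULL && dllNames != NULL);
    if (importDirRva == 0) return 0;

    /* base address + RVA from DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT],
       then cast so the memory is read as an array of descriptors. */
    PIMAGE_IMPORT_DESCRIPTOR pImportDescriptor =
        (PIMAGE_IMPORT_DESCRIPTOR)(pPEbaseAddress + importDirRva);

    size_t count = 0;
    while (pImportDescriptor->Name != 0 && count < maxNames) {
        dllNames[count++] = (const char *)(pPEbaseAddress + pImportDescriptor->Name);
        pImportDescriptor++; /* one descriptor per imported DLL */
    }
    return count;
}
```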

IMAGE_SECTION_HEADER

Portable Executables have different sections, each responsible for storing a different type of data. For example:

  • .text (code section) – Holds the entry point of the program and the executable code. This is usually the first section.
  • .data – Stores initialized data (e.g. strings, global and static variables)
  • .rsrc (resource section) – Stores resources used for UI such as icons and images
  • .rdata (read-only data section) – Contains read-only data such as constant strings

These sections can be analysed using pe-bear:

Each of the sections has an IMAGE_SECTION_HEADER structure that contains information about it:

Important members of the IMAGE_SECTION_HEADER are explained below:

  • Name – The name of the section (e.g. .text, .data)
  • VirtualAddress – RVA to the section in memory
  • SizeOfRawData – Size of the section’s data on disk, in bytes
  • PointerToRelocations – File offset to the relocation entries for the section
  • NumberOfRelocations – Specifies the number of relocations for the section
  • Characteristics – Stores flags regarding the characteristics of the section. For example “IMAGE_SCN_MEM_EXECUTE” means the section can be executed as code.

The section table can be described as a collection of these IMAGE_SECTION_HEADER data structures. It is an array where each element is an IMAGE_SECTION_HEADER structure, one for each of the sections. Because the section headers start directly after the NT headers, simply skipping over the NT headers by using sizeof provides a convenient way to navigate to the section table:


Doing this will land us at the start of the collection of section header structures. This is stored in pSectionHeader.
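A sketch of this navigation follows, with one deliberate difference from the sizeof approach described above: it skips the Optional Header using the SizeOfOptionalHeader value stored in the File Header, which is the calculation that the IMAGE_FIRST_SECTION macro in winnt.h performs. The stand-in types and the FirstSectionHeader name are assumptions made so the example builds without windows.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for winnt.h's IMAGE_FILE_HEADER (20 bytes). */
typedef struct {
    uint16_t Machine;
    uint16_t NumberOfSections;
    uint32_t TimeDateStamp;
    uint32_t PointerToSymbolTable;
    uint32_t NumberOfSymbols;
    uint16_t SizeOfOptionalHeader;
    uint16_t Characteristics;
} IMAGE_FILE_HEADER;

/* First two members of IMAGE_NT_HEADERS. The Optional Header begins
   immediately after this 24-byte prefix. */
typedef struct {
    uint32_t          Signature;
    IMAGE_FILE_HEADER FileHeader;
} NT_HEADERS_PREFIX;

/* Hypothetical helper: returns the address of the first
   IMAGE_SECTION_HEADER, which follows the Optional Header. */
void *FirstSectionHeader(NT_HEADERS_PREFIX *pNTheaders)
{
    assert(pNTheaders != NULL);

    return (uint8_t *)pNTheaders
         + sizeof(NT_HEADERS_PREFIX)                   /* Signature + FileHeader */
         + pNTheaders->FileHeader.SizeOfOptionalHeader; /* Optional Header */
}
```

For a typical 64-bit image, SizeOfOptionalHeader is 240, so the result is 24 + 240 = 264 bytes past the start of the NT headers, the same place that sizeof(IMAGE_NT_HEADERS64) leads to. Reading the size from the file has the advantage of not depending on whether the parser itself was compiled as 32-bit or 64-bit.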

Not every PE file will have the same number of sections. Compiler settings, linker settings, and the type of data stored will all have an effect on the number of sections present. For this reason, it is necessary to set up a loop that iterates through each section.


The number of sections is pulled from the FileHeader via the NumberOfSections member (highlighted in green). Using this information, we can set up a for loop that will cease execution once all of the section headers have been processed. Then, on the second line, the value of pSectionHeader is updated. The size of one section header is added to pSectionHeader so that the next iteration of the loop points to the next IMAGE_SECTION_HEADER struct (highlighted in yellow).
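The loop can be sketched as follows. The struct is a stand-in for IMAGE_SECTION_HEADER from winnt.h, and DumpSections is a hypothetical helper; counting executable sections is an addition for illustration and is not something the text above describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define IMAGE_SIZEOF_SHORT_NAME 8
#define IMAGE_SCN_MEM_EXECUTE   0x20000000u

/* Stand-in for winnt.h's IMAGE_SECTION_HEADER (40 bytes). */
typedef struct {
    uint8_t  Name[IMAGE_SIZEOF_SHORT_NAME]; /* not NUL-terminated if 8 chars long */
    uint32_t VirtualSize;                   /* Misc.VirtualSize in winnt.h */
    uint32_t VirtualAddress;
    uint32_t SizeOfRawData;
    uint32_t PointerToRawData;
    uint32_t PointerToRelocations;
    uint32_t PointerToLinenumbers;
    uint16_t NumberOfRelocations;
    uint16_t NumberOfLinenumbers;
    uint32_t Characteristics;
} IMAGE_SECTION_HEADER, *PIMAGE_SECTION_HEADER;

/* Hypothetical helper: prints one line per section and returns how many
   sections carry the IMAGE_SCN_MEM_EXECUTE flag. */
size_t DumpSections(PIMAGE_SECTION_HEADER pSectionHeader,
                    uint16_t numberOfSections)
{
    assert(pSectionHeader != NULL || numberOfSections == 0);

    size_t executable = 0;
    for (uint16_t i = 0; i < numberOfSections; i++) {
        /* Copy the name into a buffer that is always NUL-terminated. */
        char name[IMAGE_SIZEOF_SHORT_NAME + 1] = {0};
        memcpy(name, pSectionHeader->Name, IMAGE_SIZEOF_SHORT_NAME);

        printf("%-8s rva=0x%08x rawsize=0x%08x\n", name,
               (unsigned)pSectionHeader->VirtualAddress,
               (unsigned)pSectionHeader->SizeOfRawData);

        if (pSectionHeader->Characteristics & IMAGE_SCN_MEM_EXECUTE) {
            executable++;
        }

        /* Advance by the size of one section header. */
        pSectionHeader = (PIMAGE_SECTION_HEADER)(
            (uint8_t *)pSectionHeader + sizeof(IMAGE_SECTION_HEADER));
    }
    return executable;
}
```

Writing pSectionHeader++ would advance the pointer by the same amount, since pointer arithmetic on a PIMAGE_SECTION_HEADER moves in steps of sizeof(IMAGE_SECTION_HEADER).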

Undocumented Structures

Similarly to IMAGE_EXPORT_DIRECTORY and IMAGE_IMPORT_DESCRIPTOR mentioned earlier, there are various other undocumented structures that can be used to further analyse the contents of a Portable Executable. Although not documented on MSDN, these structures can be found in the Winnt.h header file. Such structures include, but are not limited to:

  • IMAGE_TLS_DIRECTORY – Contains information regarding thread local storage
  • IMAGE_BASE_RELOCATION – Heads a block of base relocation entries, which list the addresses that must be adjusted when the image is loaded somewhere other than its preferred base address
  • IMAGE_RUNTIME_FUNCTION_ENTRY – A structure used for exception handling

Conclusion

Combining all of the above techniques, we can create a tool capable of extracting data structures from a Portable Executable. On my GitHub, you can find the full code for the “PEparser” tool. In this post, you have learned what the Portable Executable file format is, what its data structures are, and how to extract them using C. By continuing to learn more about this subject, you will develop skills that will aid you in the development and analysis of executable files on Windows. The first part of this series is geared more towards development than analysis, but in the next instalment you will be introduced to Ghidra. Using Ghidra, we will look at a decompilation of the “PEparser” tool and learn how to interpret and annotate it.

Thanks for reading the first part of this series!