Windows Portable Executable (PE) Files Structure

Written by Filovirid on Oct 05, 2021

You can always read the latest version of the post at

The structure (format) of the Windows Portable Executable (PE) files is not that difficult, but the problem is that I couldn't find a nice, complete tutorial about it. So I decided to write my own version explaining everything completely from scratch. So this will be a very long, long post!

If you think something is wrong in this post, or if you have any suggestions, please feel free to leave a comment. Here is the table of contents:


You don't need to know a lot about Windows internals or different tools to understand this tutorial, but I think it would be helpful if you know a little bit about processes and memory management in operating systems.

Before starting the tutorial, you need to download and install Explorer Suite, which is a free PE file editor/viewer. You also need to download x64dbg. There is no need to install it; just extract it somewhere so that you can use it (we are going to use x32dbg, which is the 32-bit version of x64dbg, but they both come in the same package).

I already compiled a sample executable file which does nothing but show a message and exit. You can download it from here. The zip file is password-protected and the password is: password

Some important definitions:

We are going to use some specific terms whose definitions we need to know before exploring PE files. So here are the definitions of some of these keywords.

Process and virtual memory management

(if you already know a little bit about process and memory management, then you can skip this section)

The definition of a process is different from an executable file. An executable file is the output of the compiler (and linker). An executable file contains different sections and headers (parts which give information about sections and the structure of the .exe file). When you try to execute the file, the operating system tries to open and read some information from the file and load it into the memory.  A process is an active execution of the executable file in memory and CPU. A process can have several states (active, suspended, dead).

You never work directly with physical memory, but instead with something called virtual memory. As the name suggests, it is an abstraction rather than real hardware. Here is how it works: physical memory is limited (4GB, 16GB, ...), yet the OS is able to run many executable files at the same time. This is done with the help of virtual memory. When you execute a binary file, the OS creates an imaginary address space for it (on a 32-bit OS this space is 4GB, which might be bigger than the actual amount of RAM you have!). This space belongs exclusively to one process: every process has its own 4GB virtual space. Each 4GB space is divided into smaller parts called pages (usually 4KB each). In reality, a process may only need, say, two pages (8KB) at a given moment, not the whole 4GB. Therefore, the OS only needs to keep track of the pages that are actually in use and keep them in memory. A structure called the page table maps these virtual addresses to physical addresses. When a page is not being used by the process, the OS can save it on disk to free some physical memory. This procedure is called swapping.
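To make the idea of pages concrete, here is a minimal sketch (in Python, used here only for illustration) of how a virtual address splits into a page number and an offset inside that page. It assumes the 4KB page size mentioned above; the function name is hypothetical.

```python
PAGE_SIZE = 4 * 1024  # 4KB pages, as assumed in the text


def split_virtual_address(va):
    """Return (page_number, offset_in_page) for a virtual address."""
    return va // PAGE_SIZE, va % PAGE_SIZE


page, offset = split_virtual_address(0x00401234)
print(hex(page), hex(offset))

# A 4GB address space holds this many 4KB pages:
print((2 ** 32) // PAGE_SIZE)
```

The page table only needs an entry for the pages a process actually touches, which is why a tiny program does not cost 4GB of RAM.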

The 4GB of memory (on 32-bit machines) dedicated to each process is called the Virtual Address (VA) space. When you use a linker (let's say the Microsoft linker), it specifies the preferred address inside this Virtual Address space at which the OS should load the executable. This default address can be found in the header of the .exe file (we will see it later) and is called 'ImageBase'. There is no guarantee that the loader will use exactly this address, and because of that, addresses in the binary file are stored relative to the ImageBase rather than as absolute addresses. These relative addresses are called Relative Virtual Addresses (RVA). Most of the addresses you find in binary files are relative. Therefore, it's important to know how to convert an RVA to a VA and vice versa:

VA = ImageBase + RVA

RVA = VA - ImageBase

That's it!
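The two formulas can be written as a pair of helper functions. This is a small Python sketch; the function names are made up for illustration, and 0x400000 is the ImageBase of the sample binary used later in this tutorial.

```python
def rva_to_va(image_base, rva):
    """VA = ImageBase + RVA"""
    return image_base + rva


def va_to_rva(image_base, va):
    """RVA = VA - ImageBase"""
    return va - image_base


IMAGE_BASE = 0x400000  # ImageBase of the sample binary

print(hex(rva_to_va(IMAGE_BASE, 0x1000)))
print(hex(va_to_rva(IMAGE_BASE, 0x401000)))
```

If the loader ends up placing the image at a different base, the same functions apply with the actual base address in place of the default one.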

Another important thing is how to convert VA and RVA to file offset and vice versa. This one, we explore when we know more about PE sections.

Sample file to download

(If you are fine with downloading the sample file, you don't need to go through the compilation and linking stuff and you can skip this step)

When you download the sample zip file and extract it, you will get a .exe file which is exactly 3,072 bytes. Here is the source code of the file (if you don't like to download a binary file, you can compile the following code yourself):

section .data
    szcaption db "This is caption",10,0
    sztext    db "This is the main text", 10, 0
section .text
global _main
extern _ExitProcess@4
extern _MessageBoxA@16
_main:
xor eax, eax
push eax                 ; uType = 0
push dword szcaption     ; lpCaption
push dword sztext        ; lpText
push dword 0             ; hWnd = NULL
call _MessageBoxA@16
push dword 0             ; uExitCode = 0
call _ExitProcess@4

The code is written for NASM (Netwide Assembler) and you can assemble it with the following command line (assuming you saved the code in a file named miniexe.nasm):

nasm.exe -f win32 -o miniexe.o miniexe.nasm

and then using Microsoft Visual Studio 2019 x86 linker:

link.exe miniexe.o kernel32.lib user32.lib /OUT:miniexe.exe /entry:main /subsystem:windows /nodefaultlib

General structure of a PE file

Generally, the structure of a PE file is like the image below. There are lots of other details related to a PE file but for now it's enough to know that this is the overall structure.


Figure 1 - Overall structure of a PE file

DOS header

The first 64 bytes of a PE32 file are called the DOS header. The structure of this header is as follows:

struct _IMAGE_DOS_HEADER {  
    WORD   e_magic;         // Magic number 'MZ'
    WORD   e_cblp;          // Bytes on last page of file
    WORD   e_cp;            // Pages in file
    WORD   e_crlc;          // Relocations
    WORD   e_cparhdr;       // Size of header in paragraphs
    WORD   e_minalloc;      // Minimum extra paragraphs needed
    WORD   e_maxalloc;      // Maximum extra paragraphs needed
    WORD   e_ss;            // Initial (relative) SS value
    WORD   e_sp;            // Initial SP value
    WORD   e_csum;          // Checksum
    WORD   e_ip;            // Initial IP value
    WORD   e_cs;            // Initial (relative) CS value
    WORD   e_lfarlc;        // File address of relocation table
    WORD   e_ovno;          // Overlay number
    WORD   e_res[4];        // Reserved words
    WORD   e_oemid;         // OEM identifier (for e_oeminfo)
    WORD   e_oeminfo;       // OEM information; e_oemid specific
    WORD   e_res2[10];      // Reserved words
    DWORD  e_lfanew;        // File address of new exe header
};

Since we are talking about Windows PE32 files, most of these fields are useless, because this is the header of MS-DOS executable files. This header exists only for backward compatibility between MS-DOS and Windows. If you try to execute a Windows PE file in an MS-DOS environment, you won't get an ugly error. Instead, MS-DOS reads this header and executes the second part of the .exe file, which is called the DOS stub (look at the above figure). All the DOS stub does is print a nice message telling the user that this program cannot be run in MS-DOS mode!

However, there are two important fields in the DOS header that we are interested in: i) e_magic and, ii) e_lfanew. 

Now let's see the DOS header of our sample miniexe.exe file. The image below is a screenshot of CFF Explorer, which is part of the Explorer Suite you installed:

Figure 2 - DOS header of miniexe.exe sample file

In the above image, the address of the PE header (e_lfanew) appears as 0xC0000000, but the value is stored in little-endian byte order (least significant byte first), so the real value is 0x000000C0.
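Here is a short Python sketch of reading the two interesting fields, e_magic and e_lfanew, from the first 64 bytes of a file. To keep it self-contained it builds a synthetic DOS header instead of opening miniexe.exe; the helper name is an assumption made for illustration.

```python
import struct

# Synthetic 64-byte DOS header: 'MZ', 58 zero bytes, then e_lfanew
# stored exactly as it appears on disk: C0 00 00 00.
dos_header = b"MZ" + bytes(58) + bytes([0xC0, 0x00, 0x00, 0x00])


def parse_dos_header(data):
    """Check the 'MZ' magic and return e_lfanew (offset of the PE header)."""
    if len(data) < 64 or data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # '<I' = little-endian unsigned 32-bit integer; e_lfanew lives at offset 0x3C
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    return e_lfanew


print(hex(parse_dos_header(dos_header)))
```

With a real file you would pass the first 64 bytes read from disk instead of the synthetic buffer.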



Now let's take a look at the DOS stub. Whatever lies between the end of the DOS header (after the 64th byte from the start of the PE file) and the PE header (0x000000C0) is called the DOS stub. This is actually a piece of executable code which can be executed in the MS-DOS environment and, as we said, it just prints an error message to STDOUT (standard output) and exits. Below, you can see the hex representation of the DOS stub section.


Figure 3 - Hex representation of DOS stub

If you want to see the result of executing the DOS stub part (If you don't believe me!), you need to have MS-DOS or you can install DOSBox and execute miniexe.exe file in it. The image below shows the result of executing miniexe.exe (32-bit exe file) in DOSBox!

Figure 4 - Executing 32-bit binary in MS-DOS (DOSBox)

And if you want to see the code of the DOS stub, you can disassemble it in 16-bit mode using CFF Explorer (or any other disassembler you have).

Figure 5 - Disassembled code of the DOS stub

This page can help you understand the interrupts in MS-DOS. The above code just prints a message and exits. When we run our binary file in a Windows environment, the DOS stub never gets executed. In fact, we don't need the DOS stub at all. It's just there for backward compatibility. Therefore, if you want to rewrite some parts of your binary file, you can use the DOS stub space, and some parts of the DOS header as well.


PE header

After the DOS header and the DOS stub, we have the PE header, which is the first important header of the file and gives us lots of information. In our sample binary, it starts at 0x000000C0. The structure of the Windows PE header (also called the NT header) is as follows:

struct _IMAGE_NT_HEADERS {
    DWORD                   Signature;
    IMAGE_FILE_HEADER       FileHeader;
    IMAGE_OPTIONAL_HEADER32 OptionalHeader;
};

The structure has one DWORD (4 bytes), which is called the signature of the PE header, and two other structures, defined as follows (together with IMAGE_DATA_DIRECTORY, which is used inside the optional header):

struct _IMAGE_FILE_HEADER {
  WORD  Machine;
  WORD  NumberOfSections;
  DWORD TimeDateStamp;
  DWORD PointerToSymbolTable;
  DWORD NumberOfSymbols;
  WORD  SizeOfOptionalHeader;
  WORD  Characteristics;
};

struct _IMAGE_OPTIONAL_HEADER {
  WORD                 Magic;
  BYTE                 MajorLinkerVersion;
  BYTE                 MinorLinkerVersion;
  DWORD                SizeOfCode;
  DWORD                SizeOfInitializedData;
  DWORD                SizeOfUninitializedData;
  DWORD                AddressOfEntryPoint;
  DWORD                BaseOfCode;
  DWORD                BaseOfData;
  DWORD                ImageBase;
  DWORD                SectionAlignment;
  DWORD                FileAlignment;
  WORD                 MajorOperatingSystemVersion;
  WORD                 MinorOperatingSystemVersion;
  WORD                 MajorImageVersion;
  WORD                 MinorImageVersion;
  WORD                 MajorSubsystemVersion;
  WORD                 MinorSubsystemVersion;
  DWORD                Win32VersionValue;
  DWORD                SizeOfImage;
  DWORD                SizeOfHeaders;
  DWORD                CheckSum;
  WORD                 Subsystem;
  WORD                 DllCharacteristics;
  DWORD                SizeOfStackReserve;
  DWORD                SizeOfStackCommit;
  DWORD                SizeOfHeapReserve;
  DWORD                SizeOfHeapCommit;
  DWORD                LoaderFlags;
  DWORD                NumberOfRvaAndSizes;
  IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
};

struct _IMAGE_DATA_DIRECTORY {
  DWORD VirtualAddress;
  DWORD Size;
};

We are going to explain each field of all four structures defined above. You can also check the Microsoft documentation to better understand each field.

The first DWORD of IMAGE_NT_HEADERS is called the signature, and it must always be 'PE\0\0' (0x50, 0x45, 0x00, 0x00). If you change this signature, you cannot execute the binary file. The second member of IMAGE_NT_HEADERS is another structure called IMAGE_FILE_HEADER with the following fields:
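As a rough sketch, the signature check and the 20-byte IMAGE_FILE_HEADER can be unpacked in Python as shown below. The buffer is synthetic and the field values are illustrative (0x014C is the Machine value for 32-bit x86; the other numbers are assumptions, not a dump of miniexe.exe). The function name is hypothetical.

```python
import struct

FILE_HEADER_FORMAT = "<HHIIIHH"  # little-endian, 20 bytes in total
FILE_HEADER_FIELDS = (
    "Machine", "NumberOfSections", "TimeDateStamp", "PointerToSymbolTable",
    "NumberOfSymbols", "SizeOfOptionalHeader", "Characteristics",
)


def parse_file_header(data, e_lfanew):
    """Verify the PE signature at e_lfanew and unpack IMAGE_FILE_HEADER."""
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    values = struct.unpack_from(FILE_HEADER_FORMAT, data, e_lfanew + 4)
    return dict(zip(FILE_HEADER_FIELDS, values))


# Synthetic NT header start: signature followed by an illustrative file header.
sample = b"PE\x00\x00" + struct.pack(
    FILE_HEADER_FORMAT, 0x014C, 4, 0, 0, 0, 0xE0, 0x0102)

header = parse_file_header(sample, 0)
print(hex(header["Machine"]), header["NumberOfSections"])
```

For a real file, `e_lfanew` would be the value read from the DOS header rather than 0.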


Figure 6 - Different values of the Machine type


The second structure inside IMAGE_NT_HEADERS is called IMAGE_OPTIONAL_HEADER, probably the most important part of the header in PE files. The structure consists of 9 WORDs + 2 BYTEs + 19 DWORDs + an array of IMAGE_DATA_DIRECTORY structures (which can have a different length in different binaries, but most of the time its length is 128 bytes). The following fields are in the IMAGE_OPTIONAL_HEADER structure:
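The size arithmetic can be double-checked with Python's struct module. The format string below is my own encoding of the 32-bit optional header fields in declaration order, and 16 directory entries is assumed as the usual case.

```python
import struct

# Fixed part of IMAGE_OPTIONAL_HEADER32, in declaration order:
# Magic (WORD), two linker-version BYTEs, 9 DWORDs, 6 version WORDs,
# 4 DWORDs, Subsystem and DllCharacteristics (WORDs), 6 DWORDs.
OPTIONAL_HEADER32_FIXED = "<HBB9I6H4IHH6I"
DATA_DIRECTORY_ENTRY = "<II"  # VirtualAddress, Size
NUM_DIRECTORIES = 16          # assumed typical NumberOfRvaAndSizes

fixed_size = struct.calcsize(OPTIONAL_HEADER32_FIXED)
directories_size = NUM_DIRECTORIES * struct.calcsize(DATA_DIRECTORY_ENTRY)

print(fixed_size, directories_size, fixed_size + directories_size)
```

The fixed part is 9*2 + 2*1 + 19*4 = 96 bytes, so with a 128-byte DataDirectory the whole optional header is 224 (0xE0) bytes.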

Figure 8 - SizeOfHeaders for our sample binary



Although each entry of the DataDirectory has a completely different purpose, all of the entries share the same structure, defined above as IMAGE_DATA_DIRECTORY, which is eight bytes long with two fields:
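Because every entry has the same 8-byte layout, the whole array can be read in a loop. The following Python sketch works on a synthetic array in which only the second entry (the import directory) is filled in; the RVA 0x20F0 is the one that shows up for the sample binary later in this tutorial, while the size 0x3C is an assumed value.

```python
import struct


def parse_data_directories(raw, count=16):
    """Unpack `count` IMAGE_DATA_DIRECTORY entries as (VirtualAddress, Size)."""
    return [struct.unpack_from("<II", raw, i * 8) for i in range(count)]


# Entry 0 is empty, entry 1 is the import directory, entries 2..15 are empty.
raw = bytes(8) + struct.pack("<II", 0x20F0, 0x3C) + bytes(8 * 14)

directories = parse_data_directories(raw)
import_rva, import_size = directories[1]
print(hex(import_rva), hex(import_size))
```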

Figure 9 shows an overview of all the headers we have read so far, each marked with a different color.

Figure 9 - Different headers of the sample PE file


In the next section, we will explain the section header, the last important header of a PE file, and then we will get back to the DataDirectory and explain a few of its entries.


Section Table

Another part of the PE file, as shown in Figure 1 of this tutorial, is called the section table. We already explained the different types of sections in a PE file, such as the .text, .data and .rsrc sections. The Section Table contains all the necessary information about each section in the PE file. The number of rows in the Section Table equals the number of sections in the PE file. The Section Table immediately follows the IMAGE_OPTIONAL_HEADER structure. Therefore, it comes right after the DataDirectory array. The structure of each Section Table entry (IMAGE_SECTION_HEADER) is as follows:

struct _IMAGE_SECTION_HEADER {
  BYTE  Name[IMAGE_SIZEOF_SHORT_NAME];   // 8 bytes
  union {
    DWORD PhysicalAddress;
    DWORD VirtualSize;
  } Misc;
  DWORD VirtualAddress;
  DWORD SizeOfRawData;
  DWORD PointerToRawData;
  DWORD PointerToRelocations;
  DWORD PointerToLinenumbers;
  WORD  NumberOfRelocations;
  WORD  NumberOfLinenumbers;
  DWORD Characteristics;
};

Each IMAGE_SECTION_HEADER entry is exactly 40 bytes long. Here is the description of each field:
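A 40-byte entry can be unpacked with a single format string: the 8-byte Name followed by the numeric fields. The Python sketch below parses a synthetic .text entry; VirtualSize, VirtualAddress and SizeOfRawData match the sample binary, while PointerToRawData (0x400) and Characteristics (0x60000020) are assumed values chosen for illustration.

```python
import struct

SECTION_HEADER_FORMAT = "<8sIIIIIIHHI"  # Name[8], then the remaining fields


def parse_section_header(raw, offset=0):
    """Unpack one IMAGE_SECTION_HEADER into a dictionary."""
    (name, virtual_size, virtual_address, size_of_raw_data,
     pointer_to_raw_data, _ptr_relocs, _ptr_lines, _num_relocs, _num_lines,
     characteristics) = struct.unpack_from(SECTION_HEADER_FORMAT, raw, offset)
    return {
        # Short names are padded with zero bytes up to 8 characters.
        "Name": name.rstrip(b"\x00").decode("ascii", errors="replace"),
        "VirtualSize": virtual_size,
        "VirtualAddress": virtual_address,
        "SizeOfRawData": size_of_raw_data,
        "PointerToRawData": pointer_to_raw_data,
        "Characteristics": characteristics,
    }


# Synthetic entry; '8s' pads b".text" with three zero bytes automatically.
raw = struct.pack(SECTION_HEADER_FORMAT, b".text",
                  0x27, 0x1000, 0x200, 0x400, 0, 0, 0, 0, 0x60000020)

section = parse_section_header(raw)
print(section["Name"], hex(section["VirtualSize"]), hex(section["VirtualAddress"]))
```

To read the whole table, call the function NumberOfSections times, advancing the offset by 40 bytes each time.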

Now let's check some of the sections of our sample binary file to get more familiar with the concept. Figure 10 shows the section table of our sample binary file, with the first section highlighted. As we mentioned, there are four sections in our sample file: i) the .text section, ii) the .rdata section, iii) the .data section and finally, iv) the .reloc section.

Figure 10 - Section table of the sample binary file

Now let's check the first section of our sample file. It begins right after the IMAGE_OPTIONAL_HEADER. Each record is 40 bytes long and in Figure 10, the first record is highlighted. It begins with 0x2E (.), 0x74 (t), 0x65 (e), 0x78 (x), 0x74 (t), followed by three zeros to fill the eight bytes of the Name field. The second field is VirtualSize, which is 0x00000027 (39). This means that the length of the .text section is 39 bytes! As we know, the .text section is actually the executable code of the binary file (we know this not because the name of the section is ".text" but because the Characteristics of the section mark it as executable). Therefore, the executable code of our binary is only 39 bytes, which is very small. This is because the binary file does nothing but show a message using the MessageBoxA function and then exit. Let's check the disassembly in x32dbg (we already have it, as it was part of the requirements of this tutorial).

Figure 11 - Disassembly of the .text section


The OPCODE column (hex representation) of Figure 11 shows the machine code for each instruction in the ASSEMBLY column. You can count the opcode bytes and see that there are exactly 39 of them, beginning at the virtual address (VA) 0x401000, since the ImageBase is 0x400000 and the VirtualAddress of the .text section (this is an RVA) is 0x1000 (first row of the table in Figure 10). The SizeOfRawData for the .text section is 0x200 (512) bytes, while we know that the actual size is 0x27 (39) bytes. This is because SizeOfRawData must be a multiple of FileAlignment (which is 512 bytes). Therefore, anything less than 512 bytes is rounded up to 512 and padded with zeros. The same is true in memory. The .text section begins at RVA 0x1000 and its length is 0x27 bytes, but the next section (.rdata) begins at RVA 0x2000. This is because of the value specified in SectionAlignment (0x1000), which means that every section must start at a multiple of SectionAlignment.
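Both roundings are the same "align up" operation with a different alignment value. A minimal Python sketch, with a hypothetical helper name:

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


FILE_ALIGNMENT = 0x200      # 512 bytes
SECTION_ALIGNMENT = 0x1000  # 4096 bytes

# Size of .text on disk: 0x27 bytes of code rounded up to FileAlignment.
print(hex(align_up(0x27, FILE_ALIGNMENT)))

# First RVA available to the next section: end of .text rounded up
# to SectionAlignment.
print(hex(align_up(0x1000 + 0x27, SECTION_ALIGNMENT)))
```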


DataDirectory Entries

Now that we know a little bit about PE files, it's time to explore the DataDirectory array (remember that the DataDirectory array is the last member of IMAGE_OPTIONAL_HEADER32). As we said, the DataDirectory is an array of 16 elements, all of which have the same structure, defined in IMAGE_DATA_DIRECTORY (2 DWORDs, 8 bytes in total). Each member of the DataDirectory array has a specific purpose and points to data that resides in one of the PE sections (remember that in a PE file, we have only headers and sections, so whatever is not in the headers is in one of the sections). As shown in Table 1, a typical DataDirectory has 16 entries. Each entry (except for the reserved one) is important and serves a purpose. Here we are going to introduce some of the important ones and explain their structures.

Import Directory and Import Section

When you are writing your code (in whatever language), sometimes you need to call functions from a library or .DLL file (DLL stands for Dynamic Link Library). For example, to show a message box to the user, you need to use the MessageBoxA function. It doesn't matter if you have never heard of this function or never used it before. Whatever library or function you are using to show a message box is probably using this function behind the scenes, since it is one of the Windows API functions. This function is part of the user32.dll file in your operating system (C:\Windows\System32\user32.dll). Think of Windows DLL files as something like .SO files in Linux. Conceptually they are the same thing, just in different OSes. So let's say you use this function, you compile and link your code, and you get an executable file that works perfectly fine. But how does the linker know about this external function in your code? Where can it be found at run-time? Is it always at a fixed address in memory? The answer is NO, the address may change every time you execute your file. Therefore, there must be a way to somehow know the address of MessageBoxA without knowing the address of MessageBoxA!!
This is a collaboration between the compiler (or assembler), the linker and the Windows loader. Consider the assembly code I wrote at the beginning of the tutorial. The 7th line is extern _MessageBoxA@16 and in line 14, there is a call to this external symbol: call _MessageBoxA@16. Here is the listing output of NASM (use the -l switch to make NASM create the listing file):

     1                                  section .data
     2 00000000 546869732069732063-     	szcaption db "This is caption",10,0
     2 00000009 617074696F6E0A00   
     3 00000011 546869732069732074-     	sztext	db "This is the main text", 10, 0
     3 0000001A 6865206D61696E2074-
     3 00000023 6578740A00         
     5                                  section .text
     7                                  global _main
     9                                  extern _ExitProcess@4
    10                                  extern _MessageBoxA@16
    13                                  _main:
    14 00000000 31C0                    xor eax, eax
    15 00000002 50                      push eax
    16 00000003 68[00000000]            push dword szcaption
    17 00000008 68[11000000]            push dword sztext
    18 0000000D 6A00                    push dword 0
    19 0000000F E8(00000000)            call _MessageBoxA@16
    20 00000014 6A00                    push dword 0
    21 00000016 E8(00000000)            call _ExitProcess@4

Looking at line 19, there is a call instruction (0xE8) to _MessageBoxA@16, but this is an external symbol that the compiler (here, the assembler) doesn't know anything about! So you see that the address of the call is all zeros (0xE8 00000000). The same is true for line 21 and the call to _ExitProcess@4. Now, it is the duty of the linker (here, the Microsoft linker) to correctly fix up the call instruction. Let's take a look at the output of the assembler/compiler (here we use NASM as the assembler), which is an object file, to see how it works (we explained how to create the object file here). I use DUMPBIN to dump the relocation table of the object file.


Figure 12 - dumping the object file using DUMPBIN

Figure 12 shows the relocation table of the object file, created by the compiler/assembler to be used by the linker. This table won't be part of the final executable file; it is just there to tell the linker which parts of the compiled code need fixups. The first column (Offset) is the location measured from the beginning of the .text section. The third row tells the linker that the DWORD at offset 0x10 (16) of the .text section is of type REL32, its current value is 0x00000000, and it must be fixed up (REL32 means a 32-bit relative address). Matt Pietrek has a nice article about linkers which you may want to read as well.
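To give a feeling for what a REL32 fixup means, here is a simplified Python sketch. An x86 `call rel32` (opcode 0xE8) stores the distance from the address of the next instruction to the target, so the value to write is the target minus the operand offset plus 4. The target offset 0x1B used below is a hypothetical location picked only for the example, and the function name is made up; a real linker also handles addends, section placement and so on.

```python
import struct


def rel32_fixup(operand_offset, target_offset):
    """Displacement for a `call rel32` whose 4-byte operand starts at
    operand_offset. The CPU adds it to the address of the next instruction."""
    next_instruction = operand_offset + 4
    return target_offset - next_instruction


# The operand of `call _MessageBoxA@16` sits at offset 0x10 of .text.
# 0x1B is a hypothetical target offset inside the same section.
displacement = rel32_fixup(0x10, 0x1B)
patched_call = b"\xE8" + struct.pack("<i", displacement)

print(displacement, patched_call.hex())
```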

So now we know that it's the responsibility of the linker to fill in the missing addresses using the relocation table of the object file. However, even the linker doesn't know the addresses of the loaded .DLL files at run-time, since the .DLL files may be loaded at different addresses in every execution of the executable file. To solve this problem, linkers use a very clever approach. Here is what linkers do:

Now the question is: what is ADDR_XX? This is an address of a DWORD in the memory which contains the real address of the function (e.g., MessageBoxA) when the Windows loader loads user32.dll into the memory. Since the linker doesn't know this address at the time of linking, it's up to the loader to fill this address in the memory at the time of loading DLL files (you can check this article for more info).

Everything will become clearer when we start exploring the import address table and the import directory in our sample binary file.

The import section contains information about the external libraries (.DLL files) and their functions which are used in the code. The Windows loader reads the import section and loads all the necessary DLLs into memory, then writes the resolved addresses of those functions into a table called the import address table. You may sometimes see people use the term 'import directory' instead of import section, which is basically the same thing. The import directory is nothing more than an array of a specific structure called IMAGE_IMPORT_DESCRIPTOR with the following fields:

struct _IMAGE_IMPORT_DESCRIPTOR {
    DWORD OriginalFirstThunk;
    DWORD TimeDateStamp;
    DWORD ForwarderChain;
    DWORD Name;
    DWORD FirstThunk;
};

In other resources, you may see the first element named 'Characteristics' instead of 'OriginalFirstThunk', but the name used here is the current one: Microsoft changed the meaning of the field and never updated the WINNT.h header file. Let's see the role of each member:

Each element of the import directory array is exactly 20 bytes (length of IMAGE_IMPORT_DESCRIPTOR = 20). Remember that the number of elements in the import directory array is the same as the number of imported DLL files. In our sample binary, we are using two .DLL files: i) user32.dll (for MessageBoxA) and, ii) kernel32.dll (for ExitProcess). Therefore, the import directory array has two effective members. However, there is no field that specifies the length of the array. Instead, the final element of the array is all zeros. Therefore, in our sample binary file, the import directory array has two members for the two DLL files and another member filled with zeros (20 bytes of zeros) to indicate the end of the array.
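Walking a zero-terminated array like this takes only a short loop. In the Python sketch below, the buffer is synthetic and all RVAs in it are placeholder values chosen for illustration; the function name is hypothetical.

```python
import struct

# OriginalFirstThunk, TimeDateStamp, ForwarderChain, Name, FirstThunk
DESCRIPTOR_FORMAT = "<IIIII"
DESCRIPTOR_SIZE = struct.calcsize(DESCRIPTOR_FORMAT)  # 20 bytes


def walk_import_descriptors(raw, offset=0):
    """Yield descriptors until the all-zero terminator (or end of data)."""
    while True:
        entry = raw[offset:offset + DESCRIPTOR_SIZE]
        if len(entry) < DESCRIPTOR_SIZE or entry == bytes(DESCRIPTOR_SIZE):
            return
        yield struct.unpack(DESCRIPTOR_FORMAT, entry)
        offset += DESCRIPTOR_SIZE


# Two placeholder descriptors followed by the 20-byte zero terminator.
raw = (struct.pack(DESCRIPTOR_FORMAT, 0x212C, 0, 0, 0x2150, 0x2000)
       + struct.pack(DESCRIPTOR_FORMAT, 0x2134, 0, 0, 0x2166, 0x2008)
       + bytes(DESCRIPTOR_SIZE))

descriptors = list(walk_import_descriptors(raw))
print(len(descriptors), [hex(d[3]) for d in descriptors])
```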

Import Directory in practice

Having our sample binary file, let's say we want to check the imported DLL files and used API functions. So, the first thing we need to do is to find the address of the import section (or import directory) in our binary file. To do so, first we check the DataDirectory array (in IMAGE_OPTIONAL_HEADER) for the second entry (look at table 1 of this tutorial if you forgot) which is RVA and size of the import directory (you can see it in the image below).

Figure 13 - DataDirectory of our sample binary file

As you can see, the RVA of the import directory/section is 0x000020F0 and there is also a hint (last column of Fig. 13) which tells us that the import section is part of the .rdata section. However, we don't care about the names of the sections. We need to find the right section based only on the address, not on the name (we already know that a section's name is just a hint and anyone can change it to any arbitrary value). Having 0x000020F0 as the relative virtual address (RVA), we check the section table (if you forgot, you can go back and read about the section table) to see in which section this address lies.

Figure 14 - Section table of the sample binary file

Checking the third column of Figure 14, we see that 0x000020F0 is greater than 0x00002000 and less than 0x00003000. Therefore, it belongs to the second row, that is, the .rdata section (so the hint was correct!). Now it's time to find the location on disk, because 0x000020F0 is an RVA, meaning a memory address. To convert an RVA to a file offset, we use the following approach:

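Therefore, we subtract the RVA at which the section starts and add the file offset at which the section's raw data starts: 0x000020F0 - 0x00002000 = 0x0F0 and then 0x00000600 + 0x0F0 = 0x6F0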
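The same calculation as a small Python function. The section list is a simplified stand-in for the real section table: the .rdata numbers (RVA 0x2000, raw data at 0x600) come from the calculation above, while the .text raw offset (0x400) and the 0x1000-byte span of each section are assumptions made for the example.

```python
def rva_to_file_offset(rva, sections):
    """Find the section containing rva and translate it to a file offset."""
    for name, virtual_address, span, pointer_to_raw_data in sections:
        if virtual_address <= rva < virtual_address + span:
            return rva - virtual_address + pointer_to_raw_data
    raise ValueError("RVA 0x%X is not inside any section" % rva)


# (name, VirtualAddress, span in memory, PointerToRawData)
sections = [
    (".text", 0x1000, 0x1000, 0x400),   # raw offset assumed
    (".rdata", 0x2000, 0x1000, 0x600),
]

print(hex(rva_to_file_offset(0x20F0, sections)))
```

Note that the section name is carried along only for readability; the lookup itself relies purely on addresses, as described above.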

This address (0x6F0) is the beginning of the import directory array on the disk as shown in Figure 15.

Figure 15 - Import Directory arrays

As shown in Figure 15, the import directory array has three members: two of them belong to the two .DLL files (user32.dll and kernel32.dll) and the last one is null to indicate the end of the array. The second entry (highlighted in yellow) belongs to user32.dll and the MessageBoxA function. The first four bytes of the second entry (black underline) are the OriginalFirstThunk member of the IMAGE_IMPORT_DESCRIPTOR structure. The value is 0x00002134, which is the RVA of an IMAGE_THUNK_DATA. Converting this RVA to a file offset, we get 0x734, where we find the value 0x00002158. Since the MSB is not set in 0x00002158, the lower 31 bits of the DWORD are the RVA of an IMAGE_IMPORT_BY_NAME structure. Converting the value to a file offset, we get 0x758, which is the start of the IMAGE_IMPORT_BY_NAME structure. The first two bytes (a WORD) are the hint (bytes 0x7F, 0x02, i.e. 0x027F in little-endian) and the rest is the null-terminated ASCII string MessageBoxA\0, as shown in Figure 16.
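The last two steps, testing the MSB of the thunk value and reading the hint and name, can be sketched in Python like this. The byte string mimics what the text describes at file offset 0x758 (hint 0x7F 0x02 followed by the name); the helper name is hypothetical.

```python
import struct

ORDINAL_FLAG32 = 0x80000000  # most significant bit of a 32-bit thunk value


def read_import_by_name(data, offset):
    """Return (hint, name) from an IMAGE_IMPORT_BY_NAME structure."""
    (hint,) = struct.unpack_from("<H", data, offset)
    end = data.index(b"\x00", offset + 2)  # name is null-terminated
    return hint, data[offset + 2:end].decode("ascii")


thunk = 0x00002158
imported_by_ordinal = bool(thunk & ORDINAL_FLAG32)
name_rva = thunk & 0x7FFFFFFF  # lower 31 bits

# Bytes as described for file offset 0x758: the hint, then the function name.
blob = bytes([0x7F, 0x02]) + b"MessageBoxA\x00"
hint, name = read_import_by_name(blob, 0)

print(imported_by_ordinal, hex(name_rva), hex(hint), name)
```

When the MSB is set, the function is imported by ordinal and there is no IMAGE_IMPORT_BY_NAME structure to read for that thunk.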

Figure 16 - IMAGE_IMPORT_DESCRIPTOR structure

Looking again at Figure 15, the fourth DWORD of the second entry (underlined in red) is the Name field of the IMAGE_IMPORT_DESCRIPTOR, which is the RVA of an ASCII string that contains the name of the DLL to load. In this example (Figure 15), the value is 0x00002166, which corresponds to file offset 0x766. Figure 17 shows the highlighted value at this offset, which is the string 'USER32.dll'.

Figure 17 - Extracting DLL name from the IMAGE_IMPORT_DESCRIPTOR structure

So far, we have covered one of the most important members of the DataDirectory array, the import directory. In the next sections, we will cover other members of the DataDirectory such as the export section (.edata), the relocation section, the resource section and the TLS section.



Please feel free to comment!