Be cautious with the code section: its start does not necessarily contain only code (import tables or read-only data may also be located there).
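One way to spot this is to check which data directories start inside the section you're about to disassemble. Below is a minimal sketch using plain tuples so it runs without pefile; the function name directories_in_section and the layout in the demo are made up for illustration.

```python
# Hypothetical helper: report which data directories start inside a section.
# Inputs are plain tuples so this runs without pefile installed.

def directories_in_section(section_rva, section_size, directories):
    """Return names of directories whose start RVA lies in
    [section_rva, section_rva + section_size)."""
    end = section_rva + section_size
    hits = []
    for name, rva, size in directories:
        # An RVA or size of 0 means the directory is absent.
        if rva == 0 or size == 0:
            continue
        if section_rva <= rva < end:
            hits.append(name)
    return hits


if __name__ == "__main__":
    # Made-up layout: code section at RVA 0x1000 (size 0x5000), IAT at its base.
    dirs = [
        ("IMPORT", 0x5200, 0x50),
        ("IAT", 0x1000, 0x100),
        ("RESOURCE", 0x8000, 0x400),
    ]
    print(directories_in_section(0x1000, 0x5000, dirs))  # ['IMPORT', 'IAT']
```

With pefile, the inputs could come from section.VirtualAddress, section.Misc_VirtualSize and [(d.name, d.VirtualAddress, d.Size) for d in pe.OPTIONAL_HEADER.DATA_DIRECTORY]. Any hit means the section contains non-code bytes you should skip rather than disassemble.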
The best way to start a disassembly is to look at the AddressOfEntryPoint field of the IMAGE_OPTIONAL_HEADER, which gives the RVA of the first byte executed in the PE file (unless TLS callbacks are present, but that's another subject).
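To show where that field lives, here's a minimal stdlib-only sketch (no pefile) that reads AddressOfEntryPoint from the raw bytes of a PE file. It assumes a well-formed file and does only basic signature checks; the function name read_entry_point_rva is hypothetical.

```python
import struct


def read_entry_point_rva(data):
    """Return AddressOfEntryPoint (an RVA) from the raw bytes of a PE file."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew (file offset of the PE header) is a 32-bit value at offset 0x3C.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # The optional header follows the 4-byte signature and the 20-byte IMAGE_FILE_HEADER.
    opt = e_lfanew + 4 + 20
    (magic,) = struct.unpack_from("<H", data, opt)
    if magic not in (0x10B, 0x20B):  # PE32 / PE32+
        raise ValueError("unknown optional header magic: {:#x}".format(magic))
    # AddressOfEntryPoint is at offset 16 in both PE32 and PE32+ optional headers.
    (entry_rva,) = struct.unpack_from("<I", data, opt + 16)
    return entry_rva
```

Usage: open the file in binary mode and pass its content, e.g. read_entry_point_rva(open(path, "rb").read()). Note that the result is an RVA, not a file offset.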
A very good Python library for browsing PE files is pefile.
Here's an example to get the first 10 bytes at the program entry point:
#!/usr/bin/env python
# -*- coding: utf8 -*-
from __future__ import print_function
import sys
import os.path
import pefile


def find_entry_point_section(pe, eop_rva):
    # Return the section whose address range contains the entry point RVA.
    for section in pe.sections:
        if section.contains_rva(eop_rva):
            return section
    return None


def main(file_path):
    print("Opening {}".format(file_path))
    try:
        pe = pefile.PE(file_path, fast_load=True)
        # AddressOfEntryPoint is an RVA: the first byte executed
        # (unless TLS callbacks run before it).
        eop = pe.OPTIONAL_HEADER.AddressOfEntryPoint
        code_section = find_entry_point_section(pe, eop)
        if code_section is None:
            print("[-] no section contains the entry point {:#x}".format(eop))
            return
        print("[+] Code section found at offset: "
              "{:#x} [size: {:#x}]".format(code_section.PointerToRawData,
                                           code_section.SizeOfRawData))
        # get_data() takes an RVA: get the first 10 bytes at entry point
        code_at_oep = code_section.get_data(eop, 10)
        # bytearray yields ints on both Python 2 and 3 (ord() fails on Python 3 bytes)
        print("[*] Code at EOP:\n{}".format(
            " ".join("{:02x}".format(b) for b in bytearray(code_at_oep))))
    except pefile.PEFormatError as pe_err:
        print("[-] error while parsing file {}:\n\t{}".format(file_path,
                                                              pe_err))


if __name__ == '__main__':
    if len(sys.argv) < 2:
        print("[*] {} <PE_Filename>".format(sys.argv[0]))
    else:
        file_path = sys.argv[1]
        if os.path.isfile(file_path):
            main(file_path)
        else:
            print("[-] {} is not a file".format(file_path))
Simply pass the name of your PE file as the first argument.
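The get_data(eop, 10) call in the script takes an RVA, not a file offset. Conceptually, the translation subtracts the section's VirtualAddress and adds its PointerToRawData; pefile also applies alignment adjustments that this minimal sketch ignores (pefile exposes the conversion directly as pe.get_offset_from_rva). The function rva_to_offset and its tuple layout are hypothetical.

```python
def rva_to_offset(rva, sections):
    """Translate an RVA to a raw file offset, or None if no file bytes back it.

    sections: iterable of (VirtualAddress, VirtualSize, PointerToRawData, SizeOfRawData).
    """
    for va, vsize, raw_ptr, raw_size in sections:
        # In memory a section spans at least max(VirtualSize, SizeOfRawData) bytes.
        if va <= rva < va + max(vsize, raw_size):
            delta = rva - va
            if delta >= raw_size:
                # Zero-filled tail (e.g. uninitialized data): not present in the file.
                return None
            return raw_ptr + delta
    return None
```

For example, with a section at RVA 0x1000 whose raw data starts at file offset 0x400, an entry point at RVA 0x1234 maps to file offset 0x634.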
In the script above, the code_at_oep variable holds the first bytes at the entry point. From there you can pass these bytes to the capstone disassembly engine.
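A minimal sketch of that hand-off, assuming the capstone Python bindings are installed and the target is x86 or x64; the helper names is_64bit_machine and disassemble are hypothetical. Capstone is imported inside the function so the machine-type helper works without it.

```python
IMAGE_FILE_MACHINE_I386 = 0x014C
IMAGE_FILE_MACHINE_AMD64 = 0x8664


def is_64bit_machine(machine):
    """Map IMAGE_FILE_HEADER.Machine to a bitness (x86 family only)."""
    if machine == IMAGE_FILE_MACHINE_AMD64:
        return True
    if machine == IMAGE_FILE_MACHINE_I386:
        return False
    raise ValueError("unsupported machine type: {:#x}".format(machine))


def disassemble(code, address, is_64bit):
    """Linear sweep over `code`; returns (address, mnemonic, operands) tuples."""
    # Lazy import: only this function needs capstone.
    from capstone import Cs, CS_ARCH_X86, CS_MODE_32, CS_MODE_64
    md = Cs(CS_ARCH_X86, CS_MODE_64 if is_64bit else CS_MODE_32)
    return [(insn.address, insn.mnemonic, insn.op_str)
            for insn in md.disasm(code, address)]
```

Inside main(), you could call disassemble(code_at_oep, pe.OPTIONAL_HEADER.ImageBase + eop, is_64bit_machine(pe.FILE_HEADER.Machine)). With only 10 bytes, the last instruction may be truncated; capstone simply stops decoding when it cannot decode a complete instruction.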
Note that these first bytes might simply be a jmp or call instruction, so you'll have to follow the execution flow in order to disassemble correctly. Correctly disassembling a program is still an open problem in computer science.
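As a first step toward following the flow, here's a minimal sketch that computes the destination of a direct branch at the start of a byte buffer. It only handles three encodings (EB rel8 short jmp, E9 rel32 near jmp, E8 rel32 near call) and returns None for everything else, including indirect jumps; the function name direct_branch_target is hypothetical.

```python
import struct


def direct_branch_target(code, address):
    """Return (kind, target) for a direct jmp/call at the start of `code`, else None.

    `address` is where `code` starts (a VA or an RVA); the target uses the same base.
    """
    buf = bytearray(code)  # indexing yields ints
    if len(buf) >= 2 and buf[0] == 0xEB:
        # Short jmp: signed 8-bit displacement relative to the next instruction.
        (rel,) = struct.unpack_from("<b", buf, 1)
        return ("jmp", address + 2 + rel)
    if len(buf) >= 5 and buf[0] in (0xE8, 0xE9):
        # Near call/jmp: signed 32-bit displacement relative to the next instruction.
        (rel,) = struct.unpack_from("<i", buf, 1)
        kind = "call" if buf[0] == 0xE8 else "jmp"
        return (kind, address + 5 + rel)
    return None
```

For example, E9 10 00 00 00 at 0x1000 jumps to 0x1015. You would then map the target back to file bytes and continue disassembling there; a real recursive-traversal disassembler handles many more cases.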