28

All texts on how to create a compiler stop after explaining lexers and parsers. They don't explain how to create the machine code. I want to understand the end-to-end process.

What I currently understand is that the Windows EXE file format is called Portable Executable (PE). I have read about its headers, but I have yet to find a resource that explains them simply.

My next issue is that I can't find any resource that explains how machine code is stored in the file. Is it, say, 32-bit fixed-length instructions stored one after another in the .text section?

Is there any resource that at least explains how to create an EXE file which does nothing (it just contains a no-op instruction)? My next step would then be linking to DLL files to print to the console.

Ciro Santilli OurBigBook.com
AppleGrew
  • Note that different systems have different representations for executable files. – Keith Thompson Oct 31 '11 at 18:28
  • He mentioned Windows... I think that's what he's referring to, specifically (The EXE format). – qJake Oct 31 '11 at 18:48
  • Yes, I would like to focus on Windows first. When I am comfortable with this, I can move on to ELF. – AppleGrew Oct 31 '11 at 18:52
  • This is not answer-worthy, but Microsoft implements a version of the COFF format, with a description here: http://msdn.microsoft.com/en-us/windows/hardware/gg463119 – wkl Oct 31 '11 at 19:50
  • possible duplicate of [How to read / write .exe machine code manually?](http://stackoverflow.com/questions/756367/how-to-read-write-exe-machine-code-manually) – Ciro Santilli OurBigBook.com May 23 '15 at 13:35

7 Answers

8

Nice question! I don't have much expertise on this specific question, but this is how I would start:

  1. A PE or ELF file does not contain only pure machine code. It also contains header information and other metadata. Read more: Writing custom data to executable files in Windows and Linux

  2. I assume you are looking for how an ELF/PE file holds the machine code. You can get that from this question (using objdump): How do you extract only contents of an ELF section

  3. Now, if you want to know how the content part is generated in the first place, i.e. how is the machine code generated, then that's the task of the compiler's code generation.

  4. Try out a resource editor like ResourceEditor to understand the EXE, or simply use ildasm.

PS: These are mostly Unix solutions, but I am sure PE does something fundamentally similar.

I think the best way to approach this is to first analyze how existing PE/ELF files work, which is basically reverse engineering. A Unix machine is a good place to start for that. And then do your magic :)

Not the same, but a similar question is here.

Update:

I generated an object dump from some sample C code. I assume that's what you are targeting, right? You need to know how to generate this file (a.out)?

https://gist.github.com/1329947

Take a look at this image showing the lifetime of a C program, from source code to executable.

[Image: the lifetime of a C program] (Source)

Now, just to be clear, you are looking to implement the final step, i.e. the conversion of object code to executable code?

zengr
  • Your links are helpful. One thing missing is the code generation part. What exactly do you mean when you say they don't have pure machine code? – AppleGrew Nov 01 '11 at 03:50
  • One more note. I use 7zip to extract the different sections from exe or dll. This is very simple. – AppleGrew Nov 01 '11 at 03:56
  • 1. When you say the code generation part, do you mean how to create the ELF file? 2. Well, pure machine code is not readable code. But an ELF file has some "metadata" attached to it. I will update my answer, then maybe we can reach an answer. – zengr Nov 01 '11 at 04:33
  • Yes, I want to understand the final step. I am pretty clear on what ELF and PE mean. By code generation I mean just the machine code. The documents on PE don't shed any light on that. – AppleGrew Nov 01 '11 at 09:15
  • A friend suggested - http://inst.eecs.berkeley.edu/~cs164/fa11/. This looks pretty good. – AppleGrew Nov 01 '11 at 09:15
4

As with many of his articles, I'd say Matt Pietrek's piece about PE internals remains the best introduction to the matter, more than a decade after it was written.

Ofek Shilon
  • No longer available. [This link](https://msdn.microsoft.com/en-us/magazine/ms809762.aspx) on [his Wikipedia entry](https://en.wikipedia.org/wiki/Matt_Pietrek) still works, but is two decades old... – Andreas Haferburg Oct 27 '17 at 09:36
2

I've used "Wotsit's File Format" for years... all the way back to the days of MS-DOS :-) and back to when it was just a collection of text files called "The Game programmers file type encyclopaedia" that you could download from most BBS systems.

It's now owned by the people that run Gamedev.Net, and probably one of the best kept secrets on the internet.

You'll find the EXE format on this page : http://www.wotsit.org/list.asp?fc=5

Enjoy.

UPDATE June 2020 - The link above now seems to be dead. I've found the "EXE" page listed on this web archive page of the wotsit site: https://web.archive.org/web/20121019145432/http://www.wotsit.org/list.asp?al=E

UPDATE 2 - I'm keeping the edit as it was when I added the update earlier. Thanks to those who wanted to edit it, but I'm rejecting the edits for good reason:

1) Wotsit.org may at some point in the future come back online. If you actually try visiting the URL, you'll find that it's not gone: it still responds, just with an error message. This tells me that someone is keeping the domain alive for whatever reason.

2) The archive links do seem to be a bit jittery: some work, some don't, and sometimes they work, fail after a refresh, and then work again. I remember from experience that when wotsit was still online, they had some very strange download/linking detection code in place, and this probably caused archive.org to get some very weird results. I do remember them taking this stance because of the huge number of 3rd party sites trying to cash in on their success by pretending to be affiliates and then direct linking to wotsit from an ad-infested site.

Only once the wotsit domain is removed entirely from the internet and not even the DNS responds will it be time to wrap everything up into single archive links. Until then, this is the best way to maintain the link.

shawty
  • Just tried it myself, and yes, you're right. First time I've ever seen any problems with the site. Unfortunately, it's nothing I can help with, as I don't run the site. I guess you'll need to take a look and see if there is any help/admin link to contact the site owners. As I said, it's run by the people that run "Gamedev.Net", so it may be worth going there and asking around. – shawty Nov 01 '11 at 14:43
  • You can still find it on [web.archive.org](https://web.archive.org/web/20150611045507/http://www.wotsit.org/list.asp?page=2&fc=5&search=&al=), but the downloads no longer work. A web search for "Bernd Luevelsmeyer pe file format" might work better. – Andreas Haferburg Oct 27 '17 at 09:31
  • Crying shame that it's not still live though :-( Over the years I contributed some of those format documents to the project myself. Up in my loft somewhere I actually still have a whole bunch of the docs printed out on a very old dot matrix printer and stored in a big ring-binder. – shawty Oct 27 '17 at 13:09
  • @user3789797 Don't you think that would be better served as an answer in its own right, rather than a comment on my answer? You are, after all, answering the question directly as opposed to adding anything extra to the comments on mine. – shawty Jun 13 '21 at 13:48
  • Nope, afraid not. It is about the same subject, that much is true, but the docs I was referring to were the original "wottsists file format docs", not the Tiny PE project. Hence, as I say, you're better off posting it as a standalone answer. – shawty Jun 14 '21 at 10:31
1

Not surprisingly, the best sites for information about writing PE format files are all about creating viruses.

A search of VX Heavens for "PE" gives a whole bunch of tutorials on modifying PE files.

Martin Beckett
  • I am unable to find anything useful on VX Heavens. It has some links to, I guess Russian sites. – AppleGrew Nov 01 '11 at 03:34
  • -EDIT- Found out that I need to search from the box to get to the links. The more direct link could be http://forum.vxheavens.com/viewtopic.php?id=186 – AppleGrew Nov 01 '11 at 03:40
1

Some information about making PE files as small as possible: Tiny PE.

The minimalistic way to mess around with code generation, if you're just looking to try a few simple things out, is to output MS-DOS .COM files, which have no header or metadata. Sadly, you'd be restricted to 16-bit code. This format is still somewhat popular for demos.
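To make the .COM idea concrete, here is a minimal sketch in Python that writes out a tiny 16-bit "hello world" image. It assumes the standard DOS conventions: the file is loaded at offset 0x100 of its segment, and INT 21h function 09h prints a `$`-terminated string whose address is in DX. The function name `build_hello_com` and the output file name are just illustrative, and the message itself must not contain a `$`.

```python
import struct

COM_LOAD_OFFSET = 0x100  # DOS loads a .COM image at offset 0x100 of its segment


def build_hello_com(message: bytes = b"Hello, world!\r\n") -> bytes:
    """Return the raw bytes of a .COM program that prints `message`."""
    code_len = 8                           # total size of the four instructions below
    msg_addr = COM_LOAD_OFFSET + code_len  # the string sits right after the code
    code = (
        b"\xB4\x09"                              # mov ah, 0x09  (DOS: print string)
        + b"\xBA" + struct.pack("<H", msg_addr)  # mov dx, msg_addr
        + b"\xCD\x21"                            # int 0x21      (call DOS)
        + b"\xC3"                                # ret           (back to DOS)
    )
    assert len(code) == code_len
    return code + message + b"$"           # function 09h stops at the '$'


if __name__ == "__main__":
    # Hypothetical output name; run the result in DOSBox or another DOS environment.
    with open("hello.com", "wb") as f:
        f.write(build_hello_com())
```

The whole file is just code followed by data, with no header at all, which is what makes this format handy for first experiments.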

As for the instruction format, the x86 instruction set is variable-length: instructions range from 1 byte up to 15 bytes. Many RISC CPUs, by contrast, use fixed-length instructions (typically 32 bits).
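To illustrate how variable-length instructions sit back to back in a code section, here is a small Python sketch. The byte encodings are standard 32-bit x86 encodings of a few common instructions; the `layout` helper and the list name are made up for this example.

```python
# A few 32-bit x86 instructions with their encodings (note the differing lengths).
X86_EXAMPLES = [
    ("nop",          bytes.fromhex("90")),          # 1 byte
    ("xor eax, eax", bytes.fromhex("31c0")),        # 2 bytes: opcode + ModRM
    ("mov eax, 1",   bytes.fromhex("b801000000")),  # 5 bytes: opcode + imm32
    ("ret",          bytes.fromhex("c3")),          # 1 byte
]


def layout(instructions):
    """Return (offset, length, mnemonic) for instructions stored one after another."""
    rows = []
    offset = 0
    for mnemonic, encoding in instructions:
        rows.append((offset, len(encoding), mnemonic))
        offset += len(encoding)  # the next instruction starts right after this one
    return rows


if __name__ == "__main__":
    code = b"".join(encoding for _, encoding in X86_EXAMPLES)
    print(code.hex())
    for offset, length, mnemonic in layout(X86_EXAMPLES):
        print(f"{offset:04x}  {length} byte(s)  {mnemonic}")
```

There are no separators or padding between instructions: the CPU works out where each one ends while decoding it.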

Vlad
0

The executable file format depends on the OS. For Windows it is PE32 (32-bit) or PE32+ (64-bit).

What the final executable looks like depends on the ABI (application binary interface) of the OS. The ABI specifies how the OS loader should load the EXE, how it should relocate it, whether it is a DLL or a plain executable, etc.

Every object file (executable, DLL, or driver) is divided into parts called sections. This is where all of our code, data, jump tables, etc. are located.

Now, to create an object file, which is what a compiler does, you must create not just the executable machine code, but also the headers, symbol table, relocation records, import/export tables, etc.

The pure machine code generation part depends entirely on how optimized you want your code to be. But to actually run the code on the PC, you have to create a file with all of the headers and related data (check MSDN for the precise PE32+ format) and then put all of the executable machine code (which your compiler generated) into one of the sections (code usually resides in a section called .text). If the file you have created conforms to the PE32+ format, then you have successfully created an executable for Windows.
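One way to get a feel for the headers and sections described above is to read them out of an existing file. The following is only a sketch, assuming the documented PE layout: an `MZ` stub whose field at offset 0x3C points to the `PE\0\0` signature, followed by the 20-byte COFF header, the optional header, and then one 40-byte entry per section. The function name `parse_pe_headers` and the dictionary keys are my own choices, not part of any library.

```python
import struct


def parse_pe_headers(data: bytes) -> dict:
    """Extract the machine type, optional-header magic, and section table of a PE image."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")

    coff = pe_offset + 4
    # COFF header: Machine, NumberOfSections, TimeDateStamp, PointerToSymbolTable,
    # NumberOfSymbols, SizeOfOptionalHeader, Characteristics (20 bytes in total).
    machine, n_sections, _, _, _, opt_size, _ = struct.unpack_from("<HHIIIHH", data, coff)

    opt = coff + 20
    # Optional-header magic: 0x10B means PE32, 0x20B means PE32+.
    magic = struct.unpack_from("<H", data, opt)[0] if opt_size >= 2 else None

    sections = []
    table = opt + opt_size  # the section table follows the optional header
    for i in range(n_sections):
        name, vsize, vaddr, raw_size, raw_ptr = struct.unpack_from(
            "<8sIIII", data, table + 40 * i
        )
        sections.append({
            "name": name.rstrip(b"\0").decode("ascii", "replace"),
            "virtual_size": vsize,
            "virtual_address": vaddr,
            "raw_size": raw_size,
            "raw_offset": raw_ptr,  # where the section's bytes start in the file
        })
    return {"machine": machine, "optional_magic": magic, "sections": sections}
```

Calling it on the bytes of a real EXE (for example `parse_pe_headers(open(path, "rb").read())`) should list a `.text` entry whose `raw_offset` and `raw_size` tell you where the machine code lives in the file. Writing an executable is the same process in reverse: emit these structures, then the section contents.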

0

For Linux, one may read and run the examples from "Programming from the Ground Up" by Jonathan Bartlett:

http://www.cs.princeton.edu/courses/archive/spr08/cos217/reading/ProgrammingGroundUp-1-0-lettersize.pdf

Then of course one may prefer to hack Windows programs. But perhaps the former gives a better way to understand what really goes on.

John Donn
  • I don't see anything related to my question. – AppleGrew Nov 01 '11 at 03:43
  • From your question: "All texts on how to create a compiler stop after explaining lexers and parsers. They don't explain how to create the machine code. I want to understand the end-to-end process". The book cited (see chapter 3 for a "Hello World"-like program) explains how to write assembly programs, assemble them into machine language, and then link them to create an executable file. – John Donn Nov 01 '11 at 21:41