
For my final-year university dissertation, I am going to implement a compiler for a skeletal form of the C programming language, then extend it until it resembles something a little more like Java, with array bounds checking, type checking and so forth.

I am relatively competent at much of the theory that relates to compiler construction, and have experience programming in MIPS assembly language, so I do understand a little of what it is to write extremely low-level code.

My main concern is that I am likely to be able to get all the way to the point where I need to produce the actual machine-code output, but then not understand enough about how machine code is executed from the perspective of the operating system running it.

So my actual question is: does anyone know the best place to read up on writing assembly to run on an Intel x86-64 processor under Linux?

The main gap in my knowledge is how the machine code is actually run in practice. Is it run directly on the processor, making syscalls (or the x86 equivalent) when it needs services provided by the kernel, or is the assembly language somehow an encapsulated description that tells the kernel how to execute the instructions (in a manner similar to an interpreted language such as Java)?

Any help you can provide would be greatly appreciated.

Tom Busby
  • possible duplicate of [Advice for learning Linux x86-64 assembly & documentation](http://stackoverflow.com/questions/1575948/advice-for-learning-linux-x86-64-assembly-documentation) – bdonlan Jul 04 '11 at 23:46
  • I'll have a read of that too, thanks for your help :) – Tom Busby Jul 04 '11 at 23:52
  • You might want to look up the ELF file format. – Vinicius Kamakura Jul 04 '11 at 23:55
  • Are you trying to emit an ELF executable directly, or an object file that can be linked with other libraries? The latter option is much more powerful, because linking to glibc and other assembler, C, and C++ modules means you don't worry about syscalls at all. – Ben Voigt Jul 05 '11 at 01:06
  • _Are you trying to emit an ELF executable directly, or an object file that can be linked with other libraries?_ The truth is I'm not sure about either. After following the reading suggestions from other users, I'm leaning towards having my compiler produce an ELF executable, but I don't yet know enough about the advantages and features of either to make an informed choice. I know almost nothing about executables; until now it was enough to know that the compiler I had for a given language made one, not how they actually function. – Tom Busby Jul 05 '11 at 01:50

2 Answers


This document explains how you can implement a foreign function interface to interact with other code: http://www.x86-64.org/documentation/abi.pdf

whoplisp
  • This is the external ABI used by Linux on x86_64, so it will be quite helpful to the OP. – caf Jul 05 '11 at 04:44

Firstly, for the machine code, start here: http://www.intel.com/products/processor/manuals/

Next, I assume your question about how the machine code is run is really about how the OS loads the executable into memory and ends up calling main()? These links may help:

Linkers and loaders: http://www.linuxjournal.com/article/6463

ELF file format: http://en.wikipedia.org/wiki/Executable_and_Linkable_Format and http://www.linuxjournal.com/article/1060

Your machine code will go into the .text section of the executable.
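To make that a little more concrete, here is a rough sketch (in Python, purely for illustration) of wrapping a few bytes of x86-64 machine code in the smallest ELF file I would expect Linux to accept. The names `build_elf`, `EXIT_42` and `BASE_ADDR` are made up for this example, and the load address is an assumption. Note that it cheats: it has no section table at all, so there is no actual .text section, just a single loadable segment holding the headers and the code. A real assembler and linker would put the code in .text and map that section into a segment like this one. The code bytes are executed directly by the processor and use the `syscall` instruction to ask the kernel to exit.

```python
import struct

BASE_ADDR = 0x400000  # assumed load address, a conventional choice for static x86-64 binaries
EHDR_SIZE = 64        # size of an ELF64 file header
PHDR_SIZE = 56        # size of an ELF64 program header

# Hand-assembled machine code:
#   mov edi, 42   ; exit status
#   mov eax, 60   ; syscall number for exit on x86-64 Linux
#   syscall
EXIT_42 = bytes([0xBF, 0x2A, 0, 0, 0,
                 0xB8, 0x3C, 0, 0, 0,
                 0x0F, 0x05])


def build_elf(code: bytes) -> bytes:
    """Return an ELF64 image: file header, one PT_LOAD program header, then the code."""
    entry = BASE_ADDR + EHDR_SIZE + PHDR_SIZE  # code starts right after the headers
    total = EHDR_SIZE + PHDR_SIZE + len(code)

    # e_ident: magic, 64-bit class, little-endian, version 1, System V ABI, padding
    ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
    ehdr = struct.pack(
        "<16sHHIQQQIHHHHHH",
        ident,
        2,          # e_type: ET_EXEC
        62,         # e_machine: EM_X86_64
        1,          # e_version
        entry,      # e_entry: address where execution begins
        EHDR_SIZE,  # e_phoff: program headers follow the file header
        0,          # e_shoff: no section headers in this sketch
        0,          # e_flags
        EHDR_SIZE,  # e_ehsize
        PHDR_SIZE,  # e_phentsize
        1,          # e_phnum
        0, 0, 0,    # e_shentsize, e_shnum, e_shstrndx
    )
    phdr = struct.pack(
        "<IIQQQQQQ",
        1,          # p_type: PT_LOAD
        5,          # p_flags: readable + executable
        0,          # p_offset: map the file from its start
        BASE_ADDR,  # p_vaddr
        BASE_ADDR,  # p_paddr (ignored on Linux)
        total,      # p_filesz
        total,      # p_memsz
        0x1000,     # p_align
    )
    return ehdr + phdr + code


if __name__ == "__main__":
    image = build_elf(EXIT_42)
    print(len(image), hex(struct.unpack_from("<Q", image, 24)[0]))
```

If you write the returned bytes to a file, mark it executable and run it on x86-64 Linux, I would expect the shell to report an exit status of 42, but treat this as a starting point for experimenting rather than a reference implementation.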

Finally, best of luck. Your project is similar to my final-year project, except I targeted the JVM and compiled a subset of Visual Basic!

James
  • Targeting the JVM may be a much better idea actually, because then I have a standard machine language with which I can demonstrate my compiler on any system... hmm, thank you for your answer, I'll have a think. (The exact details of my project aren't yet set in stone; the current title is simply "A Compiler for C in Haskell".) – Tom Busby Jul 05 '11 at 00:09