As I understand it, GCC performs four steps when I compile a C program.

  1. Preprocessing - C code (*.c) with macros to C code without macros (*.i)
  2. Compiling - C code (*.c) to Assembly language (*.s)
  3. Assembling - Assembly language (*.s) to Object code (*.o)
  4. Linking - Object code (*.o) to executable (*)

The first three steps make perfect sense to me, but I am still confused as to what linking actually does.

After step three why can't I run the *.o file? At that point my C code is now in object/machine/byte code and can be interpreted by the CPU directly. Yet when I make my *.o file executable and try to run it I get this error:

bash: ./helloworld.o: cannot execute binary file: Exec format error

Why do I get this error? If I have a tiny C program (for example a hello world program) with only one C file it would appear to me that linking has no purpose because there's nothing to link. So what does linking in the compilation process actually do?

Thanks in advance for any replies.

  • How about the `printf()` you use in your helloworld program? Where does that come from? – Sourav Ghosh Jun 13 '16 at 11:08
  • @SouravGhosh Very true, I hadn't thought about that. So linking not only links my object files together but other object files (such as standard libraries) as well. Thank you. –  Jun 13 '16 at 11:36

3 Answers

If I have a tiny C program (for example a hello world program)

Even your helloworld program uses #include <stdio.h>, doesn't it? That means you're using a library, and the linking step is there to combine the necessary object code (here, the library code) with yours to create a binary.


For a detailed description of what the linking step does (and how it compares with compiling), see this question.

artm
  • This doesn't answer the question. OP could create completely empty `main` program, and he still couldn't run object files. – user694733 Jun 13 '16 at 11:08
  • Even so, there will be lots of other stuff that needs to be linked in to get from the loader entry point (often _start) to your main function. – doron Jun 13 '16 at 11:18
  • @doron Thank you, could you give me some examples of those other things please? –  Jun 13 '16 at 11:37
Roughly speaking, linking does the following:

  • Find all the matching sections from each object file and concatenate them. This way we end up with one large code section (.text in ELF), one .data, one .bss, etc.
  • Resolve all symbols that are used. Many symbols are local, so they can be resolved immediately. Unresolved symbols are searched for in the libraries you asked to link with. When this is done, the result is a symbol table / link map.
  • Make a file that is actually executable. On Linux, it just happens that executables, libraries and object files are all in the ELF format. This is not true for all platforms.
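To make the symbol-resolution step more concrete, here is a toy sketch in C. It is not how ld is implemented; the `toy_object` struct, the function names and the symbol lists are all made up for illustration, and the contents given for crt1.o and libc are heavily simplified. It only models the idea that each input defines some names and still needs others, and that a link succeeds when every needed name has a definition somewhere in the set.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_SYMS 4

/* Hypothetical model of an object file: the names it defines
 * and the names it still needs (both lists are NULL-terminated). */
struct toy_object {
    const char *name;
    const char *defined[MAX_SYMS];
    const char *undefined[MAX_SYMS];
};

/* Return the index of the object that defines sym, or -1 if none does. */
int find_definition(const struct toy_object *objs, size_t n, const char *sym)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < MAX_SYMS && objs[i].defined[j]; j++)
            if (strcmp(objs[i].defined[j], sym) == 0)
                return (int)i;
    return -1;
}

/* Count the references that no object in the set can satisfy. */
size_t count_unresolved(const struct toy_object *objs, size_t n)
{
    size_t missing = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < MAX_SYMS && objs[i].undefined[j]; j++)
            if (find_definition(objs, n, objs[i].undefined[j]) < 0)
                missing++;
    return missing;
}

/* helloworld.o on its own: printf is still missing. */
const struct toy_object hello_only[] = {
    { "helloworld.o", { "main" }, { "printf" } },
};

/* Simplified picture of a full link: startup code, your object, the C library. */
const struct toy_object full_link[] = {
    { "crt1.o",       { "_start" }, { "main" }   },
    { "helloworld.o", { "main" },   { "printf" } },
    { "libc",         { "printf" }, { NULL }     },
};

#ifdef TOY_LINKER_DEMO
int main(void)
{
    printf("helloworld.o alone: %zu unresolved\n",
           count_unresolved(hello_only, 1));
    printf("with crt1.o and libc: %zu unresolved\n",
           count_unresolved(full_link, 3));
    return 0;
}
#endif
```

Compiling with -DTOY_LINKER_DEMO enables the small main. The point of the model is that helloworld.o by itself has a dangling reference to printf and nothing that provides _start, which is what the real linker fixes by pulling in startup code and the C library.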
Stian Skjelstad
  • Thank you for your answer, although I have marked @artm answer as accepted yours was very helpful as well. –  Jun 13 '16 at 11:46
The simple answer is that .o files and executables serve different purposes and have different formats.

If you want the complete answer, you will need to read the documentation for your platform's binary format.

On Linux this will be here. This document describes the difference between the intermediate format and the final executable format.

Just as an aside, the Linux kernel module loader does use .o (or rather .ko) files directly.
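As a rough illustration of that format difference on ELF platforms, the sketch below reads the e_type field from the start of an ELF header and reports what kind of file it is. The function name `elf_kind` and the returned strings are my own; the byte offsets (magic in bytes 0-3, data encoding in byte 5, e_type in bytes 16-17) come from the ELF header layout. An object file is marked as relocatable (ET_REL), and as far as I know that marking, along with the missing program headers, is why the kernel refuses to execute it with "Exec format error".

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Classify a file from the first 18 bytes of its ELF header. */
const char *elf_kind(const unsigned char *hdr, size_t len)
{
    unsigned type;

    if (len < 18 || memcmp(hdr, "\177ELF", 4) != 0)
        return "not ELF";

    /* Byte 5 (EI_DATA): 1 = little-endian, 2 = big-endian. */
    if (hdr[5] == 1)
        type = hdr[16] | (hdr[17] << 8);
    else if (hdr[5] == 2)
        type = (hdr[16] << 8) | hdr[17];
    else
        return "unknown ELF encoding";

    switch (type) {
    case 1:  return "relocatable object file (ET_REL)";
    case 2:  return "executable (ET_EXEC)";
    case 3:  return "shared object or position-independent executable (ET_DYN)";
    case 4:  return "core dump (ET_CORE)";
    default: return "other ELF type";
    }
}

#ifdef ELF_KIND_DEMO
int main(int argc, char **argv)
{
    unsigned char hdr[18];
    size_t n;
    FILE *f;

    if (argc != 2) {
        fprintf(stderr, "usage: %s FILE\n", argv[0]);
        return 2;
    }
    f = fopen(argv[1], "rb");
    if (!f) {
        perror(argv[1]);
        return 1;
    }
    n = fread(hdr, 1, sizeof hdr, f);
    fclose(f);
    printf("%s: %s\n", argv[1], elf_kind(hdr, n));
    return 0;
}
#endif
```

Built with -DELF_KIND_DEMO and run on helloworld.o, this should report a relocatable object file. Run on the linked program it should report ET_EXEC, or ET_DYN if your GCC builds position-independent executables by default.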

doron