
The way I understand the compilation process:

1) Preprocessing: All of your macros are replaced with their actual values, all comments are removed, etc., and your #include statements are replaced with the literal text of the files you've included.

2) Compilation: Won't drill down too deep here, but the result is an assembly file for whatever architecture you are on.

3) Assembly: Takes the assembly file and converts it into binary instructions, i.e., machine code. (I sketch the corresponding gcc invocations just after this list.)

4) Linking: This is where I'm confused. At this point you have an executable. But if you actually run that executable what happens? Is the problem that you may have included *.h files, and those only contain function prototypes? So if you actually call one of the functions from those files, it won't have a definition and your program will crash?
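
For concreteness, here is roughly how I picture driving gcc through those stages by hand (the file name is just a placeholder):

gcc -E prog.c -o prog.i    # 1) preprocess: expand #include and macros, strip comments
gcc -S prog.i -o prog.s    # 2) compile: C to assembly
gcc -c prog.s -o prog.o    # 3) assemble: assembly to machine code (an object file)
gcc prog.o -o prog         # 4) link: produce the final executable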

If that's the case, what exactly does linking do, under the hood? How does it find the .c file associated with the .h that you included, and how does it inject that into your machine code? Doesn't it have to go through the whole compilation process again for that file?

Now, I've come to understand that there are two types of linking, dynamic and static. Is static when you actually recompile the source of the library for every executable you create? I don't quite understand how dynamic linking would work. So you compile one executable library that is shared by all of your processes that use it? How is that possible, exactly? Wouldn't it be outside of the address space of the processes trying to access it? Also, for dynamic linking, don't you still need to compile the library at some juncture in time? Is it just sitting there constantly in memory waiting to be used? When is it compiled?

Can you go through the above and clear up all of the misunderstandings, wrong assumptions there and substitute your correct explanation?

ordinary

1 Answer


At this point you have an executable.

No. At this point, you have object files, which are not, in themselves, executable.

But if you actually run that executable what happens?

Something like this:

h2co3-macbook:~ h2co3$ clang -Wall -o quirk.o quirk.c -c
h2co3-macbook:~ h2co3$ chmod +x quirk.o
h2co3-macbook:~ h2co3$ ./quirk.o
-bash: ./quirk.o: Malformed Mach-o file

I told you it was not an executable.

Is the problem that you may have included *.h files, and those only contain function prototypes?

Pretty close, actually. A translation unit (.c file) is (generally) transformed to assembly/machine code that represents what it does. If it calls a function, then there will be a reference to that function in the file, but no definition.
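
For example, with a quirk.c along these lines (I'm guessing at its contents), the nm tool shows the unresolved reference; exact formatting and the leading underscores vary by platform:

$ cat quirk.c
#include <stdio.h>

int main(void) {
    printf("hello\n");   /* calls a function that is only declared here */
    return 0;
}
$ clang -Wall -c quirk.c -o quirk.o
$ nm quirk.o
0000000000000000 T _main
                 U _printf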

So if you actually call one of the functions from those files, it won't have a definition and your program will crash?

As I've stated, it won't even run. Let me repeat: an object file is not executable.

what exactly does linking do, under the hood? How does it find the .c file associated with the .h that you included [...]

It doesn't. It looks for other object files generated from .c files, and possibly libraries (which are essentially just collections of other object files).
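
For instance, a static library on Unix-like systems is just an ar archive of object files (hypothetical file names):

gcc -c util1.c -o util1.o
gcc -c util2.c -o util2.o
ar rcs libutil.a util1.o util2.o    # bundle the object files into a library
ar t libutil.a                      # lists util1.o and util2.o
gcc -o my_prog main.o -L. -lutil    # the linker pulls what it needs out of libutil.a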

And it finds them because you tell it what to look for. Assuming you have a project which consists of two .c files which call each other's functions, this won't work:

gcc -c file1.c -o file1.o
gcc -c file2.c -o file2.o
gcc -o my_prog file1.o

It will fail with a linker error: the linker won't find the definitions of the functions implemented in file2.c (and thus in file2.o). But this will work:

gcc -c file1.c -o file1.o
gcc -c file2.c -o file2.o
gcc -o my_prog file1.o file2.o

[...] and how does it inject that into your machine code?

Object files contain stub references (usually in the form of function entry point addresses or explicit, human-readable names) to the functions they call. Then the linker looks at each library and object file, finds the references (and throws an error if a function definition can't be found), and substitutes the stub references with actual "call this function" machine code instructions. (Yes, this is heavily simplified, but unless you ask about a specific architecture and a specific compiler/linker, it's hard to be more precise...)
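
On Linux/ELF, for example, rebuilding quirk.o there and running objdump shows the unresolved reference recorded as a relocation entry (output trimmed; the exact relocation type and offsets depend on the architecture and toolchain):

$ objdump -r quirk.o

quirk.o:     file format elf64-x86-64

RELOCATION RECORDS FOR [.text]:
OFFSET           TYPE              VALUE
000000000000000f R_X86_64_PLT32    printf-0x0000000000000004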

Is static when you actually recompile the source of the library for every executable you create?

No. Static linkage means that the machine code of a library's object files is actually copied/merged into your final executable. Dynamic linkage means that a library is loaded into memory once, and the aforementioned stub function references are resolved by the operating system when your executable is launched. No machine code from the library is copied into your final executable. (So here, the linker in the toolchain only does part of the job.)

The following may help you to achieve enlightenment: if you statically link an executable, it will be self-contained. It will run anywhere (on a compatible architecture anyway). If you link it dynamically, it will only run on a machine if that particular machine has all the libraries installed that the program references.
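
A quick way to see the difference on Linux (hypothetical file name; -static and ldd are standard GCC/glibc tooling, and a fully static build needs the static variants of the libraries installed):

gcc -o prog_dynamic prog.c           # default: dynamically linked
gcc -o prog_static  prog.c -static   # library code is copied into the executable
ldd prog_dynamic                     # lists libc.so.6, the dynamic loader, ...
ldd prog_static                      # prints "not a dynamic executable"
ls -l prog_dynamic prog_static       # the static binary is considerably larger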

So you compile one executable library that is shared by all of your processes that use it? How is that possible, exactly? Wouldn't it be outside of the address space of the processes trying to access it?

The dynamic linker/loader component of the OS takes care of all of that: the library's code is mapped into the virtual address space of every process that uses it (and the underlying physical pages can be shared between processes).
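
On Linux, for instance, you can see the shared C library mapped into a process's own address space via /proc (addresses, paths, and version numbers will differ on your system):

$ grep libc /proc/self/maps
7f1c0a200000-7f1c0a3c8000 r-xp 00000000 08:02 1835249   /lib/x86_64-linux-gnu/libc-2.27.so
7f1c0a3c8000-7f1c0a5c8000 ---p 001c8000 08:02 1835249   /lib/x86_64-linux-gnu/libc-2.27.so
...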

Also, for dynamic linking, don't you still need to compile the library at some juncture in time?

As I've already mentioned: yes, it is already compiled. Then it is loaded at some point (typically when it's first used) into memory.

When is it compiled?

Some time before it can be used. Typically, a library is compiled, then installed to a location on your system so that the OS and the compiler/linker know about its existence; then you can start compiling (um, linking) programs that use that library. Not earlier.
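
As a sketch on Linux (hypothetical file names; -fPIC, -shared, -L, -l and LD_LIBRARY_PATH are the standard GCC/ELF mechanics):

gcc -fPIC -c mylib.c -o mylib.o        # compile the library's code as position-independent
gcc -shared -o libmylib.so mylib.o     # produce the shared library
gcc -o my_prog main.c -L. -lmylib      # link my_prog against it; the calls stay unresolved
LD_LIBRARY_PATH=. ./my_prog            # the dynamic loader finds and maps libmylib.so at launch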

  • Brilliant answer! Thanks. Of course, now I have follow-up questions if you will oblige: -- You said if it calls a function, there will be a reference to that function. So the object file literally just contains some sort of symbol that can be resolved upon linkage? It's not an actual machine instruction? Like it just says "printf" or something like that? Because it can't know beforehand where in memory that function will be... – ordinary Oct 19 '13 at 09:37
  • -- In the dynamic linking paradigm, from your explanation it seems like the final executable will have unresolved symbols. So when you load the program into memory, does that mean that there is an intermediary step by the OS before the program is actually loaded, in which it looks at all of the unresolved symbols and gives them literal addresses for those functions? What happens if you call a function that doesn't exist? Where is the point of failure? – ordinary Oct 19 '13 at 09:38
  • @ordinary Yes, it contains stub symbols ("symbol" -- that's exactly what they are called). This is partly why these files are not actually executable. – Oct 19 '13 at 09:39
  • @ordinary Yes, a dynamically linked executable contains unresolved symbols. These are resolved by the OS (by its dynamic linker/dynamic loader component) before the CPU actually jumps to the beginning of your `main()` function. – Oct 19 '13 at 09:39
  • What I don't get is: If you include a .h file, say <stdio.h>, does it only consist of function prototypes for all of the functions it provides? If so, how are you going to eventually have an object file with the definitions of those functions for the linker to resolve function calls to? – ordinary Oct 19 '13 at 09:47
  • Okay, I think I figured it out: When you include a *library* like <stdio.h>, you are actually including a bunch of already compiled object files. So, when you call printf, the linker just looks through the object files included and resolves it. When you include a file like "my_header.h", you are merely copying the text into your file via preprocessing. Then you would have also compiled the corresponding "my_header.cpp" file, so upon linking the linker will resolve the functions by looking at "my_header.o" – ordinary Oct 19 '13 at 10:08
  • @ordinary No, not at all. There are only function declarations/prototypes. The definition of those functions is in the implementation of the C standard library, for example in `/usr/lib/libc.so` on Linux or `/usr/lib/libSystem.dylib` on Mac OS X. Using angle brackets versus double quotes has absolutely nothing to do with linkage. Standard library header files are just normal header files. I suggest you read up on how the preprocessor works. –  Oct 19 '13 at 10:55
  • I don't understand. If we aren't explicitly compiling it, then where is the object file? How is it resolving those function calls to their definitions? Are those library files dynamically linked by the OS when you first execute the file? – ordinary Oct 19 '13 at 11:09
  • @ordinary It's precompiled. It's in the standard library implementation, which generally comes with the OS or with a compiler. – Oct 19 '13 at 12:10