4

At first, I was writing my functions in an .h file and then including it with #include "myheader.h". Then someone told me that it's better to put only function prototypes in these files and to put the real code in a separate .c file. Now I'm able to compile multiple .c files into a single executable, but at this point I can't understand why I should add the header files if the code is in another file.

Moreover, I had a look at the standard C headers (like stdlib.h) on my system, and they seemed to contain only structure definitions, constants and the like... I'm not that good with C (and to be honest, stdlib.h was almost Chinese to me, no offense to the Chinese of course :) ), but I didn't spot a single line of 'operative' code. Still, I always just include it without adding anything else, and I compile my files as if the 'code' were actually there.

Can someone please explain how these things work? Or, at least, point me to a good guide? I searched on Google and SO too, but didn't find anything that explains it clearly.

Zagorax
  • The header files (should) tell the compiler about available functions (that are defined in (other) C source files or libraries). Having said this, there are quite a few libraries that misuse the header files to add code as well, for all kinds of (obscure) reasons. – Veger Jun 22 '12 at 13:28
  • 4
    Declarations in .h files. Definitions in .c files. Include all needed .h files to the appropriate .c files. Compile all .c files into single executable. Done. – jn1kk Jun 22 '12 at 13:29
  • The reason for the separation of "headers" and "operative code" is the concept of **separate-compilation**. Simple explanation of the concept : http://www.cs.bu.edu/teaching/c/separate-compilation/#sep-compile – wap26 Jun 22 '12 at 13:35
  • The ["why headers?" part of the questions has been asked many times before](http://stackoverflow.com/q/2184646/2509)... – dmckee --- ex-moderator kitten Jun 22 '12 at 19:17

5 Answers

16

When you compile C code, the compiler has to know that a particular function exists with a given name, parameter list, return type and optional modifiers. All these things together are called the function signature, and the existence of a particular function is declared in the header file. Having this information, when the compiler finds a call to this function, it knows what kinds of arguments to look for, can check whether they have the appropriate types, and can prepare them to be pushed onto the stack before the code actually jumps to your function's implementation. However, the compiler does not have to know the actual implementation of the function; it simply puts a "placeholder" in your object file for every function call. (Note: each .c file compiles to exactly one object file.) #include simply takes the header file and replaces the #include line with the contents of that file.
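A minimal sketch of how this plays out across files (the names math_utils and add are invented for illustration): the header carries only the signature, and any .c file that includes it can call the function without ever seeing its body.

/* math_utils.h -- hypothetical header: only the declaration (the signature) */
int add(int a, int b);

/* main.c -- the compiler checks the call against the declaration and
   leaves a placeholder in main.o for the linker to resolve later */
#include <stdio.h>
#include "math_utils.h"

int main(void)
{
    printf("%d\n", add(2, 3));
    return 0;
}

/* math_utils.c -- the actual implementation, compiled into its own object file */
#include "math_utils.h"

int add(int a, int b)
{
    return a + b;
}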

After compilation, the build script passes all the object files to the linker. It is the linker that resolves all the function "placeholders" by finding the physical location of each function's implementation, be it among your object files, framework libraries or DLLs. It simply attaches the information about where each implementation can be found to every function call, so your program knows where to continue execution when it arrives at your function call.

Having said all this, it should be clear why you can't put a function definition in a header file. If you later #include this header into more than one .c file, both of them will compile the function implementation into two separate object files. The compiler will work fine, but when the linker tries to link everything together, it will find two implementations of the function and give you an error.
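For illustration, a minimal sketch of that failure (file and function names are invented):

/* bad.h -- a function DEFINITION in a header */
int twice(int x)
{
    return 2 * x;
}

/* one.c */
#include "bad.h"
int main(void) { return twice(21); }

/* two.c */
#include "bad.h"
int helper(void) { return twice(4); }

/* one.o and two.o each contain their own copy of twice(), so linking
   them together fails with a "multiple definition" error. */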

stdlib.h and friends work the same way. The implementations of the functions declared in them live in system libraries that are linked to your code "automatically", even if you are not aware of it.

MrTJ
3

The C language (together with C++) uses a quite obsolete strategy for making the compiler aware of functions defined elsewhere.

This strategy goes like this: the signatures of the functions etc. (this stuff is called declarations in C) go into a special file called a header, and every other file which wants to use them is expected to include that header almost literally (actually, the #include directive just tells the compiler to insert the literal text of the header), so that the compiler sees the function declarations again.

Other languages solve this problem in a different way: the compiler sees all the source code and itself remembers the metadata of the already-compiled classes.

The strategy used in C shifts the task of finding all the dependencies from the compiler to the developer; it's a legacy from the old times when the computers were small, silly and slow, so this kind of help from the developer was really valuable.

Although this strategy has numerous drawbacks, and although it would be theoretically possible to change it now, the standard is not going to change, because gigabytes of code have already been written in that style.

tl;dr: it's a legacy from the 70's.

Vlad
  • And there isn't any need for the header to know where the actual source code is? How does including standard libraries work? – Zagorax Jun 22 '12 at 13:39
  • 1
    @Zagorax: you don't "include" libraries, you link with them by telling the linker where they live. – Fred Foo Jun 22 '12 at 13:40
  • It depends on the compiler. In Unix, there is standard place where the standard headers are located. In windows/Visual Studio, it's regulated by the Visual Studio GUI settings. The actual code doesn't matter until _link_ time, as @larsmans correctly mentioned. – Vlad Jun 22 '12 at 13:41
  • @larsmans, maybe this is the point where I'm confused. Where am I telling the linker where the code is? With `#include`? With `#include` I'm saying where the header is, not the c code that really executes, for example, a malloc. – Zagorax Jun 22 '12 at 13:42
  • While the C strategy could be considered legacy (and has warts), it does still have upsides. Many libraries on Linux are available in optimized and debugging versions, sharing a single set of headers. – Fred Foo Jun 22 '12 at 13:44
  • @Zagorax: you do that in your build script, not in the program. – Fred Foo Jun 22 '12 at 13:44
  • @Zagorax: again, it's compiler-dependent. In gcc, you use `-l` compiler switch; in Visual Studio, you list the libraries in project options, etc. – Vlad Jun 22 '12 at 13:44
  • @larsmans: in .NET, you can compile against a debug version of an assembly, and replace it later with a release version -- to the same effect. (Provided that both versions are using the same interface.) – Vlad Jun 22 '12 at 13:46
  • @Vlad: without recompiling? (I'm not familiar with C#/.NET at all.) – Fred Foo Jun 22 '12 at 13:53
  • @larsmans: yes, if the signatures of the used functions are the same. The binding is done at load time (with reflection) and fails with exception if the signatures don't match. Not exactly sure about if it's class or assembly load time. (Assembly is a peculiar name of shared module in .NET.) – Vlad Jun 22 '12 at 14:00
  • @Vlad, first sentence of the second paragraph should say "the declarations" not "definitions". Also, this strategy does allow swapping of implementation and for library writers to write nice library implementations (a.k.a. definitions) and for the end user to just link to them. – Josh Jun 22 '12 at 14:58
  • @Josh: you're right about the declarations, I've corrected it. The possibility to swap the implementations does exist with other strategies as well, look up the discussion with larsmans in the comments above. – Vlad Jun 22 '12 at 15:10
1

In C it is required that a function is declared before it is called. The reason for this requirement is that in the 70s it would simply take too much time to first parse a file for all its symbols and then parse it a second time to actually compile the code. If all functions are declared before they are called, one single parse is enough. However, on modern systems we no longer face those limitations, and that is the reason why modern languages don't have this requirement.
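A tiny illustration of the rule (the function name greet is made up):

void greet(void);   /* declaration: the compiler now knows greet() */

int main(void)
{
    greet();        /* OK -- declared above. Without that declaration,
                       C99 and later reject this call outright. */
    return 0;
}

void greet(void)    /* the definition may appear after the call site */
{
    /* ... */
}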

Imagine you have two files, a.c and b.c, in your project. You implement a function foo which you want to use in both files. You can't just define the function in a.c and use it in b.c, since you have to declare a function before you call it. So you would add a line void foo(); to b.c. But every time you changed the signature of your function in a.c, you would have to change the declaration in b.c as well. To circumvent this issue, it is standard practice in C to declare all functions your file implements in a separate header file (in this case a.h). The header file is then included by all other files that want to use that code (so b.c would use this: #include "a.h").
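Concretely, the three files described above could look like this (the body of foo is invented for illustration):

/* a.h -- declares everything a.c implements */
void foo(void);

/* a.c */
#include <stdio.h>
#include "a.h"

void foo(void)
{
    printf("foo called\n");
}

/* b.c */
#include "a.h"

int main(void)
{
    foo();
    return 0;
}

/* Build by compiling each .c file separately, then linking the
   object files, e.g. with gcc:
     gcc -c a.c
     gcc -c b.c
     gcc a.o b.o -o program   */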

Lukas Schmelzeisen
0

An #include is a preprocessor directive that causes the file to be textually inserted at the point where the #include occurs.
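To make the textual substitution concrete (hypothetical file names), this is all the preprocessor does:

/* greeting.h */
void greet(void);

/* main.c, as you write it */
#include "greeting.h"

int main(void)
{
    greet();
    return 0;
}

/* main.c, as the compiler sees it after preprocessing -- the
   #include line has been replaced by the text of greeting.h */
void greet(void);

int main(void)
{
    greet();
    return 0;
}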

When the same header file ends up being included more than once in a single .c file (for instance, because other headers include it in turn), its text would be textually inserted more than once, which can cause redefinition errors. The #ifndef, #define, and #endif preprocessor directives can be used to prevent such multiple inclusion.

#ifndef MY_FILE_H
#define MY_FILE_H

/* This code will not be included more than once. */

#endif /* !MY_FILE_H */
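Double inclusion typically happens indirectly, e.g. in a hypothetical layout like this:

/* util.h */
#include "my_file.h"

/* main.c */
#include "my_file.h"
#include "util.h"    /* pulls in my_file.h a second time; without the
                        include guard its contents would be inserted twice */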
Matt Coughlin
0

I can't understand why I should add the header files if the code is in another file.

The header file contains the declarations for functions defined in the other file, which is necessary for the code that's calling the function to compile correctly.

For instance, suppose I write the following code:

int main(void)
{
  double *foo = malloc(sizeof *foo * 10);
  if (foo)
  {
    // do something with foo
    free (foo);
  }
  return 0;
}

malloc is a standard library function that dynamically allocates memory and returns a pointer to it. The return type of malloc is void *, any value of which can be assigned to any other pointer type. free is another standard library function that deallocates memory allocated through malloc, and its return type is void (IOW, no return value).

However, the compiler doesn't automatically know what malloc or free return (or don't return); it needs to see the declarations for both functions in the current scope before it can correctly translate the function calls. Under the C89 standard and earlier, if a function is called without a declaration in scope, the compiler assumes that the function returns int; since int is not compatible with double * (you can't assign one to the other directly without a cast), you'll get an "incompatible assignment" diagnostic. Under C99 and later, implicit declarations aren't allowed at all. Either way the compiler won't translate the code.

I need to add the line

#include <stdlib.h>

which includes the declarations for malloc and free (and a bunch of other stuff) to the beginning of the file.
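Putting it together, the corrected program reads:

#include <stdlib.h>  /* declares malloc and free */

int main(void)
{
  double *foo = malloc(sizeof *foo * 10);
  if (foo)
  {
    // do something with foo
    free (foo);
  }
  return 0;
}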

There are several reasons you don't want to put function definitions (or variable definitions) in header files. Suppose you define function foo in header a.h. You include a.h in files a.c and b.c. Each file will compile okay individually, but when you try to link them together to build a library or executable, you'll get a "multiple definition" error from the linker -- you've wound up creating two separate instances of a function with the same name, which is a no-no. Same goes for variable definitions.

It also doesn't scale well. If you put a bunch of functions in their own header files and include them in one source file, you're translating all those functions in one big glob. Furthermore, if you only change the code in the source file or one header file, you still wind up recompiling everything each time you recompile the .c file. By putting each function in its own .c file, you can reduce your overall build times by only recompiling the files that need to be recompiled.

John Bode