Should there be a linker error?
The short answer to "Shouldn't there at least be linking errors?" is "There is no guarantee that there'll be a linking error". The C standard doesn't mandate it.
As Raymond Chen noted in a comment:
The language-lawyer answer is that the standard does not require a diagnostic for this error. The practical answer is that C does not type-decorate symbols with external linkage, so the type mismatch goes undetected.
One of the reasons C++ has type-safe linkage is to avoid problems with code analogous to this (though the main reason is to allow for function name overloading — resolving this sort of problem is, perhaps, more a side-effect).
The C standard says:
§6.9 External definitions
¶5 An external definition is an external declaration that is also a definition of a function (other than an inline definition) or an object. If an identifier declared with external linkage is used in an expression (other than as part of the operand of a sizeof or _Alignof operator whose result is an integer constant), somewhere in the entire program there shall be exactly one external definition for the identifier; otherwise, there shall be no more than one.
§5.1.1.1 Program structure
¶1 A C program need not all be translated at the same time. The text of the program is kept in units called source files, (or preprocessing files) in this International Standard. A source file together with all the headers and source files included via the preprocessing directive #include is known as a preprocessing translation unit. After preprocessing, a preprocessing translation unit is called a translation unit. Previously translated translation units may be preserved individually or in libraries. The separate translation units of a program communicate by (for example) calls to functions whose identifiers have external linkage, manipulation of objects whose identifiers have external linkage, or manipulation of data files. Translation units may be separately translated and then later linked to produce an executable program.
§5.1.1.2 Translation phases
- All external object and function references are resolved. Library components are linked to satisfy external references to functions and objects not defined in the current translation. All such translator output is collected into a program image which contains information needed for execution in its execution environment.
The linking is done based on the names of external definitions, not on the types of the objects identified by the name. The onus is on the programmer to ensure that the type of the function or object for each external definition is consistent with the way it is used.
Avoiding the problem
In a comment, I said:
This [question] is an argument for making use of headers to ensure that different parts of a program are coherent. If you never declare an external function in a source file but only in headers, and use the headers wherever the relevant symbol (in this case weird) is used or defined, then the code would not all compile. You could either have a function or a string, but not both. You'd have a header weird.h which contains either extern char *weird; or extern int weird(int *p); (but not both), and both main.c and weird.c would include the header, and only one of them would compile successfully.
To which there came the response:
What could I add to these files to ensure that the error is detected and thrown when main.c is compiled?
You'd create 3 source files. The code shown here is slightly more complicated than you'd normally use because it allows you to use conditional compilation to compile the code with either a function or a variable as the 'external identifier with external linkage' called weird. Normally, you'd select one intended representation for weird and only allow that to be exposed.
weird.h
#ifndef WEIRD_H_INCLUDED
#define WEIRD_H_INCLUDED
#ifdef USE_WEIRD_STRING
extern const char *weird;
#else
extern int weird(int *p);
#endif
#endif /* WEIRD_H_INCLUDED */
main.c
#include <stdio.h>
#include "weird.h"
int main(void)
{
    int x, *y;
    y = (int *)7;
    x = weird(y);
    printf("x = %d\n", x);
    return (0);
}
weird.c
#include "weird.h"
#ifdef USE_WEIRD_STRING
const char *weird = "weird";
#else
int weird(int *p)
{
    if (p == 0)
        return 42;
    else
        return 99;
}
#endif
Valid compilation sequences
gcc -c weird.c
gcc -c main.c
gcc -o program weird.o main.o
gcc -o program -DUSE_WEIRD_FUNCTION main.c weird.c
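Both these work because the code is compiled to use the weird() function. The header never tests USE_WEIRD_FUNCTION; the function is simply what you get whenever USE_WEIRD_STRING is not defined. The header, in both cases, ensures that the compilations are consistent.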
Invalid compilation sequence
gcc -c -DUSE_WEIRD_STRING weird.c
gcc -c main.c
gcc -o program weird.o main.o
This is basically the same as the setup in the question. The weird.c file is compiled to create a string pointer called weird, but the main.c code is compiled expecting to use a function weird(). The linker does link the code, but things go disastrously wrong when the function call in main() ends up targeting the object named weird instead of a function. The chances are that the memory where that object is stored is not executable and the execution fails because of that. Otherwise, the bytes stored there are interpreted as machine code, which probably doesn't do anything meaningful and leads to a crash. Neither is desirable and neither is guaranteed; this is a result of invoking undefined behaviour.
If you tried to compile main.c with -DUSE_WEIRD_STRING, the compilation would fail because the header would indicate that weird is a const char * and the code would try to use it as a function.
If you replaced the conditional code in weird.c with either the string or the function (unconditionally), then:
- Either the compilation would fail if the file contained the function but -DUSE_WEIRD_STRING was set on the command line,
- Or the compilation would fail if the file contained the string but you did not set -DUSE_WEIRD_STRING.
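As a rough single-file sketch of that check, the translation unit below stands in for weird.c with an unconditional function definition. The extern line substitutes for #include "weird.h" (function variant) so the example stays in one file; that substitution is an assumption made for illustration, not how you would lay out real code.

```c
#include <stddef.h>

/* Stand-in for #include "weird.h" when USE_WEIRD_STRING is not defined. */
extern int weird(int *p);

#if 0   /* Changing 0 to 1 should make the compilation fail, because
           'weird' would then be declared as two different kinds of symbol. */
const char *weird = "weird";
#endif

/* The compiler checks this definition against the declaration above. */
int weird(int *p)
{
    if (p == NULL)
        return 42;
    else
        return 99;
}
```

Because the declaration and the definition meet in the same translation unit, any mismatch is reported at compile time instead of surfacing as a crash after linking.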
Normally, the header would contain an unconditional declaration for weird, either as a function or as a pointer (but without any provision for choosing between them at compile time).
The key point is that the header is included in both source files, so unless the conditional compilation flags make a difference, the compiler can check the code in the source files for consistency with the header, and therefore the two object files stand a chance of working together. If you subvert the checking by setting the compilation flags so that the two source files see different declarations in the header, then you're back to square one.
The header, therefore, declares the interfaces, and the source files are checked to ensure that they adhere to the interface. The headers are the glue that holds the system together. Consequently, any function (or variable) that must be accessed outside its source file should be declared in a header (one header only), and that header should be used in the source file where the function (or variable) is defined, and also in every source file that references the function (or variable). You should not write extern … weird …; in a source file; such declarations belong in a header. All functions (or variables) that are not referenced outside the source file where they're defined should be defined with static. This gives you the maximum chance of spotting problems before you run the program.
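A minimal sketch of that discipline, using a hypothetical counter module (the names counter.h, counter_next, counter_reset and counter_value are invented for illustration). The two extern lines stand in for the header so that the sketch fits in one file; in real code they would live in counter.h and nowhere else.

```c
/* counter.c: a hypothetical module, shown only to illustrate the layout. */

/* Stand-in for #include "counter.h": the public interface, declared once. */
extern int counter_next(void);
extern void counter_reset(void);

/* Not referenced outside this file, so it gets internal linkage and
 * cannot collide with a 'counter_value' defined in another source file. */
static int counter_value = 0;

/* Public function: its definition is checked against the declaration above. */
int counter_next(void)
{
    counter_value += 1;
    return counter_value;
}

/* Public function: puts the module back into its initial state. */
void counter_reset(void)
{
    counter_value = 0;
}
```

A caller would include the same header and call counter_next(); it never sees counter_value, so only the two declared functions can be involved in a cross-file mismatch.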
You can use GCC to help you. For functions, you can insist on prototypes being in scope before a (non-static) function is referenced or defined, and before a static function is referenced (although you can simply define a static function before it is referenced without a separate prototype). I use:
gcc -O3 -g -std=c11 -Wall -Wextra -Wmissing-prototypes -Wstrict-prototypes \
-Wold-style-definition -Wold-style-declaration …
The -Wall and -Wextra imply some, but not all, of the other -W… options, so that isn't a minimal set. And not all versions of GCC support both the -Wold-style-… options. But together, these options ensure that functions have a full prototype declaration before the function is used.
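For illustration, here is a small translation unit written so that it should compile without prototype-related warnings under the options above. The function names (add_checked, answer, is_small) are hypothetical, and the extern declarations would normally come from a header, not from the source file itself.

```c
/* Full prototypes in scope before the non-static definitions. Without
 * them, -Wmissing-prototypes should warn about the definitions below. */
extern int add_checked(int a, int b);
extern int answer(void);   /* '(void)', not '()', so this is a prototype */

/* A static function defined before its first use needs no separate
 * prototype. */
static int is_small(int v)
{
    return v > -1000 && v < 1000;
}

/* Returns a + b when both operands are small, otherwise 0. */
int add_checked(int a, int b)
{
    return (is_small(a) && is_small(b)) ? a + b : 0;
}

int answer(void)
{
    return add_checked(40, 2);
}
```

Declaring answer as int answer() instead would leave its parameters unspecified, which is the kind of declaration -Wstrict-prototypes is meant to flag.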