int a;
int a = 3; // error when compiled as C++ with clang++-7, but not when compiled as C with clang-7

int main() {

}

In C, the compiler appears to merge these symbols into one global symbol, but in C++ it is an error.

Demo


file1:

int a = 2;

file2:

#include <stdio.h>
int a;

int main() {
    printf("%d", a); // prints 2
}

When compiled as C with clang-7, the linker does not produce an error; I assume it converts the uninitialised global symbol 'a' into an external reference (treating it as if it had been compiled as an extern declaration). When compiled as C++ with clang++-7, the linker produces a multiple-definition error.


Update: the linked question does answer the first example in my question, specifically 'In C, If an actual external definition is found earlier or later in the same translation unit, then the tentative definition just acts as a declaration.' and 'C++ does not have “tentative definitions”'.

As for the second scenario: if I printf `a`, it does print 2, so the linker has clearly resolved it to the definition in file1 (whereas I had previously assumed that a tentative definition would be zero-initialised by the compiler as a full global definition and would cause a link error).

It turns out that an `int i[];` tentative definition in both files also gets linked to one definition. `int i[5];` is likewise a tentative definition placed in `.common`, just with a different size expressed to the assembler. The former is known as a tentative definition with an incomplete type, whereas the latter is a tentative definition with a complete type.

What happens with the C compiler is that `int a` is made a strong-bound weak global in `.common` and left uninitialised in the symbol table (where `.common` implies a weak global; `extern int a` would instead be an extern symbol). The linker then makes the necessary decision: it ignores all weak-bound globals (those defined using `#pragma weak`) if there is a strong-bound global with the same identifier in some translation unit, while two strong-bound globals are a multiple-definition error. If it finds no strong-bound global and one weak-bound global, the output is that single weak-bound global; if it finds no strong-bound global but two weak-bound globals, it chooses the definition in the first file on the command line and outputs a single weak-bound global. Although two weak-bound globals are two definitions to the linker (they are both initialised to 0 by the compiler), this is not a multiple-definition error, because both are weak-bound. Finally, the linker resolves all `.common` symbols to point to the resulting strong/weak-bound strong global. See https://godbolt.org/z/Xu_8tY and https://docs.oracle.com/cd/E19120-01/open.solaris/819-0690/chapter2-93321/index.html

In the Godbolt example, because `baz` is declared with `#pragma weak`, it is weak-bound, gets zeroed by the compiler, and is put in `.bss`. Even though it is a weak global, it does not go in `.common`, because it is weak-bound: all weak-bound variables go in `.bss` if uninitialised (and get initialised by the compiler), or in `.data` if initialised. If it were not declared with `#pragma weak`, `baz` would go in `.common`, and the linker would zero it if no weak/strong-bound strong global symbol were found.

The C++ compiler, by contrast, makes `int a` a strong-bound strong global in `.bss` and initialises it to 0 (https://godbolt.org/z/aGT2-o), so the linker treats the two definitions as a multiple-definition error.


Update 2:

GCC 10.1 defaults to -fno-common. As a result, global variable accesses are more efficient on various targets. In C, global variables with multiple tentative definitions across translation units now result in linker errors (as in C++). With -fcommon, such definitions are silently merged during linking.

Lewis Kelsey
  • different languages are different. Of course it can be interesting to understand why this difference exists, but it shouldn't be a big surprise that such a difference exists – 463035818_is_not_an_ai Mar 30 '20 at 13:42
  • C and C++ are different languages. You should get out of the habit of expecting what works in one works in the other. – NathanOliver Mar 30 '20 at 13:42
  • Your `int a` should also in C give multiple definitions. After all, compilation unit file1 and compilation unit file2 will both have a symbol `a` in their object files – Paul Ogilvie Mar 30 '20 at 13:46
  • https://stackoverflow.com/questions/3095861/about-tentative-definition is at least part of your answer – Mat Mar 30 '20 at 13:48
  • [Here](https://stackoverflow.com/questions/15734699/in-c-why-is-multiple-declarations-working-fine-for-a-global-variable-but-not-for) you can find some explanation, including a case of local variables. – Alex Sveshnikov Mar 30 '20 at 13:48
  • To the captains obvious above: I think the expected answer is a quote from both standards explaining that difference. – Marek R Mar 30 '20 at 13:49
  • @PaulOgilvie in the first snippet, isn't `int a` called a "tentative" definition which acts as a declaration because `a` is later defined? – Weather Vane Mar 30 '20 at 13:49
  • @WeatherVane, What about the objects that will have a symbol for `a`? – Paul Ogilvie Mar 30 '20 at 13:50
  • Does this answer your question? [GCC does not complain about re-definition of external variable?](https://stackoverflow.com/questions/47288731/gcc-does-not-complain-about-re-definition-of-external-variable) – Mikel Rychliski Mar 30 '20 at 13:52
  • @NathanOliver: The question does not assert the OP expects this thing that works in C to work in C++. The question asks why there is a difference. That is a perfectly reasonable question to ask. – Eric Postpischil Mar 30 '20 at 14:00
  • @PaulOgilvie: Re “Your int a should also in C give multiple definitions”: There is no such requirement in the C standard. The effective multiple definitions violate the “shall” in C 2018 6.9 5, and the consequence of this violation is that the C standard **does not impose any requirements on the behavior**. In particular, it does not impose a requirement that there be a diagnostic message. The lack of requirements means implementations are free to choose what behavior they want. Unix tools have generally chosen to allow regular definitions to rule over tentative definitions. – Eric Postpischil Mar 30 '20 at 14:04
  • So the object of file1 will have a definition for `a` and its initializer and file2's object will have a reference to some `a`. Upon linking, file1's definition and initializer of `a` is placed in the .data segment of initialized variables and file2's object will reference this `a`. Should only file2's object be linked, then an `a` in the .bss of zero-initialized variables is created to which `a` will refer. This places more intelligence onto the linker (but is nice). – Paul Ogilvie Mar 30 '20 at 14:04
  • Being explicit about this is the whole reason for the `extern` keyword in C. It is required for variables and optional for function declarations. – Zan Lynx Mar 31 '20 at 15:30

1 Answer


I'll address the C end of the question, since I'm more familiar with that language and you seem to already be pretty clear on why the C++ side works as it does. Someone else is welcome to add a detailed C++ answer.

As you noted, in your first example, C treats the line int a; as a tentative definition (see 6.9.2 in N2176). The later int a = 3; is a declaration with an initializer, so it is an external definition. As such, the earlier tentative definition int a; is treated as merely a declaration. So, retroactively, you have first declared a variable at file scope and later defined it (with an initializer). No problem.

In your second example, file2 also has a tentative definition of a. There is no external definition in this translation unit, so

the behavior is exactly as if the translation unit contains a file scope declaration of that identifier, with the composite type as of the end of the translation unit, with an initializer equal to 0. [6.9.2 (1)]

That is, it is as if you had written int a = 0; in file2. Now you have two external definitions of a in your program, one in file1 and another in file2. This violates 6.9 (5):

If an identifier declared with external linkage is used in an expression (other than as part of the operand of a sizeof or _Alignof operator whose result is an integer constant), somewhere in the entire program there shall be exactly one external definition for the identifier; otherwise, there shall be no more than one.

So under the C standard, the behavior of your program is undefined and the compiler is free to do as it likes. (But note that no diagnostic is required.) With your particular implementation, instead of summoning nasal demons, what your compiler chooses to do is what you described: use the common feature of your object file format, and have the linker merge the definitions into one. Although not required by the standard, this behavior is traditional at least on Unix, and is mentioned by the standard as a "common extension" (no pun intended) in J.5.11.

This feature is quite convenient, in my opinion, but since it's only possible if your object file format supports it, we couldn't really expect the C standard authors to mandate it.

clang doesn't document this behavior very clearly, as far as I can see, but gcc, which has the same behavior, describes it under the -fcommon option. On either compiler, you can disable it with -fno-common, and then your program should fail to link with a multiple definition error.

Nate Eldredge