8

It seems that an uninitialized global variable is treated as a weak symbol in GCC. What is the reason for this?

ergosys
Thomson
  • 3
    How do you come to that conclusion, what did you try, what are the facts? – Jens Gustedt Sep 11 '10 at 17:39
  • 1
    C or C++? There's a big difference. – CB Bailey Sep 11 '10 at 18:09
  • 5
    @AndreyT: In C, `int n;` at file scope is a tentative definition and it's a common extension (appendix J), which gcc follows, to allow multiple definitions both in a single translation unit (as in standard C) and between translation units so long as the definitions agree and at most one definition is not tentative. In C++ `int n;` is just a definition, there's no tentative concept, and it's not common to allow and merge multiple definitions in a program. The versions of gcc that I've used haven't allowed this in C++ mode, hence my questioning whether the C++ tag was really appropriate. – CB Bailey Sep 11 '10 at 23:29
  • @AndreyT: I've just read your answer. Clearly you know all this already. Why bother to ask me and elicit the long explanation? – CB Bailey Sep 11 '10 at 23:33
  • @Charles Bailey: Well, to me the concept of tentative definition appears to have only a local (restricted to one translation unit) effect and so it doesn't look like something relevant in this case. So, when it comes to inter-translation-unit issues (which this question seems to be about), I don't see any notable differences between C and C++ (aside from the "common extension" remark). The reason I asked is because I thought that maybe I'm missing something else. – AnT stands with Russia Sep 12 '10 at 00:07

4 Answers

33

gcc, in C mode:

Uninitialised globals which are not declared extern are treated as "common" symbols, not weak symbols.

Common symbols are merged at link time so that they all refer to the same storage; if more than one object attempts to initialise such a symbol, you will get a link-time error. (If they aren't explicitly initialised anywhere, they will be placed in the BSS, i.e. initialised to 0.)

gcc, in C++ mode:

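Not the same: gcc does not use common symbols in C++ mode. "Uninitialised" globals which are not declared extern are ordinary definitions, implicitly initialised to a default value (0 for simple types, or via the default constructor), so a second definition of the same name in another file is a link-time error.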


In either case, a weak symbol allows an initialised symbol to be overridden by a non-weak initialised symbol of the same name at link time.


To illustrate (concentrating on the C case here), I'll use 4 variants of a main program, which are all the same except for the way that global is declared:

  1. main_init.c:

    #include <stdio.h>
    
    int global = 999;
    
    int main(void) { printf("%d\n", global); return 0; }
    
  2. main_uninit.c, which omits the initialisation:

    #include <stdio.h>
    
    int global;
    
    int main(void) { printf("%d\n", global); return 0; }
    
  3. main_uninit_extern.c, which adds the extern keyword:

    #include <stdio.h>
    
    extern int global;
    
    int main(void) { printf("%d\n", global); return 0; }
    
  4. main_init_weak.c, which initialises global and declares it to be a weak symbol:

    #include <stdio.h>
    
    int global __attribute__((weak)) = 999;
    
    int main(void) { printf("%d\n", global); return 0; }
    

and another_def.c which initialises the same global:

int global = 1234;

Using main_uninit.c on its own gives 0:

$ gcc -o test main_uninit.c && ./test
0

but when another_def.c is included as well, global is explicitly initialised and we get the expected result:

$ gcc -o test main_uninit.c another_def.c && ./test
1234

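(Note that this case fails instead if you're using C++. It also fails with GCC 10 and later unless you pass -fcommon, because -fno-common became the default in that release.)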
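
Whether `int global;` becomes a common symbol can also be chosen per variable rather than per compilation. The following is only a sketch, assuming GCC's `common` and `nocommon` variable attributes; the variable and function names are hypothetical and not part of the files above:

```c
/* Hypothetical names, for illustration only. */

/* Request a common symbol explicitly, whatever -fcommon / -fno-common says.
   An uninitialised definition of the same name in another file would be
   merged with this one at link time. */
int merged_counter __attribute__((common));

/* Request an ordinary zero-initialised definition instead. A second
   definition of this name in another file should fail to link. */
int private_counter __attribute__((nocommon));

/* Both objects start out as 0 if nothing else initialises them. */
int sum_counters(void)
{
    return merged_counter + private_counter;
}
```

Running `nm` on the resulting object file should show the first variable with type `C` (common) and the second with type `B` (BSS), which is a quick way to check which treatment a given global received.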

If we try with both main_init.c and another_def.c instead, we have 2 initialisations of global, which won't work:

$ gcc -o test main_init.c another_def.c && ./test
/tmp/cc5DQeaz.o:(.data+0x0): multiple definition of `global'
/tmp/ccgyz6rL.o:(.data+0x0): first defined here
collect2: ld returned 1 exit status

main_uninit_extern.c on its own won't work at all - the extern keyword causes the symbol to be an ordinary external reference rather than a common symbol, so the linker complains:

$ gcc -o test main_uninit_extern.c && ./test
/tmp/ccqdYUIr.o: In function `main':
main_uninit_extern.c:(.text+0x12): undefined reference to `global'
collect2: ld returned 1 exit status

It works fine once the initialisation from another_def.c is included:

$ gcc -o test main_uninit_extern.c another_def.c && ./test
1234

Using main_init_weak.c on its own gives the value we initialised the weak symbol to (999), as there is nothing to override it:

$ gcc -o test main_init_weak.c && ./test
999

But pulling in the other definition from another_def.c does work in this case, because the strong definition there overrides the weak definition in main_init_weak.c:

$ gcc -o test main_init_weak.c another_def.c && ./test
1234
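
Weak functions follow the same rule as the weak variable in main_init_weak.c. A minimal sketch, assuming a hypothetical function `get_setting` (neither it nor `read_setting` appears in the files above):

```c
/* default_impl.c: hypothetical file and function names, for illustration. */

/* Weak definition: used only if no strong (normal) definition of
   get_setting is present in the link. */
__attribute__((weak)) int get_setting(void)
{
    return 999;
}

/* Calls through the symbol, so the definition chosen at link time is the
   one that runs. */
int read_setting(void)
{
    return get_setting();
}
```

Linked on its own, `read_setting()` returns 999. Adding another file that contains a normal `int get_setting(void)` definition should make the linker resolve the call to that one and ignore the weak definition.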
Sekomer
Matthew Slattery
  • Great explanation with examples. Could you also include how weak interacts with functions? – Tomek Sep 11 '10 at 19:34
  • @Maciej - good point, forgot this was tagged with both when writing my answer originally! No, it's not the same for C++, so I've updated my answer accordingly. – Matthew Slattery Sep 11 '10 at 20:02
  • @Tomek - weak functions are similar: if the linker sees both a weak definition of a function and a strong (normal) one, it will resolve any undefined references to the strong one, and ignore the weak one. If it only sees a weak definition, it will use that as if it were a normal one. – Matthew Slattery Sep 11 '10 at 20:11
  • 5
    This explanation can be taken as the description of the specific compiler behavior. It is **incorrect** as the description of C language behavior. C language *prohibits* multiple definitions of entities with external linkage (just like C++ does). The behavior observed is nothing more than a compiler extension. – AnT stands with Russia Sep 11 '10 at 20:23
  • 1
    Additionally, it should be noted that in C, just like in C++, uninitialized global variables are implicitly initialized with 0. For some reason, the wording in the answer above seems to present it as a difference between C and C++. – AnT stands with Russia Sep 11 '10 at 20:27
  • @AndreyT right, but the question is specifically asking about gcc and weak symbols, it's a linker/compiler/implementation question, not a language question and this is an appropriate answer imo. – Logan Capaldo Sep 11 '10 at 20:53
  • @Logan Capaldo: I agree, the answer is appropriate. Maybe I just misunderstood the "In C" and "In C++" captions. Were they meant to refer to "GCC in C mode" and "GCC in C++ mode"? – AnT stands with Russia Sep 11 '10 at 20:58
  • @AndreyT: yes (as the question is very specifically asking about GCC). I guess it read better before I edited earlier to note the C vs. C++ difference. I've edited again to clarify that I'm talking about GCC. (Regarding your other point: I did also originally say that uninitialised common symbols end up in the BSS. I've moved that to the top as well.) – Matthew Slattery Sep 11 '10 at 21:30
10

The question is based on an incorrect premise. Uninitialized global variables are not weak symbols.

Apparently the question is referring to the ability to define the same uninitialized object with external linkage in multiple translation units. Formally, this is not allowed: it is an error in both C and C++. However, at least in C, it is recognized by the C99 standard as a "common extension" of the language, implemented in many real-life compilers:

J.5 Common extensions

J.5.11 Multiple external definitions

1 There may be more than one external definition for the identifier of an object, with or without the explicit use of the keyword extern; if the definitions disagree, or more than one is initialized, the behavior is undefined (6.9.2).

Note that, contrary to popular belief, the C language explicitly prohibits introducing multiple definitions of entities with external linkage in the program, just like C++ does.

6.9 External definitions

5 An external definition is an external declaration that is also a definition of a function (other than an inline definition) or an object. If an identifier declared with external linkage is used in an expression (other than as part of the operand of a sizeof operator whose result is an integer constant), somewhere in the entire program there shall be exactly one external definition for the identifier; otherwise, there shall be no more than one.

However, the extension allowing this has been pretty popular with many C compilers, of which GCC just happens to be one.
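
To make the terminology concrete, here is a small sketch of the three kinds of file-scope declaration within a single translation unit, which standard C does allow. The names `shared_count` and `get_shared_count` are hypothetical:

```c
/* Hypothetical names, for illustration only. */

extern int shared_count;   /* declaration only: not a definition */

int shared_count;          /* tentative definition: no initialiser, no extern */
int shared_count;          /* repeating it within one file is permitted */

int shared_count = 5;      /* external definition: the tentative definitions
                              above refer to this same object */

extern int shared_count;   /* further plain declarations are always fine */

int get_shared_count(void)
{
    return shared_count;
}
```

All of the above refers to one object with the value 5. The portable way to share it across files is to keep exactly one such definition in one .c file and put only the `extern` declaration in a header; spreading `int shared_count;` across several files is what relies on the J.5.11 extension.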

Community
AnT stands with Russia
  • 1
    The supporting linker feature is required in order to implement FORTRAN common blocks. You will notice that GCC does include a FORTRAN compiler in the collection. Since the supporting libraries are implemented in C, it is no surprise that there is a small amount of harmless leakage back from the linker. – RBerteig Sep 12 '10 at 00:24
  • Can you give an example of external definition and external declaration? – Thomson Sep 12 '10 at 02:39
  • It's not just FORTRAN, C++ inline functions and implicitly instantiated templates are also weak. – MSalters Jun 18 '13 at 12:55
3

Is this what you meant?

weak.c

#include <stdio.h>

int weak; /* global, weak, zero */

int main(void) {
  printf("weak value is %d.\n", weak);
  return 0;
}

strong.c

int weak = 42; /* global, strong, 42 */

Sample run

$ gcc weak.c
$ ./a.out
weak value is 0.
$ gcc weak.c strong.c
$ ./a.out
weak value is 42.

The int weak; in weak.c is a declaration, not a definition. Or you may say it's a tentative definition. The real definition is in strong.c when that object file is linked in the final program or in weak.c otherwise. This is a common extension, one that gcc uses (thanks Andrey).

pmg
  • 2
    Not true. `int weak` in `weak.c` is a definition. A tentative one, but definition nevertheless. The program above *does* violate the rules of the language, yet is supported as a non-standard extension. – AnT stands with Russia Sep 11 '10 at 20:15
3

Any multiple definition of a global symbol is undefined behavior, so gcc (or rather the GNU binutils linker) is free to do whatever it wants. In practice, it follows the traditional behavior to avoid breaking code that relies on this behavior.

R.. GitHub STOP HELPING ICE