
Consider the C program composed of two files,

f1.c:

int x;

f2.c:

int x=2;

My reading of section 6.9.2 of the C99 standard is that this program should be rejected. As I interpret 6.9.2, variable x is tentatively defined in f1.c, but this tentative definition becomes an actual definition at the end of the translation unit and should therefore (in my opinion) behave as if f1.c contained the definition int x=0;.
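
For comparison, here is a minimal sketch of how tentative definitions behave inside a single translation unit, which is the case 6.9.2 describes directly. The names y, get_x and get_y are hypothetical and only there for illustration.

```c
/* Single translation unit: how 6.9.2 treats tentative definitions. */

int x;      /* tentative definition of x */
int x;      /* a second tentative definition: still legal in the same file */
int x = 2;  /* actual definition; the tentative ones now just declare x */

int y;      /* only tentative definitions of y appear in this file, so at
               the end of the translation unit it behaves like int y = 0; */

/* Hypothetical accessors, used only to observe the values. */
int get_x(void) { return x; }
int get_y(void) { return y; }
```

Within one file, get_x() yields 2 and get_y() yields 0. The question is about what should happen when the `int x;` and `int x = 2;` lines live in different files instead.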

With all compilers (and, importantly, linkers) I was able to try, this is not what happens. All compilation platforms I tried do link the above two files, and the value of x is 2 in both files.

I doubt this happens by accident, or just as an "easy" feature to provide in addition to what the standard requires. If you think about it, it means there is special support in the linker for those global variables that do not have an initializer, as opposed to those explicitly initialized to zero. Someone told me that the linker feature may be necessary to compile Fortran anyway. That would be a reasonable explanation.

Any thoughts about this? Other interpretations of the standard? Names of platforms on which files f1.c and f2.c refuse to be linked together?

Note: this is important because the question occurs in the context of static analysis. If the two files may refuse to be linked on some platform, the analyzer should complain, but if every compilation platform accepts it then there is no reason to warn about it.

Pascal Cuoq
  • Thanks for sharing. never too old to learn – Adriaan Sep 29 '09 at 10:17
  • The compiler needs to reject (i.e. warn or error on) things only when you violate something in a constraint paragraph. The requirement that you may not have two external definitions for the same thing is a "shall" *outside* a constraint paragraph. Violating any *shall* outside a constraint automatically results in undefined behavior in C - that's what allows the compiler to treat it however it wants. – Johannes Schaub - litb Sep 29 '09 at 11:35
  • @litb That's an interesting point. The static analyzer I mentioned tries, when possible, not to flag /established/ programming practices even when they are not defined by the standard. Here, I think we will decide not to warn, since on a platform on which these multiple definitions are not supported, *probably* they would result in a failure at link-time, not run-time. PS: I know what "undefined" means, but each additional analysis option makes the analyzer a little less usable, and that must be weighed against the gains. Hence the "Names of platforms on which..." part of the question – Pascal Cuoq Sep 29 '09 at 11:58
  • Recent gcc versions use `-fno-common` by default. Then you will get a linker error even if you just have `int x;` without initialization in `f2.c`. Merging tentative definitions across compilation units is bad, IMHO. It will lead to bugs. The extern keyword exists now to do things properly. – Sven Jan 23 '21 at 03:30

3 Answers


See also "What are extern variables in C". This is mentioned in the C standard in informative Annex J as a common extension:

J.5.11 Multiple external definitions

There may be more than one external definition for the identifier of an object, with or without the explicit use of the keyword extern; if the definitions disagree, or more than one is initialized, the behavior is undefined (6.9.2).

Warning

As @litb points out here, and as stated in my answer to the cross-referenced question, using multiple definitions for a global variable leads to undefined behaviour, which is the standard's way of saying "anything could happen". One of the things that can happen is that the program behaves as you expect; and J.5.11 says, approximately, "you might be lucky more often than you deserve". But a program that relies on multiple definitions of an extern variable - with or without the explicit extern keyword - is not a strictly conforming program and not guaranteed to work everywhere. Equivalently: it contains a bug which may or may not show itself.

See also How do I use extern to share variables between source files?

As noted by Sven in a comment, and in my answer to "How do I use extern…", GCC changed its default rules relatively recently. In GCC 10.x (from May 2020) and later versions, the default compilation mode uses -fno-common whereas in prior versions the default mode used -fcommon. With the new behaviour, you no longer get away with multiple tentative definitions of the same variable in different translation units; you get a link error instead. That matches what the C standard requires for strict conformance.

If you use GCC and have code that (ab)uses multiple tentative definitions, you can add -fcommon to the compilation process and it will work as before. However, your code is not maximally portable, and in the long term it would be better to revise it so that each variable is properly defined in exactly one source file (linked into every program that uses the variable) and properly declared in one header. Every source file that uses the variable should include that header, and so should the source file that defines it, to ensure consistency.
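
A minimal sketch of that layout, shown as one listing with comments marking where each file would begin. The file names x.h, x.c and user.c and the function get_x are assumptions for illustration, not taken from the question.

```c
/* ---- x.h (hypothetical header): declaration only, no storage ---- */
#ifndef X_H
#define X_H
extern int x;
#endif

/* ---- x.c: the one and only definition; it would #include "x.h" ---- */
int x = 2;

/* ---- user.c: would #include "x.h" and simply use the variable ---- */
int get_x(void)
{
    return x;
}
```

Because the defining file also sees the `extern int x;` declaration, the compiler can diagnose a mismatch between the declared and defined types, and no translation unit other than x.c ever defines x.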

Jonathan Leffler
  • While I was specifically asking about non-extern variables, there is indeed an interesting tidbit of clarification placed in this paragraph. Thanks for the reference. This is what I love about standards... – Pascal Cuoq Sep 29 '09 at 06:02
  • Since the two variables are both at file scope and are not static (they have to be like that for there to be any issue at all), they are both 'extern' -- with or without the explicit use of the keyword extern. – Jonathan Leffler Sep 29 '09 at 06:10
  • Does every compilation of a translation unit with a tentative definition create a memory location for it at first whereas the linker just keeps one, or how would that work out? Using the extern keyword on the other hand should not create any memory location, should it? – olovb Sep 29 '09 at 06:51
  • @Jonathan I meant "While I was specifically asking about variables without the 'extern' keyword". @olovb That's the way I see it. The Unix command 'nm' (applied to object files) gives this impression, too. – Pascal Cuoq Sep 29 '09 at 07:10
  • so this underlines the importance of naming conventions and/or name spaces. A global variable 'x' is great for educational purposes, but hardly unique in real life.... – Adriaan Sep 29 '09 at 10:16
  • To be really clear whether it's allowed or not: No, it's undefined behavior in C. It's like doing `a[10] = 0;` even if `a` is an `int a[1];`, which was and is allowed as a common extension too (before we had flexible array members). I think it should clearly be noted that doing it is undefined behavior formally, in addition to having defined behavior on some platforms. – Johannes Schaub - litb Sep 29 '09 at 11:31
  • @Jonathan, I'm sorry if I'm a bit annoying with my UB comments :) I just thought the questioner may think by hearing "common extension" that the C Standard somehow allows programs to do this and remain strictly conforming :) +1'ed you of course – Johannes Schaub - litb Sep 29 '09 at 11:38
  • @litb: NP - it is, as you say, not formally allowed by the standard, and programs that define the same global variable name multiple times are not strictly conforming or maximally portable. And the cross-referenced question/answer also goes through that. I'll point it out in this answer explicitly too; it is an important point. – Jonathan Leffler Sep 29 '09 at 13:41
  • Note that an `extern` keyword makes it a declaration and not a definition at all, so there's no way to have "multiple definitions with an explicit `extern` keyword" – Chris Dodd Jun 01 '12 at 00:28
  • great answer Mr @JonathanLeffler – Santhosh Pai Dec 20 '13 at 11:20
  • @JohannesSchaub-litb: +1 _I thought the questioner may think by hearing "common extension" that the C standard somehow allows programs to do this and remain strictly conforming_ - [This](http://c-faq.com/decl/common.html) should disambiguate what _common extension_ means and alleviate your concern. – legends2k Mar 03 '14 at 12:20
  • @legends2k: In many ways, 'common extension' in this context is a double entendre. Annex J.5 of the 2011 standard is titled 'Common Extensions'. Within the list of common extensions, section J.5.11 'Multiple external definitions' evokes the Fortran 'common' mechanism. So it could be regarded as the 'COMMON common extension'. – Jonathan Leffler Mar 03 '14 at 14:00
  • @JonathanLeffler: Got it about the double entendre. I didn't know that the standard actually lists extensions commonly supported by the compilers of the language; am I right in taking it that a program using one such extension is still not strict C11? – legends2k Mar 04 '14 at 00:55
  • @legends2k: yes, using one of the common extensions makes a program not strictly conforming to C11, but means (roughly) you might get away with it more often than reading the rest of the standard would lead you to expect. – Jonathan Leffler Mar 04 '14 at 03:06

There is something called a "common extension" to the standard, where defining variables multiple times is allowed as long as the variable is initialized only once. See http://c-faq.com/decl/decldef.html

The linked page says this is pertinent to Unix platforms. I guess it's the same for C99 as for C89, though maybe it has been adopted by more compilers to form some sort of de facto standard. Interesting.

olovb

This is to clarify my answer to a comment by olovb:

output of nm for an object file compiled from "int x;". On this platform, symbols are prepended with a '_', that is, the variable x appears as _x.

00000000 T _main
         U _unknown
00000004 C _x
         U dyld_stub_binding_helper

output of nm for an object file compiled from "int x=1;"

00000000 T _main
         U _unknown
000000a0 D _x
         U dyld_stub_binding_helper

output of nm for an object file compiled from "int x=0;"

00000000 T _main
         U _unknown
000000a0 D _x
         U dyld_stub_binding_helper

output of nm for an object file compiled from "extern int x;"

00000000 T _main
         U _unknown
         U dyld_stub_binding_helper

EDIT: output of nm for an object file compiled from "extern int x;" where x is actually used in one of the functions

00000000 T _main
         U _unknown
         U _x
         U dyld_stub_binding_helper
Pascal Cuoq
  • In case someone is unfamiliar with the output of nm: D is defined and U is undefined. And from man nm: "C" The symbol is common. Common symbols are uninitialized data. When linking, multiple common symbols may appear with the same name. If the symbol is defined anywhere, the common symbols are treated as undefined references. – Falaina Sep 29 '09 at 07:43