I came across the following construct in a C project that I am porting to C++:

enum TestEnum 
{
    A=303,
    B=808
} _TestEnum;

int foo()
{
  _TestEnum = B;
}

When compiling with GCC and taking a look at the generated code I get:

nils@doofnase ~ $ gcc -std=c90 -O2 -c ./test.c -o test.o
nils@doofnase ~ $ size test.o
   text    data     bss     dec     hex filename
     59       0       0      59      3b test.o

So zero bytes are used in the data and BSS segments.

On the other hand if I compile in C++ I get:

nils@doofnase ~ $ g++ -std=c++11 -O2 -c ./test.c -o test.o
nils@doofnase ~ $ size test.o
   text    data     bss     dec     hex filename
     59       0       4      63      3f test.o

I see four bytes of storage allocated in BSS, as I would expect.

Also, in the C project the enum definition is actually located in a header file which gets included in multiple .c files. The project compiles and links just fine. When compiled and linked as C++, the linker complains that _TestEnum is defined in multiple objects (rightly so!).

What is going on here? Am I looking at some archaic C language special case?

Edit: For completeness' sake, this is the GCC version:

nils@doofnase ~ $ gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609
Nils Pipenbrinck
  • With `-O2` I would hope that the compiler inlines the code of `foo()` entirely both in C and in C++ ;) Also you should get a warning in both C and C++ that `foo()` has return type `int` and does not return a value. Maybe the code snippet is too small to get an idea what should happen... given that it might well depend on the rest of the code. – BitTickler Jan 06 '18 at 10:21
  • Yea please fix the code, and add `-Wall` – Antti Haapala -- Слава Україні Jan 06 '18 at 12:00
  • Note that identifiers matching the regex `^_[A-Z]` like your `_TestEnum` are reserved in both C and C++, and introducing them leads to undefined behavior (see [this post](https://stackoverflow.com/a/228797/673852) for details). – Ruslan Jan 06 '18 at 13:52

1 Answer

By default, GCC versions before 10 enable a C extension (even when the -pedantic flag is in effect) that allows multiple external definitions of an object across translation units. GCC 10 and later default to -fno-common, which turns this extension off.

Referring to C11 (N1570) J.5.11 Multiple external definitions (informative section):

There may be more than one external definition for the identifier of an object, with or without the explicit use of the keyword extern; if the definitions disagree, or more than one is initialized, the behavior is undefined (6.9.2).

Note that an application that relies on this behavior is not strictly conforming ISO C. More specifically, C11 6.9/p5 External definitions states (emphasis mine):

An external definition is an external declaration that is also a definition of a function (other than an inline definition) or an object. If an identifier declared with external linkage is used in an expression (other than as part of the operand of a sizeof or _Alignof operator whose result is an integer constant), somewhere in the entire program there shall be exactly one external definition for the identifier; otherwise, there shall be no more than one.

Technically, a violation of that rule invokes undefined behavior, which means that an implementation may or may not issue a diagnostic message.
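For comparison, here is a minimal sketch of a layout that satisfies the "exactly one external definition" rule quoted above and therefore also links when built as C++. The file names in the comments and the identifiers g_test_enum and select_b are hypothetical, chosen only for illustration (the original _TestEnum is a reserved identifier, so it is not reused here). The three parts are shown in one block but are meant to live in separate files:

```c
/* test_enum.h (hypothetical header): declarations only, safe to include
   from any number of .c files */
enum TestEnum
{
    A = 303,
    B = 808
};

/* 'extern' without an initializer declares the object but does not define it */
extern enum TestEnum g_test_enum;

/* test_enum.c (hypothetical): the single external definition in the program */
enum TestEnum g_test_enum = A;

/* any other .c file that includes test_enum.h can use the object */
void select_b(void)
{
    g_test_enum = B;
}
```

With this split, each translation unit that includes the header only sees a declaration, so the result no longer depends on common symbols.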

You can verify that this extension is in effect with the nm command:

nm test.o 
0000000000000000 T foo
0000000000000004 C _TestEnum

According to man nm:

"C" The symbol is common. Common symbols are uninitialized data. When linking, multiple common symbols may appear with the same name. If the symbol is defined anywhere, the common symbols are treated as undefined references.

To disable this extension, use the -fno-common flag. From the GCC documentation:

Unix C compilers have traditionally allocated storage for uninitialized global variables in a common block. This allows the linker to resolve all tentative definitions of the same variable in different compilation units to the same object, or to a non-tentative definition. This is the behavior specified by -fcommon, and is the default for GCC on most targets.
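The "tentative definitions" mentioned in that passage are a C-only concept (C11 6.9.2). As a small sketch, assuming a single translation unit and the hypothetical names counter and read_counter, the following is valid C but is rejected by a C++ compiler as a redefinition:

```c
/* Each of the next two lines is a tentative definition in C:
   file scope, no initializer, no storage-class specifier. */
int counter;
int counter;

/* A definition with an initializer; the tentative definitions above
   refer to this same object. */
int counter = 42;

int read_counter(void)
{
    return counter;
}
```

Within one translation unit this is standard C. Merging tentative definitions across several object files is the part that relies on -fcommon.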

Grzegorz Szpetkowski
  • That's it! Thanks a lot. – Nils Pipenbrinck Jan 06 '18 at 10:26
  • I take issue with saying "this behavior is not allowed", because an implementation is clearly allowed to support multiple external definitions. The way you word it sounds like GCC is in violation of the standard. Otherwise, fine answer. – trent Jan 06 '18 at 15:51
  • @trentcl: Good point. I changed the wording to "not strictly conformant", since it formally invokes UB. – Grzegorz Szpetkowski Jan 06 '18 at 15:58
  • This behavior is strictly conformant. Undefined behavior means, among other things, that an implementation does not have to issue a diagnostic, and that whatever an implementation does is in full compliance with the standard. That "whatever" can range from nasal demons to giving the UB a defined behavior. – David Hammen Jan 06 '18 at 16:11
  • @DavidHammen: I agree. I think the point is that an application that relies on UB is not strictly conformant, not the implementation. In that case, I am relying on C11 4/p5 Conformance. – Grzegorz Szpetkowski Jan 06 '18 at 16:26