3

Is it possible to see what goes on behind the gcc and g++ compilation process?
I have the following program:

#include <stdio.h>
#include <unistd.h>

size_t sym1 = 100;
size_t *addr = &sym1;

size_t *arr = (size_t*)((size_t)&arr + (size_t)&addr);

int main (int argc, char **argv)
{
    (void) argc;
    (void) argv;

    printf("libtest: addr of main(): %p\n", &main);
    printf("libtest: addr of arr: %p\n", &arr);

    while(1);
    return 0;
}

Why does g++ produce the binary without error, while gcc reports an error?
I'm looking for a way to trace what makes them behave differently.

# gcc test.c -o test_app
test.c:7:1: error: initializer element is not constant
# g++ test.c -o test_app

I think the reason may be that gcc uses cc1 as its compiler and g++ uses cc1plus.
Is there a way to get more precise output of what has actually been done?
I've tried the -v flag, but the output is quite similar. Are different flags passed to the linker?
What is the easiest way to compare the two compilation procedures and find the differences between them?

Mikhail Kalashnikov

2 Answers

10

In this case, gcc produces nothing because your program is not valid C. As the compiler explains, the initializer element (expression used to initialize the global variable arr) is not constant.

C requires the initializers of objects with static storage duration to be constant expressions, so that the contents of global variables can be placed in the data segment of the executable. This cannot be done for arr because the addresses of the variables involved are not known until link time, and their sum cannot be trivially filled in by the dynamic linker, as a single address can be for addr. C++ allows this, so g++ generates initialization code that evaluates the non-constant expressions and stores the results in the global variables. This code is executed before main() is invoked.
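As a rough illustration of what that generated initialization code amounts to, here is a minimal C sketch that moves the non-constant expression into a function run before main(). It assumes GCC or Clang, since `__attribute__((constructor))` is a compiler extension and not ISO C, and the function name `init_arr` is made up for this example. It is not the code g++ actually emits, only an approximation of the same idea.

```c
#include <stddef.h>

size_t sym1 = 100;

/* A single address is an address constant: the compiler emits a
   relocation and the linker fills in the value. Valid C. */
size_t *addr = &sym1;

/* No constant initializer exists for the sum of two addresses, so the
   variable starts out zero-initialized and is assigned at run time. */
size_t *arr;

/* GCC/Clang extension (assumption: not available on every compiler):
   this function runs before main(), much like the hidden initialization
   code that g++ arranges for a non-constant global initializer. */
__attribute__((constructor))
static void init_arr(void)
{
    arr = (size_t *)((size_t)&arr + (size_t)&addr);
}
```

With this change, `gcc test.c` should accept the file, because the expression that C rejects as an initializer is now an ordinary run-time assignment. The resulting pointer value is as meaningless as in the original program and should not be dereferenced.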

Executables cc1 and cc1plus are internal details of the implementation of the compiler, and as such irrelevant to the observed behavior. The relevant fact is that gcc expects valid C code as its input, and g++ expects valid C++ code. The code you provided is valid C++, but not valid C, which is why g++ compiles it and gcc doesn't.

user4815162342
  • Since 2011 it's not valid `C++` either, because of that `while(1);`. – BoBTFish Nov 27 '13 at 14:49
  • Ah, but it would be good to say _why_ the initializer expression is not a constant expression, and that is because the addresses involved are not known until link time. In C++, I believe the compiler outputs what amounts to a mini-constructor for this case. (Not exactly a constructor, but commonly uses the constructor mechanism to run.) – Joe Z Nov 27 '13 at 14:50
  • But why is there no error about the "addr" variable? The address of "sym1" is not known until link time either, is it? Why can gcc compile "int addr = &sym1" but not "int addr = &sym1 + &sym2"? (And why can g++ compile both?) – Mikhail Kalashnikov Nov 27 '13 at 14:56
  • @JoeZ Good point, I've now amended the answer to include this information. – user4815162342 Nov 27 '13 at 14:59
  • @MikhailKalashnikov The C++ compiler automatically generates hidden code that initializes the global variables that is executed **at run-time** and as such has access to everything. See the edited answer. – user4815162342 Nov 27 '13 at 15:00
  • One last question: why doesn't gcc complain about the "addr" variable? Isn't the address of "sym1" also unknown until link time? `size_t *addr = &sym1;` Yet it starts complaining as soon as more than one variable is used in the expression? – Mikhail Kalashnikov Nov 27 '13 at 15:04
  • @MikhailKalashnikov The answer to the last question is that the compiler has to put instructions in the object file for the linker to follow, so that the right address ends up in the right place at link time. These instructions (called *relocations*) have a very simple format and probably can't add two addresses. (By the way, I used "instructions" in its generic sense, not in the sense of "instructions of the CPU architecture".) –  Nov 27 '13 at 15:14
  • The idea was to get rid of these "dynamic relocations", by simply using an offset from some variable. But unfortunately, it didn't work and I ended up here... – Mikhail Kalashnikov Nov 27 '13 at 15:27
  • @MikhailKalashnikov : The C compiler is required to support simple relocations, such as the address of a variable, or the address of a variable plus a simple offset. Computing the offset between two objects, on the other hand, isn't conforming C (or C++, as I recall), even if you hide it behind casts. G++ lets you do it (because it can issue the extra code required and has the constructor hooks to call it), but it's effectively a language extension. – Joe Z Nov 27 '13 at 15:38
  • @JoeZ Merely computing a byte offset between two object addresses using proper casts, such as `((uintptr_t) &foo - (uintptr_t) &bar)`, is conforming C and C++, because both the cast and the subtraction are perfectly defined. However, *doing* something with the computed offset, such as adding it to an unrelated pointer and dereferencing that, is certainly undefined. – user4815162342 Nov 27 '13 at 15:52
  • @BoBTFish [Wait, what?](http://stackoverflow.com/questions/3592557/optimizing-away-a-while1-in-c0x) – user4815162342 Nov 27 '13 at 20:07
  • @user4815162342 : That's not how I read this statement in the C99 standard: 6.5.6, paragraph 9, first sentence: "When two pointers are subtracted, _both shall point to elements of the same array object, or one past the last element of the array object_ [...]" That tells me they can't be pointers to two arbitrary, distinct objects. – Joe Z Nov 27 '13 at 21:51
  • @JoeZ Sure, if you subtract *pointers*. If you first cast them to `uintptr_t`, you can subtract the resulting integers as much as you like without any UB. – user4815162342 Nov 27 '13 at 21:57
  • @user4815162342 : I suppose you are correct. My brain sees `ptr` in the name `uintptr_t` and thinks "pointer arithmetic", when really it is not. (I think it's the assembly programmer in me...) Doing something with the value isn't strictly undefined either (ie. you can print it out all day long, etc.) because unsigned integer subtraction is always safe, but the value isn't guaranteed to be meaningful for any other purpose, which is what I gather you meant. – Joe Z Nov 27 '13 at 22:02
  • @user4815162342 : BTW, I found the other relevant bit in C99 which makes the expression from the OP illegal in 6.6, paragraph 6: "An integer constant expression shall have integer type and shall only have operands that are integer constants, enumeration constants, character constants, `sizeof` expressions whose results are integer constants, and floating constants that are the immediate operands of casts. _Cast operators in an integer constant expression shall only convert arithmetic types to integer types,_ except as part of an operand to the `sizeof` operator." – Joe Z Nov 27 '13 at 22:08
  • @JoeZ Yup. BTW assembly programmers I know typically wouldn't care about UB if they saw its fin protruding from the water while swimming... – user4815162342 Nov 27 '13 at 22:08
  • @JoeZ That the OP's initialization is illegal C was never disputed (by me). I was responding to [your claim](http://stackoverflow.com/questions/20245454/how-to-trace-out-why-gcc-and-g-produces-different-code/20245510?noredirect=1#comment30197148_20245510) that **g++** was implementing a language extension. – user4815162342 Nov 27 '13 at 22:10
  • @user4815162342 : Understood. :-) – Joe Z Nov 27 '13 at 22:10
2

There is a slightly more interesting question lurking here. Consider the following test cases:

#include <stdint.h>

#if TEST==1
void *p=(void *)(unsigned short)&p;
#elif TEST==2
void *p=(void *)(uintptr_t)&p;
#elif TEST==3
void *p=(void *)(1*(uintptr_t)&p);
#elif TEST==4
void *p=(void *)(2*(uintptr_t)&p);
#endif

gcc (even with the very conservative flags -ansi -pedantic-errors) rejects test 1 but accepts test 2, and accepts test 3 but rejects test 4.

From this I conclude that some operations that are easily optimized away (like casting to an integer type of the same size as a pointer, or multiplying by 1) get eliminated before gcc checks whether the initializer is a constant expression.

So gcc might be accepting a few things that it should reject according to the C standard. But when you make them slightly more complicated (like adding the result of a cast to the result of another cast - what useful value can possibly result from adding two addresses anyway?) it notices the problem and rejects the expression.
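For contrast with the rejected cases, here is a small sketch of initializers that ISO C does guarantee to accept: an address constant, optionally plus or minus an integer constant expression, with array subscripts and member access allowed in forming the address. The names `table`, `pair_obj`, `elem3`, `elem5` and `second` are made up for illustration and do not come from the question.

```c
/* Objects with static storage duration whose addresses are used below. */
int table[8] = {0, 1, 2, 3, 4, 5, 6, 7};

struct pair { int first; int second; };
struct pair pair_obj = {10, 20};

/* Address of an array element: an address constant. */
int *elem3 = &table[3];

/* Address constant plus an integer constant expression. */
int *elem5 = table + 5;

/* Address of a structure member: also an address constant. */
int *second = &pair_obj.second;
```

Each of these can be resolved with one symbol address and a fixed offset, which is why a C compiler can leave the final value to the linker. The sum of two addresses in the question does not have that shape.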