11

Speaking of string literals, the C99 standard says (6.4.5.6):

It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.

I couldn't find either a similar warning or an explicit guarantee for const variables. Can the expression `&x == &y` in the context `const int x=12; const int y=12;` evaluate to 1? What about a const variable and a string literal (i.e. is `&x == "\014\000\000"` guaranteed to be 0 even on a 32-bit little-endian platform)?

For what it's worth, the section "String literals" in this blog post gives the context of the question.

Pascal Cuoq
  • Note that gcc will only emit `x` and `y` if their addresses can be taken, and in that case, it emits them at different addresses. – ninjalj Jun 04 '11 at 13:10
  • 2
    See also: [Deep Analysis of Const Qualifier in C](http://stackoverflow.com/questions/4275504/deep-analysis-of-const-qualifier-in-c) – Cody Gray - on strike Jun 04 '11 at 14:12

4 Answers

6

As far as I'm aware, the Standard does not allow two named objects of any type to have the same address (except for union members). From 6.5.9/6:

Two pointers compare equal if and only if both are null pointers, both are pointers to the same object...

String literals are not const variables, so your secondary question is moot, and I don't see what 32-bitness and endianness have to do with it.

  • All right, make the second question about `const char t[4]={'a', 'b', 'c'};` and `"abc"` if you prefer. – Pascal Cuoq Jun 04 '11 at 12:50
  • 2
    What they have to do with it is that the object representation of `x` (an integer 12) just so happens on a 32-bit little-endian platform to contain the same bytes as the object representation of the string literal `"\014\000\000"`. So I believe that part of the question is: can the string literal be folded into `x`, or can string literals only be folded into other string literals? – Steve Jessop Jun 04 '11 at 12:53
  • @Steve Oh, I see. Thanks. @Pascal Well, as far as I know the Standard only says that values may be indistinct for string literals. But it's kind of hard to provide evidence for a negative. –  Jun 04 '11 at 12:57
5

In the standard, equality is discussed in §6.5.9 “Equality operators”, & is discussed in §6.5.3.2 “Address and indirection operators”, and const is discussed in §6.7.3 “Type qualifiers”. The relevant passage about pointer equality is §6.5.9.6:

Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, [or pointers past the end of an array]

The only definition of & is that “The unary & operator yields the address of its operand. […] The result is a pointer to the object or function designated by its operand.” (§6.5.3.2.3). There is unfortunately no formal definition of the word “address”; but distinct objects (for the equality defined by ==) have distinct addresses, because the addresses are pointers that are distinct by the definition of equality above.

As for the meaning of const, §6.7.3 doesn't indicate that const has any bearing on what makes an object (which is “a region of data storage in the execution environment, the contents of which can represent values” by §3.14). A footnote further indicates that “the implementation need not allocate storage for such an object if its address is never used”. Although this is non-normative, it is a strong indication that if the address is used then storage must be allocated for each object.

Note that if the objects are const volatile, then it is fairly clear (as clear as volatile can ever be) that they can't have the same address, because const volatile objects are mutable by the implementation. (§6.7.3.10 has an example of use of const volatile.)

Even in the non-volatile case, const only indicates that this part of the program is not allowed to modify the object, not that the object is read-only in general. To merge the storage of a const object with something else, the audacious implementer would have to guarantee that nothing can modify the object. This is fairly difficult for an object with external linkage in an implementation with separate compilation (but of course we're getting away from the standard and into the won't-happen-in-practice territory).

If this is about writing a C program, then you can increase your chances by giving the objects different values:

    const int x = __LINE__;
    const int y = __LINE__;

If this is about a theoretical model of C, I'd go for making the objects distinct. You'll have to justify this choice by summarizing the answers here in a paragraph in (the extended version of) your paper.

On the other hand, if this is about writing an optimizing compiler, I doubt it would hurt many real-world programs to merge constants. I'd go for merging in an embedded compiler, where users are used to playing it safe with edge cases and where the memory saved could be non-negligible. I'd go against merging in a hosted platform where any gain would be negligible.

(References from N1256 a.k.a. C99+TC3. I don't think the version makes a difference.)

Gilles 'SO- stop being evil'
  • Surely if you're going to implement a non-conforming optimization, even on an embedded target, you should make it turn-off-and-onable. In which case assuming there's demand for it anywhere, you may as well offer it for any/all targets, and leave it to the user to decide whether it will break their program and whether they need to save the space. – Steve Jessop Jun 04 '11 at 14:02
  • I like your arguments, but regarding your paragraph "Even…" on the non-volatile const, `GCC` compiles `const int x = 12; int y; main() { y = x; f(); y += x; printf("%d\n", y); }` into `movl $12, …` and `addl $12, …`, assuming that `f()` does not change the const variable `x` (it assumes that `f()` may change `y`, though). – Pascal Cuoq Jun 04 '11 at 14:36
  • @Pascal: For this test to be reliable, you need to convince gcc that the address of `x` is taken, i.e. have `&x` in some place where it won't be optimized away. You also need to convince gcc that it doesn't know whether `f` might see `x`; having `x` be `extern` and `f` defined in another file should do it. – Gilles 'SO- stop being evil' Jun 04 '11 at 14:41
  • Ah, yes, taking the address of `x` in a way that could leak to `f()` is better (and `GCC` continues to use `$12`). Note that `f()`, placed in another file, could already access `x` in the initial version, because `x` was not `static` anyway. – Pascal Cuoq Jun 04 '11 at 14:47
  • 6
    GCC has such a flag: `-fmerge-all-constants`. Its description has the following interesting sentence: _Languages like C or C++ require each non-automatic variable to have distinct location, so using this option will result in non-conforming behavior._ – ninjalj Jun 04 '11 at 15:41
3

In

    const int x=12;
    const int y=12;

x and y are different variables (both const-qualified) and therefore have different addresses.

The same for the other example.

Note that const is a qualifier for an object. Regarding memory layout, it makes no difference if it's there or not.

pmg
2

6.4.5/6 says of the arrays corresponding to string literals:

It is unspecified whether these arrays are distinct provided their elements have the appropriate values.

So that's a specific rule allowing string literals to be folded. I don't know of anything in the standard that says the same thing for other objects.

Steve Jessop