
In one of my C files, I'm declaring an array foo. Then I'm converting the address of that array to an integer type, and I want to OR it with 3 to set the lowest two bits. However, the bitwise OR fails to compile, while adding 3 seems to work. Why?

uint64_t foo[1];
uint64_t bar = (uint64_t)foo | 3;

This fails with:

main.c:6:16: error: initializer element is not constant
 uint64_t bar = (uint64_t)foo | 3;

But this works:

uint64_t foo[1];
uint64_t bar = (uint64_t)foo + 3;

As I understand it, the address of foo is not known at compile time because it's a global (it will be placed in the .data or .bss section). However, an entry is put into the relocation section so that the linker can patch in the address while linking.

How is the compiler handling the bitwise OR and the addition? Why does one work while the other doesn't?

Matviy Kotoniy
  • I suspect [operator precedence](https://en.cppreference.com/w/c/language/operator_precedence) will come up, even though I do not see how it affects things. – chux - Reinstate Monica Oct 22 '19 at 01:53
  • Casts from addresses to integer types are not part of C’s basic constant expressions, but implementations are allowed to extend C by accepting other forms. It accepts `(uint64_t)foo + 3` because the object module format supports this as a relocatable expression (a symbol plus an offset) that the linker will resolve. However, `(uint64_t)foo | 3` cannot be expressed in the object module. – Eric Postpischil Oct 22 '19 at 03:46
  • @Eric That’s the bulk of a solid answer. – dmckee --- ex-moderator kitten Oct 22 '19 at 04:11
  • Strangely enough both compile and run on my machine using `gcc -Werror -Wall -Wpedantic`. – gstukelj Oct 22 '19 at 08:48
  • Possible duplicate of [Why can't you do bitwise operations on pointer in C, and is there a way around this?](https://stackoverflow.com/questions/15868313/why-cant-you-do-bitwise-operations-on-pointer-in-c-and-is-there-a-way-around-t) – gstukelj Oct 22 '19 at 08:49
  • *Alternative 1:* Because `foo` will be aligned for `uint64_t`, its three lowest bits are most probably already 0, so you can get away with the addition. *Alternative 2:* Since you seem to be targeting an embedded system, chances are that you already have a linker script. You can do some operations there. – the busybee Oct 22 '19 at 12:08
  • @gst that thread is not really relevant; the question is whether the result is a *constant expression* in C – M.M Oct 22 '19 at 13:12
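The following is a minimal sketch of Alternative 1 from the comment above. It assumes a common flat-address platform, where converting an aligned pointer to `uintptr_t` yields an integer whose low bits are zero. The standard leaves that mapping implementation-defined. The sketch checks at compile time that `uint64_t` is at least 4-byte aligned. It then computes the tagged address both ways at run time, so it does not depend on what a compiler accepts in a static initializer. `tag_by_add` and `tag_by_or` are hypothetical names.

```c
#include <stdint.h>

/* "+ 3" and "| 3" agree only if the lowest two bits of the address are zero.
   uint64_t is usually 8-byte aligned, but the standard does not promise
   even 4, so check it at compile time (C11). */
_Static_assert(_Alignof(uint64_t) >= 4, "uint64_t must be at least 4-byte aligned");

uint64_t foo[1];

/* Hypothetical helpers: set the lowest two bits of foo's address, once by
   addition and once by bitwise OR. The pointer-to-integer conversion is
   implementation-defined; on flat-address targets the low bits are zero. */
static uintptr_t tag_by_add(void) { return (uintptr_t)foo + 3; }
static uintptr_t tag_by_or(void)  { return (uintptr_t)foo | 3; }
```

On such a platform the two helpers return the same value. That is why the addition, which a linker can encode as a relocation, can stand in for the OR.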


Initial values for static objects must be constant expressions or string literals. (C 2018 6.7.9 3: “All the expressions in an initializer for an object that has static or thread storage duration shall be constant expressions or string literals.”)

6.6 7 specifies forms of constant expressions for initializers:

More latitude is permitted for constant expressions in initializers. Such a constant expression shall be, or evaluate to, one of the following:

— an arithmetic constant expression,

— a null pointer constant,

— an address constant, or

— an address constant for a complete object type plus or minus an integer constant expression.

Consider uint64_t bar = (uint64_t)foo + 3;. Nominally, foo is the static array declared earlier, which is automatically converted to a pointer to its first element. This qualifies as an address constant (6.6 9: “An address constant is … a pointer to an lvalue designating an object of static storage duration,…”). However, it is cast to uint64_t, and the result no longer qualifies as an address constant, an address constant plus or minus an integer constant expression, or a null pointer constant.
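For contrast, here is a sketch of an initializer that stays within the forms listed in 6.6 7. If the value is kept as a pointer instead of being cast to uint64_t, then (char *)foo + 3 is an address constant for a complete object type plus an integer constant expression. Pointer casts are allowed when forming an address constant (6.6 9), and the byte offset 3 stays inside the 8-byte object foo. The names foo_plus_3 and foo_plus_3_as_int are illustrative. The conversion to an integer is deferred to run time.

```c
#include <stdint.h>

uint64_t foo[1];

/* A pointer (not an integer) initialized with "address constant plus an
   integer constant expression", one of the forms 6.6 7 lists. The char *
   cast makes the offset count bytes, and offset 3 is still inside foo. */
char *foo_plus_3 = (char *)foo + 3;

/* Illustrative helper: the pointer-to-integer conversion happens at run
   time, where it is ordinary (implementation-defined) conversion. */
static uintptr_t foo_plus_3_as_int(void)
{
    return (uintptr_t)foo_plus_3;
}
```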

Is it an arithmetic constant expression? 6.6 8 excludes it:

… Cast operators in an arithmetic constant expression shall only convert arithmetic types to arithmetic types,…

Thus, (uint64_t)foo + 3 does not qualify as any form of constant expression required by the C standard. However, 6.6 10 says:

An implementation may accept other forms of constant expressions.

So a C implementation may accept (uint64_t) foo + 3 or (uint64_t) foo | 3 as a constant expression. The question, then, is why your C implementation accepts the former but not the latter.

A common feature of linkers and object module formats is that the object module can record placeholders for certain expressions, and the linkers can evaluate these expressions and replace the placeholders with calculated values. A primary purpose of this feature is to allow for code in a program to refer to places in data or other code whose locations are not completely known during compilation but that will be decided (at least relative to some base reference point) during linking.

Places in data or code are measured relative to symbols (names) defined in the object modules (or relative to the starts of sections or segments). Thus, a place may be described, in effect, as “34 bytes after the start of routine bar” or “8 bytes after the start of object baz”. So the object module has support for placeholders that are composed of a displacement and a symbol name. After the linker assigns addresses to symbols, it reviews each placeholder, adds the displacement to the assigned address, and replaces the placeholder with the calculated result.
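To make the placeholder idea concrete, here is a toy model. It is not any real object-file format. Each placeholder records a symbol name and a signed displacement (roughly what ELF calls an addend). The "linker" step looks up the address assigned to the symbol and adds the displacement. The struct names, the resolve function, and the addresses used are all hypothetical. Note that the record has no field for an operator such as |; it can only describe "symbol + displacement".

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical symbol table entry: a name and the address the linker assigned. */
struct symbol {
    const char *name;
    uint64_t address;
};

/* Hypothetical placeholder: "displacement bytes after symbol". There is no
   field for any other operation, so "symbol | 3" cannot be recorded. */
struct placeholder {
    const char *symbol;
    int64_t displacement;
};

/* Toy linker step: find the symbol, add the displacement, and return the
   value that would be patched over the placeholder. Sets *found to 1 on
   success; sets *found to 0 and returns 0 if the symbol is undefined. */
static uint64_t resolve(const struct placeholder *ph,
                        const struct symbol *table, size_t count, int *found)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(table[i].name, ph->symbol) == 0) {
            *found = 1;
            /* Unsigned wraparound makes negative displacements subtract. */
            return table[i].address + (uint64_t)ph->displacement;
        }
    }
    *found = 0;
    return 0;
}
```

For example, if foo were assigned the hypothetical address 0x601040, a placeholder { "foo", 3 } would resolve to 0x601043.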

It appears your compiler, in spite of the uint64_t cast, is able to recognize that (uint64_t) foo is still the address of foo, and therefore (uint64_t) foo + 3 may be implemented by the regular use of one of these placeholders.

In contrast, the bitwise OR operator is not supported for use in these placeholders, and therefore the compiler is unable to implement (uint64_t) foo | 3. It cannot evaluate the expression itself (because it does not know the final address for foo), and it cannot write a placeholder for the expression. So it does not accept this as a constant expression.
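If the bitwise OR is really what is needed, a common workaround is to give up on the static initializer and compute the value at run time. This workaround is not part of the answer above. At run time the address of foo is already known, and any integer arithmetic is allowed. init_bar is a hypothetical name, and the sketch assumes it is called before bar is first used, for example at the top of main.

```c
#include <stdint.h>

uint64_t foo[1];
uint64_t bar;  /* zero until init_bar() runs */

/* Run-time initialization: the address of foo is known here, so the OR is
   ordinary integer arithmetic and needs no relocation support. Converting
   through uintptr_t is the portable route; the resulting integer value is
   implementation-defined. */
static void init_bar(void)
{
    bar = (uint64_t)(uintptr_t)foo | 3;
}
```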

Eric Postpischil

When you say

sometype *p = f(x);

where p is a global variable (or one with static duration) and where f(x) is not an actual function call but rather, some sequence of compile-time operations involving the address of another symbol x which won't be known until link time, the compiler obviously can't compute the initial value immediately. It actually emits an assembly language directive which causes the assembler to construct a relocation record which causes the linker to evaluate f(x) once the final location of the symbol x is known.

So f(x) (whatever sequence of operations it actually is) has to be, in effect, a function that the linker knows how to evaluate (and that there's a relocation record for, and if necessary an assembly language directive for). And while conventional linkers are good at performing addition and subtraction (because they do it all the time), they don't necessarily know how to perform other kinds of arithmetic.

As a consequence, there are some additional rules on what kinds of arithmetic you can do while constructing pointer constants.

I'm in a hurry this morning and don't have time to dig through the Standard, but I'm pretty sure there's a sentence in there somewhere stating that among other restrictions on constant expressions, when you're initializing a pointer, you're limited to an address plus or minus an integer constant expression (since that's all the C Standard is willing to assume the linker is going to know how to do).

Your question has the additional complication that you're not actually initializing a pointer variable, but rather, an integer. In that case you get, in effect, the worst of both worlds: you're either not allowed to do it at all, or if the compiler lets you, the initializer on the right (since it involves an address/pointer), is limited to the kinds of arithmetic you can do while constructing pointer constants, as described above. You don't get to do the arbitrary arithmetic you'd be able to get away with (perhaps with confounding casts) in an integer expression at run time.

Steve Summit
  • The sentence on accepting an address plus a constant expression only applies to addresses. But `(uint64_t)foo` is no longer an address. – Eric Postpischil Oct 22 '19 at 13:19

According to the standard, the result of casting a pointer to an integer type is not a constant expression. So both of your examples may be rejected by a conforming compiler.

However there is the clause C11 6.6/10:

An implementation may accept other forms of constant expressions.

which unfortunately means that any particular compiler could accept none, one, or both of your examples.

M.M