
I'm learning C in my free time. I was playing with pointers when the program's behavior puzzled me. Could someone please explain (or point me to some reading on) why I get different results in the following cases?

Ubuntu 19.04
cc (Ubuntu 8.3.0-6ubuntu1) 8.3.0
Intel i7-8565U

#include <stdio.h>

int main() {
//    int a = 6;
    int i1 = 5;
    printf("&i1 = %p\n", &i1);

    size_t i1_address = (size_t) &i1;
    int *p = (int *) (i1_address + 4);
    printf("p = %p\n", p);
    *p = 12;

    int i2;
//    printf("&i2 = %p\n", &i2);
    printf("i2 = %d\n", i2);

    return 0;
}

The code above outputs exactly what I expect:

&i1 = 0x7ffd86048110
p = 0x7ffd86048114
i2 = 12

If I uncomment both commented lines, the output is almost the same (i2 = 12).
But if I uncomment only the first commented line (int a = 6;), i2 equals some random number:

&i1 = 0x7ffd539630fc
p = 0x7ffd53963100
i2 = 21901

Is there any explanation of how exactly int a = 6; affects the program so that I get an unexpected result, and how printf("&i2 = %p\n", &i2); fixes it?

eam
  • `*p = 12;` is _undefined behavior_. It looks like you want an explanation of how things work when you break the rules of C. When you break the rules, anything goes. The fix is not to break the rules. eam, why do you think `*p = 12;` is OK? – chux - Reinstate Monica Aug 12 '19 at 23:27
  • `printf("i2 = %d\n", i2)` when `i2`'s value is indeterminate, as in the example code, also produces undefined behavior. – John Bollinger Aug 12 '19 at 23:32
  • @chux I don't really think it's OK. I just decided to check what would happen, discovered different behavior, got curious, and didn't find an answer, so I decided to ask here. I understand that this code is kind of nasty and not correct, but I'm interested in what's happening under the hood here. – eam Aug 12 '19 at 23:38
  • You may benefit from the **Pointer Basics** contained here [Can I dereference the address of an integer pointer?](https://stackoverflow.com/questions/57451436/can-i-dereference-the-address-of-an-integer-pointer/57451658#57451658) – David C. Rankin Aug 13 '19 at 00:16
  • Adding the variable `a` somehow changes the layout of the local variables. The compiler is allowed to arrange local variables in memory however it wants; you can't say that `*(int*)((size_t)&i1 + 4)` is always `i2`. In particular, you have a chance of overwriting unused memory, overwriting something important, or crashing. – user253751 Aug 13 '19 at 06:05

2 Answers


In this answer, I will discuss each significant line.

int a = 6;

The presence of this line is not relevant to the meaning of the C code (that is, to what the C standard specifies about its behavior). If it affects the running program, that is most likely because it happens to change how the compiler arranges local variables in memory, in a way that was not deliberately designed into the compiler. The fact that it affects the program is a distraction and is not very meaningful.

int i1 = 5;

Fine, that is a normal line.

printf("&i1 = %p\n", &i1);

This is technically wrong; it should be printf("&i1 = %p\n", (void *) &i1);, because %p is specified for use with void * but not with other pointer types. However, it will not affect most C implementations.

size_t i1_address = (size_t) &i1;

size_t is not guaranteed to hold all the information about a pointer. It is better to #include <stdint.h> and use uintptr_t instead of size_t.

int *p = (int *) (i1_address + 4);

This assumes (we infer from context) the size of int is 4 and that the result of converting &i1 to size_t, adding 4, and converting to int * yields a pointer to just beyond i1. I presume the “cc” referred to in the question is some version of GCC, in which case this is sort of okay because GCC supports doing this sort of address arithmetic (I believe from memory, without looking up specific documentation).

printf("p = %p\n", p);

As above, this should be printf("p = %p\n", (void *) p);.

*p = 12;

This is bad. p is not pointing to a known object. In the computing model that the C standard uses, it is not pointing to an object at all, so the behavior of the expression *p is not defined by the standard, and neither is assigning anything to it. Unlike some behaviors not defined by the C standard, such as some address arithmetic, GCC does not make any promises about this sort of abuse.

int i2;

Fine.

printf("&i2 = %p\n", &i2);

This should also be printf("&i2 = %p\n", (void *) &i2);

printf("i2 = %d\n", i2);

In the standard’s model, i2 is indeterminate because it has not been initialized (including by assignment). “Indeterminate” means not just that it does not have a particular value but that it might not have any value at all in the sense of having a value that persists from use to use. While the value of i2 is indeterminate, the C standard permits each use of it to act as if it had a different value or trap representation. (In the absence of the prior statement, which contains &i2, the use of i2 in this statement would have undefined behavior, due to a particular rule in the C standard that says using an uninitialized object with local storage duration that has not had its address taken has behavior not defined by the C standard. With the prior statement, there is merely an indeterminate value, not undefined behavior.)

To my knowledge, GCC on Ubuntu does not have trap representations for int objects, so printf("i2 = %d\n", i2); by itself would print some value for i2. It is not undefined behavior, just not completely specified behavior. (However, since this statement is preceded by statements with undefined behavior, we do not know that program execution will ever reach this statement, and, if it does, the C standard does not tell us what will happen, because the prior undefined behavior makes the subsequent behavior also undefined.)

It is possible that *p = 12; puts 12 in the space that is then used for i2, and so printf("i2 = %d\n", i2); might show 12 for i2. Certainly the C standard does not require this in any way, but GCC might do that, and whether it does could be affected by whether the statements int a = 6; or printf("&i2 = %p\n", &i2); are present. Again, however, none of those variations in behavior from the presence or absence of the statements are very meaningful. A better way to learn how the compiler behaves is to examine the assembly language it generates for different variations of the source code and compiler switches. (With GCC, use -S to generate assembly language.)

(One could learn more about the compiler’s behavior by reading the source code, but that is not better for many people because it requires a great deal more work to accumulate the knowledge required before the source code can be sensibly interpreted.)

Eric Postpischil

Does creating a pointer to an uninitialized variable have a side effect?

The language specifications do not define any side effect per se from taking the address of an uninitialized variable. A variable's address is well defined regardless of its initialization status, and the resulting pointer can safely be used to assign a value to the variable, making its value determinate and therefore safe to read either directly or indirectly.

Until an uninitialized local variable is assigned a value, however, reading that variable's value produces undefined behavior (in the event that its address has never been taken) or yields a value that is not specified and may be a trap representation (otherwise).

In the latter case, there is no guarantee that the value read is consistent. In the former case, the undefined behavior may manifest differently for any reason or no apparent reason at all. Behavior changes attending seemingly unrelated code changes are one of the classic hallmarks of UB. Either way, then, the language provides no explanation for why your program prints the values it does.

So,

Is there any explanation of how exactly int a = 6; affects the program so that I get an unexpected result, and how printf("&i2 = %p\n", &i2); fixes it?

No, no such explanation exists at the level of the C language. Do not exercise UB (anywhere) or rely on indeterminate values if you want behavior that can be predicted from the language definition.

John Bollinger
  • Reading the value of an uninitialized local variable whose address has been taken does not produce undefined behavior. The value is indeterminate, which can cause other problems such as traps in some implementations. But the rule in C 2018 6.3.2.1 2 about undefined behavior for uninitialized objects of automatic storage duration only applies if the address has not been taken, which is not the case in this question. – Eric Postpischil Aug 13 '19 at 00:34
  • Fair enough, @EricPostpischil, I have modified the answer to recognize that distinction, which is indeed relevant (to one variation of the OP's program). In the end, though, it makes no significant difference for the conclusion. – John Bollinger Aug 13 '19 at 00:54
  • @EricPostpischil So do I understand it right that the case where both lines are uncommented is correct C code with defined behavior, since the address has been taken (I'm not saying it's good code), and the other two are undefined behavior and it's pure luck that one of them did what I expected? – eam Aug 13 '19 at 01:15
  • @eam: No. I will enter an answer. – Eric Postpischil Aug 13 '19 at 01:22
  • The committee has stated that passing an indeterminate value to a stdlib func has UB. – Antti Haapala -- Слава Україні Aug 13 '19 at 02:24