4

I am new to this particular forum, so if there are any egregious formatting choices, please let me know, and I will promptly update.

In the book C Programming: A Modern Approach (authored by K. N. King), the following passage is written:

If a pointer variable p hasn't been initialized, attempting to use the value of p in any way causes undefined behavior. In the following example, the call of printf may print garbage, cause the program to crash, or have some other effect:

int *p;
printf("%d", *p);

As far as I understand pointers and how the compiler treats them, the declaration int *p effectively says, "Hey, if you dereference p in the future, I will look at a block of four consecutive bytes in memory, whose starting address is the value contained in p, and interpret those 4 bytes as a signed integer."

I am not sure whether that is correct. If it is, then I am a little confused about why the aforementioned block of code:

  1. is classified as undefined behavior
  2. can cause programs to crash
  3. can have some other effect

Commenting on the above-numbered cases:

My understanding of undefined behavior is that, at run time, anything can happen. With that being said, in the above code it appears to me that only a very defined subset of things can happen. I understand that p (due to its lack of initialization) is storing a random address that could point anywhere in memory. However, when printf is passed the dereferenced value *p, won't the compiler just look at the 4 consecutive bytes of memory (which start at whatever random address) and interpret those 4 bytes as a signed integer?

Therefore, printf should only do one thing: print a number that ranges anywhere from -2,147,483,648 to 2,147,483,647. Clearly that is a lot of different possible outputs, but does that really qualify as "undefined behavior"? Further, how could such "undefined behavior" lead to a "program crash" or "have some other effect"?

Any clarification would be greatly appreciated! Thanks!

S.C.
  • "I will look at a block of four consecutive bytes in memory" --> No. UB can occur just by accessing the uninitialized pointer - before `printf()` is ever called. `p` is not defined as storing a random address. – chux - Reinstate Monica Jun 20 '20 at 03:06
  • The "anything can happen" part is usually because compilers are allowed to assume that things that cause UB _cannot_ happen. So if you had UB inside an `if` condition, a compiler _may_ elide that entire branch. – bnaecker Jun 20 '20 at 03:07
  • This may also be very useful reading: https://stackoverflow.com/questions/2397984/undefined-unspecified-and-implementation-defined-behavior. – bnaecker Jun 20 '20 at 03:08
  • Although "anything can happen" is the only correct description of the consequences, you might want to think about what happens if the uninitialised contents of `p` are a value which is not a valid address of an integer, either because the memory doesn't exist, or because the machine architecture only allows integers to have certain addresses (divisible by 4, for example), or because that part of the address space exists but your process isn't permitted to read it, etc. etc. – rici Jun 20 '20 at 03:20
  • Or the address in `p` happens to be part of the memory-mapped I/O controller, and a read from that address is treated as a command to the hard drive to reformat itself. (Some processors, like the 6502, do all I/O through memory-mapped addresses. Simply reading from an address is considered a hardware command. Even the x86 uses latched addressing for certain hardware, like the EGA video card. Reading from the latch will mess up your video.) – Raymond Chen Jun 20 '20 at 03:42

2 Answers

4

The value of an uninitialized variable is indeterminate. It could hold any value (including 0), and a different value could even be read each time you attempt to read it. The value could also be a trap representation, meaning that attempting to read it can trigger a processor exception that crashes the program.

Assuming you got lucky and were able to read a value for p, that value may not correspond to an address mapped into the process's address space, because of the virtual memory model most systems use. If you then attempt to read from that address by dereferencing the pointer, it can trigger a segmentation fault that crashes the program.

Notice that in both of these scenarios the crash occurs before printf is even called.

Also, compilers are allowed to assume your program does not have undefined behavior and will perform optimizations based on that assumption. That can make your program behave in ways you might not expect.

As for why doing these things is undefined behavior: it is because the C standard says so. In particular, Annex J.2 lists this as an example of undefined behavior:

The value of an object with automatic storage duration is used while it is indeterminate. (6.2.4, 6.7.9, 6.8)

dbush
  • Just for clarification (sort of a novice to all of this stuff), when you say "in both of these scenarios, the crash occurs before `printf` is even called", that is because the compiler will 'calculate' the value that `*p` corresponds to PRIOR to passing that value to `printf`. Is that correct? – S.C. Jun 20 '20 at 03:25
  • @S.Cramer Correct. The second parameter to `printf` is the expression `*p`, so that expression must be fully evaluated before the function is called. – dbush Jun 20 '20 at 03:26
2

Undefined Behavior is defined as "we are not specifying what must happen, it's up to the implementers."

In a practical sense, *p is likely to contain whatever that memory area held last, maybe zeros, maybe something more random, maybe a chunk of data from a previous use. On occasion, a compiler will implicitly zero memory for safety's sake, sacrificing a bit of time to offer that feature.

Notably, if p were defined as a char* and you printed it with printf's %s, printf would keep printing bytes until it found a 0x00. If that runs past the end of the mapped memory, you could get a segmentation fault.

Hack Saw
  • That is not how undefined behavior is *defined*. Undefined behavior has a specific meaning distinct from *unspecified* behavior and from *implementation-defined* behavior. – jamesdlin Jun 20 '20 at 03:17
  • Pointers, please? – Hack Saw Jun 20 '20 at 03:19
  • @HackSaw https://stackoverflow.com/questions/2397984/undefined-unspecified-and-implementation-defined-behavior – RobertS supports Monica Cellio Jun 20 '20 at 09:12
  • Thanks! I think I might just delete this; @dbush has written a more detailed answer. On the other hand, these links are great supplementary info. What's the best practice here? – Hack Saw Jun 20 '20 at 22:17
  • @jamesdlin: Where does that myth come from!? According to the authors of the Standard, "Undefined behavior... also identifies areas of possible conforming language extension: the implementor may augment the language by providing a definition of the officially undefined behavior." – supercat Jun 21 '20 at 03:55
  • @jamesdlin: The term "Implementation Defined Behavior" is only usable in situations where all implementations would be *required* to document the behavior. Undefined behavior is used in cases where at least some implementations might not practically be able to document a behavior, even if most implementations can and should. – supercat Jun 21 '20 at 03:57
  • @HackSaw: The fact that UB was intended as an invitation for implementations to define behaviors themselves is insufficiently recognized, even though the Rationale is very clear about the Committee's intention. – supercat Jun 21 '20 at 04:10
  • @supercat I didn't mean that implementations couldn't define UB, but I took issue with "Undefined behavior is defined as..." since that definition could also describe unspecified behavior or implementation-defined behavior. – jamesdlin Jun 21 '20 at 04:20
  • TBH, I think they all are essentially the same thing. The difference seems to be "We haven't even thought about it" (undef) vs. "We thought about it, but came to no conclusion" (unspec) vs. "We think the implementors might make a choice". To me, the real choices are "We sort of specified it" like int being platform appropriate, versus "We're not going to deal with that for whatever reason". These are both versions of "Up to the implementors", but with one of them strongly suggesting that there be a solution. – Hack Saw Jun 22 '20 at 02:56