77

Consider this program:

    #include <stdio.h>

    int main(void)
    {
        unsigned int a;
        printf("%u %u\n", a^a, a-a);
        return 0;
    }

Is it undefined behaviour?

On the face of it, `a` is an uninitialized variable. So that points to undefined behaviour. But `a^a` and `a-a` are equal to 0 for all values of `a`, at least I think that is the case. Is it possible that there is some way to argue that the behaviour is well defined?

ks1322
David Heffernan
  • I would expect this to be well-defined as the value of a is unknown but fixed and it should not change. The question is whether the compiler would allocate the space for `a` and subsequently read from the garbage sitting there. If not, then the behaviour is undefined. – martin Aug 01 '14 at 06:35
  • Hmm so long as the variable isn't marked `volatile` then I would accept that as being defined behaviour. `a ^= a`, is exactly equivalent to `a = 0` – fileoffset Aug 01 '14 at 06:38
  • 30
    @martin: It is not fixed. The value is allowed to change. This is a very practical consideration. A variable can be assigned to a CPU register, but while it is uninitialized (i.e. its effective value-lifetime hasn't begun yet), that same CPU register can be occupied by a different variable. The changes in that other variable will be seen as an "unstable" value of this uninitialized variable. This is something that is *often* observed in practice with uninitialized variables. – AnT stands with Russia Aug 01 '14 at 06:39
  • @AndreyT this is a nice explanation – martin Aug 01 '14 at 06:45
  • 1
    Never mind, found it, my mistake: http://stackoverflow.com/questions/20300665/output-of-the-expression-36aa-in-c-language, and it was in fact for C. – Thomas Aug 01 '14 at 09:03
  • @Thomas Yes, that all seems quite similar. But most of the discussion is in comments there, and the question was about what `^` means, the UB was accidental and incidental to the question. Here the focus of the question is all about the UB. – David Heffernan Aug 01 '14 at 09:05
  • @DavidHeffernan Absolutely, I was not suggesting to close as duplicate (the question linked is fairly low quality anyway), just that it could be worth a look. – Thomas Aug 01 '14 at 09:07
  • @Thomas Thanks. Nice to have some more discussion on the topic from others. Appreciated. – David Heffernan Aug 01 '14 at 09:10
  • Testing for `unsigned int a; printf("%d\n", !a);` would even be closer to the crux of this post by eliminating the multiple access issues. – chux - Reinstate Monica Aug 01 '14 at 13:18
  • Similar: http://stackoverflow.com/questions/11962457/why-is-using-an-uninitialized-variable-undefined-behavior-in-c – M.M Feb 16 '15 at 23:39

3 Answers

76

In C11:

  • It's explicitly undefined according to 6.3.2.1/2 if a never has its address taken (quoted below)
  • It could be a trap representation (which causes UB when accessed). 6.2.6.1/5:

Certain object representations need not represent a value of the object type.

Unsigned ints can have trap representations (e.g. if the type has 15 precision bits and 1 parity bit, accessing `a` could cause a parity fault).

6.2.4/6 says that the initial value is indeterminate and the definition of that under 3.19.2 is either an unspecified value or a trap representation.

Further: in C11 6.3.2.1/2, as pointed out by Pascal Cuoq:

If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.

This doesn't have the exception for character types, so this clause appears to supersede the preceding discussion; accessing `a` is immediately undefined even if no trap representations exist. This clause was added in C11 to support Itanium CPUs, which really do have a trap state for registers.


Systems without trap representations: But what if we throw in `&a;` so that 6.3.2.1/2's objection no longer applies, and we are on a system that is known to have no trap representations? Then the value is an unspecified value. The definition of unspecified value in 3.19.3 is a bit vague, however it is clarified by DR 451, which concludes:

  • An uninitialized value under the conditions described can appear to change its value.
  • Any operation performed on indeterminate values will have an indeterminate value as a result.
  • Library functions will exhibit undefined behavior when used on indeterminate values.
  • These answers are appropriate for all types that do not have trap representations.

Under this resolution, `int a; &a; int b = a - a;` still results in `b` having an indeterminate value.

Note that if the indeterminate value is not passed to a library function, we are still in the realm of unspecified behaviour (not undefined behaviour). The results may be weird, e.g. `if ( j != j ) foo();` could call `foo`, but the demons must remain ensconced in the nasal cavity.

M.M
  • Supposing that we knew there were no trap values, could we argue defined behaviour then? – David Heffernan Aug 01 '14 at 06:40
  • 16
    @DavidHeffernan You **might as well** treat access to indeterminate data as UB, because your compiler might, too, even if there are no trap values. Please see http://blog.frama-c.com/index.php?post/2013/03/13/indeterminate-undefined – Pascal Cuoq Aug 01 '14 at 06:48
  • @Pascal I get that now. That's the final para of Andrey's answer. – David Heffernan Aug 01 '14 at 06:51
  • @DavidHeffernan The examples go as far as `2 * j` being odd, which is slightly worse than even the picture in Andrey's answer, but you get the idea. – Pascal Cuoq Aug 01 '14 at 06:53
  • When the C89 Standard was written, it was expected that implementations would specify many things that the Standard did not, and the authors of the Standard saw no reason to detail all the cases where an action should be considered defined on implementations that specify certain things (e.g. the fact that "unsigned int" has no trap representations) but undefined on implementations that don't (e.g. where reading an indeterminate bit pattern as an "unsigned int" might yield a trap representation). – supercat Sep 26 '16 at 22:47
33

Yes, it is undefined behavior.

Firstly, any uninitialized variable can have a "broken" (aka "trap") representation. Even a single attempt to access that representation triggers undefined behavior. Moreover, even objects of non-trapping types (like `unsigned char`) can still acquire special platform-dependent states (like NaT - Not-A-Thing - on Itanium) that can manifest as their "indeterminate value".

Secondly, an uninitialized variable is not guaranteed to have a stable value. Two sequential accesses to the same uninitialized variable can read completely different values, which is why, even if both accesses in `a - a` are "successful" (not trapping), it is still not guaranteed that `a - a` will evaluate to zero.

AnT stands with Russia
  • 1
    Have you got a citation for that final paragraph? If that is so, then we needn't even consider traps. – David Heffernan Aug 01 '14 at 06:41
  • @Matt McNabb: That applies exclusively to `unsigned char` type - the only type that has no trap representations. So, this is more of an exception from the general case. – AnT stands with Russia Aug 01 '14 at 06:42
  • An example of an actual, real-life trap representation would be great. – hyde Aug 01 '14 at 06:54
  • 2
    @Matt McNabb: Well, this might be an issue that was resolved differently through different versions of the language spec. But the resolution for DR#260 (http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_260.htm) states clearly and explicitly that variables with indeterminate values can change arbitrarily "by themselves". – AnT stands with Russia Aug 01 '14 at 06:57
  • @ANdreyT that DR is from 2001, however C11 changed things again (as noted in Pascal's link). Maybe Resolution 3 could be considered to still hold though. – M.M Aug 01 '14 at 07:00
  • 4
    @Matt McNabb: DR#451 reasserted essentially the same decisions from DR#260 in both Oct 2013 and Apr 2014 http://www.open-std.org/Jtc1/sc22/WG14/www/docs/dr_451.htm . The committee response for DR#451 explicitly states "This viewpoint reaffirms the C99 DR260 position" – AnT stands with Russia Aug 01 '14 at 07:03
  • @AndreyT thanks, have updated my post to include DR451 . – M.M Aug 01 '14 at 07:17
  • 1
    @hyde The closest to a trap representation you may have at hand is signaling NaNs. http://en.wikipedia.org/wiki/NaN#Signaling_NaN Otherwise you need to get a computer with explicit parity bits, a sign-magnitude computer where -0 is considered a trap value, or something equally exotic. – Pascal Cuoq Aug 01 '14 at 07:24
  • Concerning "uninitialized variable is not guaranteed to have a stable value". Say code is `unsigned int a, b; b = a; printf("%u %u\n", b^b, b-b);` Here `b` is initialized, but to an unknown value, but would be _stable_. `b = a` may fire a trap. But without a trap, should not the `print` result in "0 0"? – chux - Reinstate Monica Aug 01 '14 at 13:07
  • 1
    @chux: No. There is nothing that restricts *undefined behavior* to "does what you think, but if not, traps". Literally any behavior is permitted. – Ben Voigt Aug 01 '14 at 22:53
2

If an object has automatic storage duration and its address is not taken, attempting to read it will yield Undefined Behavior. Taking the address of such an object and using pointers of type `unsigned char` to read out the bytes thereof is guaranteed by the Standard to yield values of type `unsigned char`, but not all compilers adhere to the Standard in that regard. ARM GCC 5.1, for example, when given:

  #include <stdint.h>
  #include <string.h>
  struct q { uint16_t x,y; };
  volatile uint16_t zz;
  int32_t foo(uint32_t x, uint32_t y)
  {
    struct q temp1,temp2;
    temp1.x = 3;
    if (y & 1)
      temp1.y = zz;
    memmove(&temp2,&temp1,sizeof temp1);
    return temp2.y;
  }

will generate code that simply returns `x` when `y` is even, even if `x` is outside the range 0-65535. The Standard makes clear that unsigned character reads of an Indeterminate Value are guaranteed to yield a value within the range of `unsigned char`, and the behavior of `memmove` is defined as equivalent to a sequence of character reads and writes. Thus `temp2` should hold a value that could have been stored into it via a sequence of character writes, but gcc replaces the `memmove` with an assignment and ignores the fact that the code took the addresses of `temp1` and `temp2`.

Having a means of forcing a compiler to regard a variable as holding an arbitrary value of its type, in cases where any such value would be equally acceptable, would be helpful, but the Standard doesn't specify a clean means of doing so (save for storing some particular value, which would work but would often be needlessly slow). Even operations which should logically force a variable to hold a value representable as some combination of bits cannot be relied upon to work on all compilers. Consequently, nothing useful can be guaranteed about such variables.

supercat
  • To be fair, there is a defect report linked above about exactly _what_ you can do with an indeterminate value, and part of the decision was to specify that passing an indeterminate value to any library function is UB. `memmove` is a library function so that would apply here. – BeeOnRope Sep 08 '17 at 03:36
  • @BeeOnRope: If the authors of the Standard had included a means of resolving indeterminate values into at-worst-unspecified values, it would have been reasonable to require the use of such means before passing otherwise-indeterminate values to library functions. Given the lack of such means, the only thing I can read into their decision is that they are more interested in making a language "easy to optimize" than in maximizing its usefulness. – supercat Sep 08 '17 at 14:19
  • @BeeOnRope: Their rationale is that making behavior undefined shouldn't prevent compilers from defining behaviors when targeting processors and application fields where it would be practical and useful to do so. Unfortunately, whether or not such decisions by the Committee should have such an effect, it's obvious that they do. – supercat Sep 08 '17 at 14:21
  • 1
    I suppose, yes, they could have introduced some kind of `T std::freeze(T v)` method that would turn a "wobbly" indeterminate value into an unspecified-but-stable value. It would have "third order" usefulness though: using indeterminate value is already obscure and very rarely used, so adding a special construct just to solidify such values would seem to be just going further down the rabbit hole of what is already an obscure corner of the standard, and it would have to be supported in the core transformation/optimization phases of many compilers. – BeeOnRope Sep 08 '17 at 19:25
  • @BeeOnRope: The ability to freeze values would have essentially zero cost outside those situations where it would be essential, and trying to debug optimized code in its absence is a sure path to insanity. If one writes `foo=moo; if (foo < 100) bar(foo);` and `moo` gets changed unexpectedly by some other thread, trying to diagnose when and where things went wrong may be essentially impossible. Being able to say `foo=moo; freeze(foo); if (foo < 100) bar(foo);` and have the compiler commit to a value for `foo` would make things a lot more robust. – supercat Sep 08 '17 at 22:47
  • @BeeOnRope: In addition, a fundamental tenet of secure systems programming often requires that behavior be constrained even in cases where client code does things it shouldn't such as passing a pointer to an object while another thread is modifying it. Having code use `freeze` in those particular places it's needed would be much cheaper than using more "sledge-hammer"-ish approaches or having to disable optimizations altogether. – supercat Sep 08 '17 at 22:58
  • I was thinking in the context of a single thread where indeterminism is introduced, e.g., by reads of uninitialized variables, but sure, that's a good point about concurrent writes. In other languages a "non-atomic" read like that into a local will certainly "lock in" the value. C and C++ are hamstrung by being implemented via compiled-to-native code across a wide variety of architectures. Decisions made to "support" architectures that died (like Itanium) sometimes later look pretty stupid. – BeeOnRope Sep 08 '17 at 23:00
  • ... although on most practical hardware, `std::memory_order_relaxed` gets you exactly what you want. – BeeOnRope Sep 08 '17 at 23:00
  • @BeeOnRope: I find it odd the "Itanium clause" is so called, when the real issue goes back a lot further. On a typical processor that uses 32-bit instructions for everything but loads and stores (e.g. ARM, though the principle goes back far earlier), the simplest code for `volatile uint16_t x; uint16_t test(uint32_t q, int mode) { uint16_t result; if (mode) result=x; return result; }` would return `q` if `mode` is zero, even if `q` is greater than 65535. – supercat Sep 09 '17 at 00:09
  • @BeeOnRope: If nothing uses the result of such a function, that won't matter. Having a `uint16_t` hold values greater than 65,535, however, could cause weird behaviors downstream. A `freeze` could be used to guard against such things by forcing a `uint16_t` object to be clipped to that range. – supercat Sep 09 '17 at 00:12
  • @BeeOnRope Re: "wobbly": +1. See relevant DRs: [451](https://www.open-std.org/Jtc1/sc22/WG14/www/docs/dr_451.htm), and (extra) [260](https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_260.htm). – pmor Sep 15 '22 at 19:01
  • @pmor: A fundamental problem with the Standard, which is responsible for most controversies related to it, is that it fails to accommodate the idea of optimizations which may cause behavior to be inconsistent, in *limited* ways, with a sequential-execution model, or to recognize that some tasks would require behavioral guarantees that are stronger than would be needed for other tasks. Classifying an action whose behavior would otherwise be defined as "anything can happen" UB only allows more optimizations than would allowing more limited freedom in cases where nothing a program might do... – supercat Sep 15 '22 at 20:29
  • ...in response to even maliciously-contrived input would be considered unacceptable. While there are some tasks that will either be completely shielded from potentially malicious inputs, or run in an environment where nothing they could do would cause unacceptable harm, such tasks represent an extreme minority of the tasks for which people use the C language. Granting compilers more limited freedom to deviate from a "precise sequential execution" model would vastly increase the range of *correct* programs where compilers would be able to benefit from optimization-friendly rules. – supercat Sep 15 '22 at 20:32