-1

Sometimes in C it is necessary to read a possibly-written item from a partially-written array, such that:

  1. If the item has been written, the read will yield the value that was in fact written, and

  2. If the item hasn't been written, the read will convert an Unspecified bit pattern to a value of the appropriate type, with no side-effects.

There are a variety of algorithms where finding a solution from scratch is expensive, but validating a proposed solution is cheap. If an array holds solutions for all cases where they have been found, and arbitrary bit patterns for other cases, reading the array, testing whether it holds a valid solution, and slowly computing the solution only if the one in the array isn't valid, may be a useful optimization.
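
As a concrete sketch of that pattern (solution_t, cache, validate(), compute(), and get_solution() here are hypothetical stand-ins, not part of any real API):

typedef unsigned solution_t;            // hypothetical solution type
int validate(unsigned i, solution_t s); // cheap check; must tolerate an arbitrary value
solution_t compute(unsigned i);         // expensive fallback

solution_t cache[1000];                 // partially written: only some entries ever stored

solution_t get_solution(unsigned i)
{
  solution_t s = cache[i];              // may read a never-written element
  if (!validate(i, s))                  // cheap test of the possibly-arbitrary value
    cache[i] = s = compute(i);          // slow path only when the cached value isn't valid
  return s;
}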

If an attempt to read a non-written array element of a type like uint32_t could be guaranteed to always yield a value of the appropriate type, such an approach would be easy, straightforward, and efficient. Even if that guarantee only held for unsigned char, it might still be workable. Unfortunately, compilers sometimes behave as though reading an Indeterminate Value, even of type unsigned char, may yield something that doesn't behave consistently as a value of that type. Further, discussions in a Defect Report suggest that operations involving Indeterminate Values yield Indeterminate results, so even given something like unsigned char x, *p=&x; unsigned y=*p & 255; unsigned z=(y < 256); it would be possible for z to receive the value 0.

From what I can tell, the function:

unsigned char solidify(unsigned char *p)
{
  unsigned char result = 0;
  unsigned char mask = 1;
  do
  {
    if (*p & mask) result |= mask;
    mask += (unsigned)mask; // Cast only needed for capricious type ranges
  } while(mask);
  return result;
}

would be guaranteed to always yield a value in the range of type unsigned char any time the storage identified by p can be accessed as that type, even if it happens to hold Indeterminate Value. Such an approach seems rather slow and clunky, however, given that the machine code required to obtain the desired effect should usually be equivalent to simply returning *p.

Are there any better approaches that would be guaranteed by the Standard to always yield a value within the range of unsigned char, even if the source value is Indeterminate?

Addendum

The ability to solidify values is necessary, among other things, when performing I/O with partially-written arrays and structures, in cases where nothing will care about what bits get output for the parts that were never set. Whether or not the Standard would require that fwrite be usable with partially-written structures or arrays, I would regard I/O routines that can be used in such fashion (writing arbitrary values for portions that weren't set) to be of higher quality than those which might jump the rails in such cases.
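
For example, a logging routine along these lines (the record layout and log_record are hypothetical, and the name is assumed to fit in the field):

#include <stdio.h>
#include <string.h>

struct record {
  char name[16];                        // may be only partially written by strcpy
  unsigned length;
};

void log_record(FILE *f, const char *name)
{
  struct record r;
  strcpy(r.name, name);                 // bytes after the terminating zero stay unwritten
  r.length = (unsigned)strlen(name);
  fwrite(&r, sizeof r, 1, f);           // nothing cares what the unwritten bytes contain
}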

My concern is largely with guarding against optimizations which are unlikely to be used in dangerous combinations, but which could nonetheless occur as compilers get more and more "clever".

A problem with something like:

unsigned char solidify_alt(unsigned char *p)
{ return *p; }

is that compilers may combine an optimization which could be troublesome but tolerable in isolation, with one that would be good in isolation but deadly in combination with the first:

  1. If the function is passed the address of an unsigned char which has been optimized to e.g. a 32-bit register, a function like the above may blindly return the contents of that register without clipping it to the range 0-255. Requiring that callers manually clip the results of such functions would be annoying but survivable if that were the only problem. Unfortunately...

  2. Since the above function will "always" return a value 0-255, compilers may omit any "downstream" code that would try to mask the value into that range, check whether it was outside it, or otherwise do things that would be irrelevant for values outside the range 0-255 (a sketch of this combination follows below).
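
A minimal sketch of that combination (this is the feared outcome, not something the Standard promises; p is assumed to point at possibly-uninitialized storage):

unsigned clipped(unsigned char *p)
{
  unsigned v = solidify_alt(p);  // per 1 above, may really hand back a full register's contents
  v &= 0xFFu;                    // per 2 above, the compiler may delete this mask as a no-op,
  if (v > 255) v = 0;            // ...and delete this test as unreachable,
  return v;                      // so v may still exceed 255 at run time
}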

Some I/O devices may require that code wishing to write an octet perform a 16-bit or 32-bit store to an I/O register, and may require that 8 bits contain the data to be written and other bits hold a certain pattern. They may malfunction badly if any of the other bits are set wrong. Consider the code:

void send_bytes(unsigned char *p, unsigned int n)
{
  while(n--)
    OUTPUT_REG = solidify_alt(p++) | 0x0200;
}
void send_string4(char *st)
{
  unsigned char buff[5]; // Leave space for zero after 4-byte string
  strcpy((char*)buff, st);
  send_bytes(buff, 4);
}

with the intended semantics that send_string4("Ok"); should send out an 'O', a 'k', a zero byte, and an arbitrary value 0-255. Since the code uses solidify_alt rather than solidify, a compiler could legally turn that into:

void send_string4(char *st)
{
  unsigned buff0, buff1, buff2, buff3;
  buff0 = st[0]; if (!buff0) goto STRING_DONE;
  buff1 = st[1]; if (!buff1) goto STRING_DONE;
  buff2 = st[2]; if (!buff2) goto STRING_DONE;
  buff3 = st[3];
 STRING_DONE:
  OUTPUT_REG = buff0 | 0x0200;
  OUTPUT_REG = buff1 | 0x0200;
  OUTPUT_REG = buff2 | 0x0200;
  OUTPUT_REG = buff3 | 0x0200;
}

with the effect that OUTPUT_REG may receive values with bits set outside the proper range. Even if the output expression were changed to (((unsigned char)solidify_alt(p++) | 0x0200) & 0x02FF), a compiler could still simplify that to yield the code given above.

The authors of the Standard refrained from requiring compiler-generated initialization of automatic variables because it would have made code slower in cases where such initialization would be semantically unnecessary. I don't think they intended that programmers should have to manually initialize automatic variables in cases where all bit patterns would be equally acceptable.

Note, btw, that when dealing with short arrays, initializing all the values will be inexpensive and would often be a good idea, and when using large arrays a compiler would be unlikely to impose the above "optimization". Omitting the initialization in cases where the array is large enough that the cost matters, however, would make the program's correct operation reliant upon "hope".
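
For instance, a zero-filled variant of the send_string4 example above (just a sketch, accepting the cost of initializing the five bytes) would be:

void send_string4_zeroed(char *st)
{
  unsigned char buff[5] = {0};   // every element now holds a defined value
  strcpy((char*)buff, st);
  send_bytes(buff, 4);           // any trailing bytes are guaranteed to be zero
}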

supercat
  • Out of curiosity, what is an example of when this is necessary? – Oliver Charlesworth Dec 02 '17 at 18:52
  • 5
    I feel like it would take a really extreme case for this to be worth it over just zeroing the array before you start to fill in solutions. That said, it's still pretty frustrating how difficult and error-prone the standard makes this. – user2357112 Dec 02 '17 at 18:59
  • 1
    What advantage do you see with `solidify()` vs. `unsigned char solidify_alt(unsigned char *p) { unsigned char x = *p; return x; }`? – chux - Reinstate Monica Dec 02 '17 at 19:08
  • @chux: If `p` is the address of an object holding Indeterminate Value, the `solidify_alt` function would likewise yield Indeterminate Value, rather than a number 0-255. See the addendum above. – supercat Dec 02 '17 at 20:17
  • @user2357112: The problem, fundamentally, is that the authors of C89 didn't think it was necessary to explicitly define corner cases where there was an obvious useful behavior, all compilers to date had implemented it, and there was no reason to expect compilers to do anything else. Unfortunately, even when developments in compiler technology allowed compilers to benefit from contrary behaviors, the authors of the Standard pretended the behaviors had never been defined, rather than saying that e.g. if code contains `#pragma __STDC_LANGUAGE_STD(99)`, it must use `__STDC_SOLIDIFY()` on any... – supercat Dec 02 '17 at 20:23
  • ...lvalue that holds Indeterminate Value before attempting to read it, and that if code contains `__STDC_LANGUAGE_STD(89)`, any attempt to read an Indeterminate Value must have a behavior consistent with reading an object whose storage contains some possibly-arbitrary bit pattern. Unfortunately, the people responsible for the evolution of C are unwilling to address the massive technical debt the language has accrued as a result of their insistence upon ignoring problems. – supercat Dec 02 '17 at 20:26
  • @user2357112: With regard to why one wouldn't simply clear the destination storage, that would in many cases not be expensive *but* a quality function to do something like "write a range of bytes to some I/O device" shouldn't impose such demands upon client code without a really good reason for doing so. The cost of `solidify` as written is sufficiently monstrous I can't imagine realistically using it in production code, but I'd suggest that quality library functions for I/O should behave as though they "solidify" the data to be stored. The question then is, in part, what... – supercat Dec 02 '17 at 20:36
  • ...quality code should do to ensure the appropriate behavior. – supercat Dec 02 '17 at 20:36
  • 2
    The committee response to [DR 451](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1892.htm#dr_451) says that "[...] any operation performed on indeterminate values will have an indeterminate value as a result." Thus the view of the committee is that there is **no way** of "determining" an indeterminate value, not even the one proposed in the question. – Antti Haapala -- Слава Україні Dec 02 '17 at 21:52
  • @AnttiHaapala: An excellent example how separated from reality the committee members' interests are. In actual reality, such an operation is trivial to implement as a compiler operation (as opposed to e.g. library function): if the data referenced has no trap representations, then simply treat it as if it was initialized to random data. If the data referenced has trap representations, the compiler can either issue a warning or error and fail. Necessary behaviour would be achieved, and no existing behaviour would change. – Nominal Animal Dec 02 '17 at 22:45
  • @AnttiHaapala: If a system uses N-bit `unsigned char`, the code in the question will use the object whose address is passed within the conditional expressions of N `if` statements, and for no other purpose. Objects `mask` and `result` are initialized to well-defined values, and no computations on them involve anything else. If the code would invoke UB, where? If not, how could `result` be anything other than a value determined by the results of `if` tests? – supercat Dec 02 '17 at 22:58
  • 1
    Question without any real merit. More so when talking about PODs for IO. The last example tries to be clever and show that indeterminate and uninitialized are two different things. However, the type conversion magically switched places to before even the first use of `buff`, instead of a promotion during the arithmetic operation when assigning `OUTPUT_REG`. Author is confused about why compilers track uninitialized variables to do optimizations. Quality code would exactly not need `solidify`, because it wouldn't invoke undefined behavior in the first place. – FRob Dec 03 '17 at 02:20
  • @FRob: Perhaps I should reformulate the example to outputting a sequence of bytes as hex digits, with any uninitialized portions being required to output possibly-arbitrary digits '0'-'F'. I understand that it's useful to be able to allow uninitialized variables to yield non-deterministic values even after they are copied, *provided* there's a practical means by which code can force the compiler to make a solid concrete choice about an object's value in cases where correct semantics would require it. The Standard seems to allow an absurdly impractical means--does it provide a better one? – supercat Dec 03 '17 at 06:51
  • I'm not sure which DR you refer to, but there was one regarding whether successive reads of an indeterminate value could yield different results, which is a different story. – Lundin Dec 04 '17 at 12:56
  • @Lundin: Having individual reads yield different results would indeed be a different issue, and one which could be dealt with easily, but I recall another discussion (which I think was a DR, though I might be misremembering) of whether e.g. `unsigned char x=[Indeterminate value of type "unsigned char"]; y=x-x;` should be required to yield zero, and the conclusion was that there is no guarantee that the value stored in `x` would be Unspecified rather than Indeterminate. I think a fundamental conflict between the Standard and present compiler designs is that... – supercat Dec 04 '17 at 15:50
  • @Lundin: ...compilers have no good way of treating `*p = *p;` as an "almost-no-op", and in many application fields treating it as a genuine no-op that can be optimized away would be safe and useful. The solution I'd like to see would be for the Standard to define an intrinsic to make an lvalue be at worst an unspecified set of bit values, and have a directives which would invite or forbid a compiler from treating all reads of Indeterminate Value as yielding Indeterminate Value, without any "character-type" exception, in all cases where code hasn't used the aforementioned intrinsic. – supercat Dec 04 '17 at 16:03
  • @Lundin: What irks me is that compiler writers seem more interested in trying to interpret the Standard in a way that would break existing code, rather than in providing ways by which programmers can indicate which optimizations are safe and which aren't. If an optimization would be safe in 95% of the places it could be applied, having a means of marking the remaining 5% and then specifying that it's safe in all the places that aren't marked would seem like it should be better in every way than twisting the Standard to allow the optimization in as many cases as possible. – supercat Dec 04 '17 at 16:12

2 Answers

1

This is not an answer, but an extended comment.

The immediate solution would be for the compiler to provide a built-in, for example assume_initialized(variable [, variable ... ]*), that generates no machine code, but simply makes the compiler treat the contents of the specified variables (either scalars or arrays) as defined but unknown.
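
A hypothetical use might look like the following; assume_initialized() here is only the imagined built-in from the previous sentence, not something any existing compiler provides:

unsigned char buff[5];          /* deliberately left uninitialized                  */
assume_initialized(buff);       /* imagined built-in: no machine code generated,    */
                                /* but buff is now treated as "defined but unknown" */
send_bytes(buff, sizeof buff);  /* reads now yield merely unspecified values        */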

One can achieve a similar effect using a dummy function defined in another compilation unit, for example

void define_memory(void *ptr, size_t bytes)
{
    /* Nothing! */
}

and calling that (e.g. define_memory(some_array, sizeof some_array)) to stop the compiler from treating the values in the array as indeterminate; this works because at compile time the compiler cannot tell whether the values are unspecified or not, and therefore must consider them specified (defined but unknown).

Unfortunately, that has serious performance penalties. The call itself, even though the function body is empty, has a performance impact. However, worse yet is the effect on the code generation: because the array is accessed in a separate compilation unit, the data must actually reside in memory in array form, and thus typically generates extra memory accesses, plus restricts the optimization opportunities for the compiler. In particular, even a small array must then exist, and cannot be implicit or reside completely in machine registers.

I have experimented with a few architecture- (x86-64) and compiler- (GCC) specific workarounds, using extended inline assembly to fool the compiler into believing that the values are defined but unknown (unspecified, as opposed to indeterminate) without generating any actual machine code -- because this does not require any machine code, just a small adjustment to how the compiler treats the arrays/variables -- but with about zero success.
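
For reference, the typical shape of such an attempt looks something like this (GCC-specific extended asm; as said, I had about zero success in getting it to reliably change how the compiler models the value):

static inline unsigned char gcc_solidify(unsigned char *p)
{
    unsigned char x = *p;
    /* Empty asm template with a "+r" operand: claims to read and rewrite x
       while emitting no instructions, in the hope that the optimizer stops
       tracking x as uninitialized.  Nothing guarantees that it actually does. */
    __asm__ ("" : "+r" (x));
    return x;
}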

Now, to the underlying reason why I wrote this comment.

Years and years ago, working on numerical computation code and comparing performance to a similar implementation in Fortran 95, I discovered the lack of a memrepeat(ptr, first, bytes) function: the counterpart to memmove() with respect to memcpy(), that would repeat the initial first bytes at ptr throughout the region from ptr+first up to ptr+bytes-1. Like memmove(), it would work on the storage representation of the data, so even if the region from ptr to ptr+first contained a trap representation, no trap would actually trigger.

Main use case is to initialize arrays with floating-point data (one-dimensional, multidimensional, or structures with floating-point members), by initializing the first structure or group of values, and then simply repeating the storage pattern over the entire array. This is a very common pattern in numerical computation.

As an example, using

    double nums[7] = { 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0 };
    memrepeat(nums, 2 * sizeof nums[0], sizeof nums);

yields

    double nums[7] = { 7.0, 6.0, 7.0, 6.0, 7.0, 6.0, 7.0 };

(It is possible that the compiler could optimize the operation even better if it were defined as e.g. memsetall(data, size, count), where size is the size of the duplicated storage unit and count the total number of storage units (so count-1 units are actually copied). In particular, this allows an easy implementation that uses nontemporal stores for the copies, reading from the initial storage unit. On the other hand, memsetall() can only copy full storage units, unlike memrepeat(), so memsetall(nums, 2 * sizeof nums[0], 3); would leave the 7th element in nums[] unchanged -- i.e., in the above example, it'd yield { 7.0, 6.0, 7.0, 6.0, 7.0, 6.0, 1.0 }.)

Although you can trivially implement memrepeat() or memsetall(), even optimize them for a specific architecture and compiler, it is difficult to write a portable optimized version.
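
For reference, a trivial, unoptimized byte-loop version that pins down the intended semantics:

#include <stddef.h>

void memrepeat(void *ptr, size_t first, size_t bytes)
{
    unsigned char *p = ptr;
    size_t i;
    /* Copy the initial 'first' bytes over and over until 'bytes' bytes have
       been filled; later iterations read the already-repeated data. */
    for (i = first; i < bytes; i++)
        p[i] = p[i - first];
}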

In particular, loop-based implementations that use memcpy() (or memmove()) yield quite inefficient code when compiled by e.g. GCC, because the compiler cannot coalesce a pattern of function calls into a single operation.

Most compilers inline memcpy() and memmove() with internal, target- and use-case-optimized versions, and doing the same for such a memrepeat() and/or memsetall() function would make it portable. On Linux on x86-64, GCC inlines calls with known sizes, but keeps the function calls where the size is only known at runtime.

I did try to push it upstream, with some private and some public discussions on various mailing lists. The response was cordial, but clear: there is no way to get such features included into compilers, unless it is standardized by someone first, or you pique the interest of one of the core developers enough so that they want to try it themselves.

Because the C standards committee is only concerned with fulfilling the commercial interests of its corporate sponsors, there is zero chance of getting anything like that standardized into ISO C. (If there were, we really should push for basic features from POSIX like getline(), regex, and iconv to be included first; they'd have a much bigger positive impact on the code we can teach new C programmers.)

None of this piqued the interest of the core GCC developers either, so at that point, I lost my interest in trying to push it upstream.

If my experience is typical -- and having discussed it with a few people, it does seem like it is -- the OP and others worrying about such things would better spend their time finding compiler/architecture-specific workarounds, rather than pointing out the deficiencies in the standard: the standard is already lost; those people do not care.

Better spend your time and efforts in something you can actually accomplish without having to fight against windmills.

Nominal Animal
  • From a practical standpoint, what's probably needed is for a group of compiler vendors to formally recognize a "common-sense C" and encourage programmers to target that dialect and specify that their code should not be expected to work on compilers whose writers think "clever" implies "not stupid". Use of compilers which don't promise to uphold the guarantees should be highly discouraged in any cases involving safety or security. The language "C, with Undefined Behavior behaving in the 'natural' fashion for the underlying platform when practical" is good and useful. The subset... – supercat Dec 02 '17 at 23:19
  • ...of "freestanding C" which is limited to behaviors mandated by the Committee is horribly weak, since the authors of C89 and C99 thought they could rely upon compiler writers to use common sense. BTW, I wonder if the people driving C into crazy-town actually like C, or if they're *trying* to destroy it? I think Apple is a major driving force behind both "clang" and the Swift language, for example. – supercat Dec 02 '17 at 23:24
  • @supercat: Getting there (convincing vendors) is .. well, *I* won't bother trying to achieve that. On the other hand, there are lots of examples from LKML where developers participating in both GCC and Linux kernel development have fixed idiocies in GCC that are technically allowed by the standard. I'd say bribing/paying GCC developers to implement some of these would be a much more achievable route -- especially because there would be no ethics violations, as we're talking about adding useful stuff that the standard leaves undefined. Unfortunately, I'm broke as. – Nominal Animal Dec 03 '17 at 00:01
  • @supercat: As to vendors, PathScale is basically gone; Portland Group does HPC stuff and doesn't seem to bother with standards' folks much; Intel, GCC, clang folks do not care much about the C library side (and are happy to merge C++ features to C); and Microsoft is only interested in keeping its C and C++ source just enough unportable, to keep the developers from generating portable code and thus tie them in to their own walled garden (EEE, remember?). In summary: Yes, they would like for C to merge with C++. – Nominal Animal Dec 03 '17 at 00:08
  • Your `memrepeat` is exactly what `memmove` does. `memmove` doesn't know the pointer target type, so it cannot trigger a trap... – FRob Dec 03 '17 at 02:00
  • @Frob: Like I said, `memrepeat()` is the counterpoint of `memmove()`. Note that it may copy `ptr[0..first-1]` more than once. That is exactly the point. (That is also the difficulty in implementing an optimized version of it: on some architectures, you might be better off copying the first part with nontemporal stores; on some, it might be best to copy `ptr[i]` to `ptr[i+first]` until `i+first == bytes`. It varies, but the compiler already knows, because it implements `memcpy()`/`memmove()`. It would be trivial for the compiler to implement `memrepeat()`, too.) – Nominal Animal Dec 03 '17 at 02:46
  • @FRob: I think the idea with `memrepeat` was that e.g. `memrepeat(charptr1, charptr1+8, 256)` would be defined as equivalent to `for (i=0; i<256; i++) charptr1[i+8] = charptr1[i];`, with later values being repeated copies of earlier ones. It's possible to achieve the required behavior with lgN extra overhead by copying 8 bytes from charptr to charptr+8, then 16 bytes from charptr to charptr+16, etc. so the lack of a standard-library means of doing that isn't a total killer, but such a function would still be far more useful than silly things like strncat. – supercat Dec 03 '17 at 06:38
  • @supercat: For `memrepeat(charptr, first, bytes)` consider `for (i = first; i < bytes; i++) charptr[i] = charptr[i - first];` or `for (i = 0; i < bytes - first; i++) charptr[i + first] = charptr[i % first];`. But both of those compile to pretty poor code on GCC; using pointers instead helps. – Nominal Animal Dec 03 '17 at 08:03
  • @NominalAnimal: If one has a region of size `sz` at address `p`, and wants the copy staggered by `ofs` bytes, then as long as `ofs < sz>>1`, repeat `memcpy(p+ofs, p, ofs); ofs<<=1;`. Then end with `memcpy(p+ofs, p, sz-ofs)`. For small regions, manually copying bytes may be better, but the approach I describe should work well with large regions. – supercat Dec 03 '17 at 08:51
  • `memrepeat(ptr, first, bytes)` is `memmove(ptr, ptr + first, bytes - first)`. You seem to want to complicate things for some reason. – FRob Dec 03 '17 at 12:47
  • @FRob: Given `unsigned char arr[6]={0,1,2,3,4,5};`, `memmove(arr,arr+2,4);` the array would be left holding {2,3,4,5,4,5}; using `memmove(arr+2,arr,4)` would leave it holding {0,1,0,1,2,3}. I think the purpose of `memrepeat` would be to leave it holding {0,1,0,1,0,1}. – supercat Dec 03 '17 at 17:15
  • @FRob: It looks like you completely misunderstood my concept. I've edited the answer to describe it in practical terms. – Nominal Animal Dec 03 '17 at 17:57
  • @supercat: repeating `memcpy()` generates relatively poor code on e.g. GCC-5.4 on x86-64. The compiler cannot combine the calls. If the storage unit size is `first` and is known at compile time, as it almost always is, the generated code is typically a double loop, and it is best to copy from the initial unit in the array (as opposed to from the previously stored unit). The data is usually aligned to `unsigned long`, and on architectures without trap representations in those, a simple staggered copy loop is pretty good -- better than `memcpy()` loop -- but we can usually do even better. – Nominal Animal Dec 03 '17 at 18:03
  • @NominalAnimal: Done as I propose, the number of `memcpy` operations will be lg(size/offset)+1. If one wants to e.g. repeat a 4-byte pattern through a 65536-byte range, that would require fourteen memcpy operations, four of 32 bytes or smaller and ten of 64 bytes or larger. If aliasing settings allow user code to employ chunking optimizations (e.g. read groups of 8 bytes using a uint64_t* without regard for Effective Type nonsense), loops exploiting that could profitably replace at least the first four `memcpy` operations. If chunking isn't possible... – supercat Dec 03 '17 at 18:58
  • ...it may be possible for `memcpy`, even with its startup overhead, to do better than a `char`-based loop even for smaller sizes like 16 or 32 bytes. It's too bad that while the rationale indicates the authors of the Standard meant to say that failure to recognize aliasing *in cases where a compiler would have no reason to expect it* does not render a compiler non-conforming, they instead wrote some absurdly overbearing "shall not" language which, linguistically, branded huge amounts of code as "broken" [rather than merely saying that compilers are not *required* to support it... – supercat Dec 03 '17 at 19:08
  • ...but quality compilers suitable for low-level programming should nonetheless try to do so anyway when practical]. It should be very easy for compilers to support the most common patterns surrounding chunking optimizations. The authors of the Standard clearly understood the natural meaning of using lvalue of one type is used to access another (the rationale notes that an optimization assuming a `double*` wouldn't modify a static object of type `int` would only be "incorrect" under circumstances too rare to justify branding such optimizations as non-conforming, but saying it is... – supercat Dec 03 '17 at 19:31
  • ..."incorrect" would imply that such code would have a correct meaning in the absence of such optimization). – supercat Dec 03 '17 at 19:32
  • @supercat: Doubling the memory range using `memcpy()` is very fast in microbenchmarks, but it's horrible wrt. cache behaviour, especially in cases where the surrounding code is cache-intensive as most numerical computations tend to be. (In the case where the compiler sees that the data is not accessed soon, but the operation is in the middle of data-intensive computation, it could use here nontemporal stores if supported by the architecture.) Even the simple copy loop may lead to better real-life results, outside microbenchmarks, due to lesser cache effects. – Nominal Animal Dec 04 '17 at 00:40
  • @NominalAnimal: BTW, I think a bigger omission with regard to `memcpy`-ish issues is the lack of `memcpy`/`memmove` variants which could use the types of the source and destination pointers (immediately prior to the argument conversion to `void*`) to make type-based inferences, as well as variants which are intended for data conversions, and which would require compilers to recognize potential aliasing of source and destination with objects of all types. – supercat Dec 04 '17 at 02:51
  • @NominalAnimal: There's probably an awkward range of sizes where which caching behavior would be sub-optimal using the doubling algorithm, but even then I would expect performance to still be reasonably decent. More importantly, even if the code needed to be tweaked to yield optimal performance on a particular platform, it should work *correctly* on all platforms. – supercat Dec 04 '17 at 02:54
0

I think this is pretty clear. C11 3.19.2

indeterminate value
either an unspecified value or a trap representation

Period. It cannot be anything other than the two cases above.

So code such as unsigned z=(y < 256); can never set z to 0, because x in your example cannot hold a value larger than 255. As per the representation of character types, 6.2.6, an unsigned char is not allowed to contain padding bits or trap representations.

Other types, on wildly exotic systems, could in theory hold values outside their range, padding bits and trap representations.

On real-world systems, that are extremely likely to use two's complement, trap representations do not exist. So the indeterminate value can only be unspecified. Unspecified, not undefined! There is a myth saying that "reading an indeterminate value is always undefined behavior". Save for trap representations and some other special cases, this is not true, see this. This is merely unspecified behavior.

Unspecified behavior does not mean that the compiler can run amok and make weird assumptions, as it can when it encounters undefined behavior. It will have to assume that the variable values are in range. What the compiler cannot assume is that the value is the same between reads - this was addressed by some DR.
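
To illustrate that last point (a sketch of the reading given in this answer, not normative text):

void example(void)
{
    unsigned char u;        /* automatic, never written: holds an indeterminate value */
    unsigned char *p = &u;  /* address taken, so the "could have been declared
                               register" rule of 6.3.2.1 does not apply */
    unsigned a = *p;        /* one read...                                             */
    unsigned b = *p;        /* ...and a second read of the same object                 */
    /* On this reading, a and b each hold some value 0-255, but a == b is not
       guaranteed: the two reads need not agree. */
    (void)a; (void)b;
}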

Lundin
  • Given `short test(unsigned mode) { register short x; if (!mode) x=someVolatile; return x; }`, the most straightforward non-"optimized" code for a platform like the ARM would pick a register for x, conditionally load a volatile value into it with sign extension, and then return the contents of that register (returning whatever 32-bit value happened to be in it if it wasn't set). I don't think the authors of any version of the Standard intended to forbid such behavior [I don't think assigning the first few `register`-qualified variables to registers would have been considered "optimization"]. – supercat Dec 04 '17 at 20:17
  • Where things get really tricky is with code like `register short y=test(1234567); if (y >= 0 && y < 32768) foo[y]++;`. I don't think any version of the Standard was intended to forbid `test` from returning an arbitrary 32-bit value, nor from having `y` receive such a value, nor from having a compiler blindly assume that any short (including `y`) will always compare less than 32768. Saying that use of an uninitialized automatic variable is UB is a lot simpler than trying to identify all the allowable consequences of using one. – supercat Dec 04 '17 at 20:33