Sometimes in C it is necessary to read a possibly-written item from a partially-written array, such that:
If the item has been written, the read will yield the value that was in fact written, and
If the item hasn't been written, the read will convert an Unspecified bit pattern to a value of the appropriate type, with no side-effects.
There are a variety of algorithms where finding a solution from scratch is expensive, but validating a proposed solution is cheap. If an array holds solutions for all cases where they have been found, and arbitrary bit patterns for other cases, reading the array, testing whether it holds a valid solution, and slowly computing the solution only if the one in the array isn't valid, may be a useful optimization.
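The validate-then-fall-back pattern described above might look roughly like the sketch below. It uses integer square roots purely as an illustration; the names isqrt_slow, isqrt_valid, and isqrt_cached are hypothetical and not taken from any library. The sketch only has defined behavior if every element of cache has been written with something (arbitrary garbage is fine). Whether that initial write can safely be skipped is what the rest of this question is about.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Slow path: floor(sqrt(n)) by linear search (deliberately naive). */
static uint32_t isqrt_slow(uint32_t n)
{
    uint32_t r = 0;
    while (((uint64_t)r + 1) * ((uint64_t)r + 1) <= n)
        r++;
    return r;
}

/* Cheap check: is r really floor(sqrt(n))?  Safe for any 32-bit r. */
static int isqrt_valid(uint32_t n, uint32_t r)
{
    uint64_t lo = (uint64_t)r * r;
    uint64_t hi = ((uint64_t)r + 1) * ((uint64_t)r + 1);
    return lo <= n && n < hi;
}

/* Assumption: every cache element has been written at least once,
   possibly with garbage.  Invalid entries are recomputed and stored. */
uint32_t isqrt_cached(uint32_t *cache, size_t cache_len, uint32_t n)
{
    assert(cache != NULL);
    if (n >= cache_len)
        return isqrt_slow(n);      /* outside the cache: always slow */

    uint32_t guess = cache[n];
    if (isqrt_valid(n, guess))
        return guess;              /* cached answer checks out */

    uint32_t r = isqrt_slow(n);    /* garbage or stale: recompute */
    cache[n] = r;
    return r;
}
```

A call such as isqrt_cached(cache, cache_len, n) returns the same result whether the slot held a valid answer or garbage; only the cost differs.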
If an attempt to read a non-written array element of a type like uint32_t could be guaranteed to always yield a value of the appropriate type, such an approach would be easy, straightforward, and efficient. Even if that guarantee only held for unsigned char, it might still be workable. Unfortunately, compilers sometimes behave as though reading an Indeterminate Value, even of type unsigned char, may yield something that doesn't behave consistently as a value of that type. Further, discussions in a Defect Report suggest that operations involving Indeterminate Values yield Indeterminate results, so even given something like unsigned char x, *p=&x; unsigned y=*p & 255; unsigned z=(y < 256); it would be possible for z to receive the value 0.
From what I can tell, the function:
unsigned char solidify(unsigned char *p)
{
    unsigned char result = 0;
    unsigned char mask = 1;
    do
    {
        if (*p & mask) result |= mask;
        mask += (unsigned)mask; // Cast only needed for capricious type ranges
    } while(mask);
    return result;
}
would be guaranteed to always yield a value in the range of type unsigned char any time the storage identified can be accessed as that type, even if it happens to hold an Indeterminate Value. Such an approach seems rather slow and clunky, however, given that the required machine code to obtain the desired effect should usually be equivalent to returning *p.
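For storage that has been written, the function should simply hand back the stored value. A small self-check along those lines is sketched below; solidify is repeated so the block stands alone, and solidify_self_check is a hypothetical helper name. It exercises only the written-value half of the requirement, since the indeterminate case cannot be tested portably.

```c
#include <assert.h>
#include <limits.h>

unsigned char solidify(unsigned char *p)
{
    unsigned char result = 0;
    unsigned char mask = 1;
    do
    {
        if (*p & mask) result |= mask;  /* copy one bit at a time */
        mask += (unsigned)mask;         /* doubles; wraps to 0 after the top bit */
    } while (mask);
    return result;
}

/* Hypothetical helper: for every value an unsigned char can hold,
   solidify must return exactly what was stored.
   Assumes UCHAR_MAX < UINT_MAX so the loop counter cannot wrap. */
void solidify_self_check(void)
{
    for (unsigned v = 0; v <= UCHAR_MAX; v++) {
        unsigned char c = (unsigned char)v;
        assert(solidify(&c) == c);
    }
}
```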
Are there any better approaches that would be guaranteed by the Standard to always yield a value within the range of unsigned char, even if the source value is Indeterminate?
Addendum
The ability to solidify values is necessary, among other things, when performing I/O with partially-written arrays and structures, in cases where nothing will care about what bits get output for the parts that were never set. Whether or not the Standard would require that fwrite be usable with partially-written structures or arrays, I would regard I/O routines that can be used in such fashion (writing arbitrary values for portions that weren't set) to be of higher quality than those which might jump the rails in such cases.
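Absent such a guarantee, one conservative workaround is to clear the whole buffer before partially filling it, so that fwrite never reads an unwritten byte. A minimal sketch follows; NAME_FIELD and write_name_field are hypothetical names for illustration, and the cost of the memset is exactly what one would hope to avoid for large buffers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAME_FIELD 16  /* hypothetical fixed field width, in bytes */

/* Writes name as a fixed-width, zero-padded field.
   Returns 1 on success, 0 if the name does not fit or the write fails. */
int write_name_field(FILE *out, const char *name)
{
    unsigned char field[NAME_FIELD];
    size_t len;

    assert(out != NULL && name != NULL);
    len = strlen(name);
    if (len >= NAME_FIELD)
        return 0;                     /* no room for the terminator */

    memset(field, 0, sizeof field);   /* every byte now has a written value */
    memcpy(field, name, len);         /* the part we actually care about */
    return fwrite(field, 1, sizeof field, out) == sizeof field;
}
```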
My concern is largely with guarding against optimizations which are unlikely to be used in dangerous combinations, but which could nonetheless occur as compilers get more and more "clever".
A problem with something like:
unsigned char solidify_alt(unsigned char *p)
{ return *p; }
is that compilers may combine an optimization which could be troublesome but tolerable in isolation, with one that would be good in isolation but deadly in combination with the first:
If the function is passed the address of an unsigned char which has been optimized to e.g. a 32-bit register, a function like the above may blindly return the contents of that register without clipping it to the range 0-255. Requiring that callers manually clip the results of such functions would be annoying but survivable if that were the only problem. Unfortunately...
Since the above function will "always" return a value 0-255, compilers may omit any "downstream" code that would try to mask the value into that range, check if it was outside, or otherwise do things that only matter for values outside the range 0-255.
Some I/O devices may require that code wishing to write an octet perform a 16-bit or 32-bit store to an I/O register, and may require that 8 bits contain the data to be written and other bits hold a certain pattern. They may malfunction badly if any of the other bits are set wrong. Consider the code:
void send_bytes(unsigned char *p, unsigned int n)
{
    while(n--)
        OUTPUT_REG = solidify_alt(p++) | 0x0200; // solidify_alt takes a pointer
}
void send_string4(char *st)
{
    unsigned char buff[5]; // Leave space for zero after 4-byte string
    strcpy((char*)buff, st);
    send_bytes(buff, 4);
}
with the intended semantics that send_string4("Ok"); should send out an 'O', a 'k', a zero byte, and an arbitrary value 0-255. Since the code uses solidify_alt rather than solidify, a compiler could legally turn that into:
void send_string4(char *st)
{
    unsigned buff0, buff1, buff2, buff3;
    buff0 = st[0]; if (!buff0) goto STRING_DONE;
    buff1 = st[1]; if (!buff1) goto STRING_DONE;
    buff2 = st[2]; if (!buff2) goto STRING_DONE;
    buff3 = st[3];
STRING_DONE:
    OUTPUT_REG = buff0 | 0x0200;
    OUTPUT_REG = buff1 | 0x0200;
    OUTPUT_REG = buff2 | 0x0200;
    OUTPUT_REG = buff3 | 0x0200;
}
with the effect that OUTPUT_REG may receive values with bits set outside the proper range. Even if the output expression were changed to (((unsigned char)solidify_alt(p++) | 0x0200) & 0x02FF), a compiler could still simplify that to yield the code given above.
The authors of the Standard refrained from requiring compiler-generated initialization of automatic variables because it would have made code slower in cases where such initialization would be semantically unnecessary. I don't think they intended that programmers should have to manually initialize automatic variables in cases where all bit patterns would be equally acceptable.
Note, btw, that when dealing with short arrays, initializing all the values will be inexpensive and would often be a good idea, and when using large arrays a compiler would be unlikely to impose the above "optimization". Omitting the initialization in cases where the array is large enough that the cost matters, however, would make the program's correct operation reliant upon "hope".
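For the short-buffer case, the initialization mentioned above could look like the following sketch. OUTPUT_REG is replaced by a hypothetical logging function, output_reg_write, so the example can run on a hosted system; out_log, out_count, and send_string4_init are likewise names made up for illustration, and the length check is an added assumption rather than part of the original code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the memory-mapped OUTPUT_REG:
   each "store" is appended to a log instead of hitting hardware. */
static unsigned out_log[16];
static unsigned out_count;

static void output_reg_write(unsigned v)
{
    if (out_count < 16)
        out_log[out_count++] = v;
}

static void send_bytes(const unsigned char *p, unsigned n)
{
    while (n--)
        output_reg_write(*p++ | 0x0200u);
}

void send_string4_init(const char *st)
{
    unsigned char buff[5] = {0};   /* every element written before use */

    assert(st != NULL);
    if (strlen(st) > 4)
        return;                    /* added guard: would overflow buff */
    strcpy((char *)buff, st);
    send_bytes(buff, 4);           /* trailing bytes are 0, not arbitrary */
}
```

With the buffer zeroed, send_string4_init("Ok") sends 'O', 'k', and two zero bytes, each OR'ed with 0x0200, so no store can carry stray high bits.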