My question is similar to this but a bit more specific. I am writing a function to read a 32-bit unsigned integer, stored in little-endian byte order, from an istream. In C something like this would work:

    #include <stdio.h>
    #include <inttypes.h>

    uint_least32_t foo(FILE* file)
    {
        unsigned char buffer[4];
        fread(buffer, sizeof(buffer), 1, file);
        uint_least32_t ret = buffer[0];
        ret |= (uint_least32_t) buffer[1] << 8;
        ret |= (uint_least32_t) buffer[2] << 16;
        ret |= (uint_least32_t) buffer[3] << 24;
        return ret;
    }

But if I try to do something similar using an istream, I run into what I think is undefined behaviour:

    #include <cstdint>
    #include <istream>

    std::uint_least32_t bar(std::istream& file)
    {
        char buffer[4];
        file.read(buffer, sizeof(buffer));
        // The casts to unsigned char are to prevent sign extension on systems where
        // char is signed.
        std::uint_least32_t ret = (unsigned char) buffer[0];
        ret |= (std::uint_least32_t) (unsigned char) buffer[1] << 8;
        ret |= (std::uint_least32_t) (unsigned char) buffer[2] << 16;
        ret |= (std::uint_least32_t) (unsigned char) buffer[3] << 24;
        return ret;
    }

It is undefined behaviour on systems where char is signed, the representation is not two's complement, and char cannot represent the number -128, so it cannot represent 256 different values. In foo it will work even if char is signed, because section 7.21.8.1 of the C11 standard (draft N1570) says that fread uses unsigned char, not char, and unsigned char has to be able to represent all values in the range 0 to 255 inclusive.
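The range argument above can be checked at compile time. The following is only a sketch; distinct_values is a hypothetical helper that is not part of the original code, and it assumes every integer between the minimum and maximum of a type is representable:

```cpp
#include <climits>
#include <limits>

// Hypothetical helper: number of distinct values of an integer type,
// assuming every integer between min() and max() is representable.
template <typename T>
constexpr long long distinct_values()
{
    return static_cast<long long>(std::numeric_limits<T>::max())
         - static_cast<long long>(std::numeric_limits<T>::min()) + 1;
}

// unsigned char is required to cover at least 0..255.
static_assert(UCHAR_MAX >= 255, "unsigned char must hold 0..255");
static_assert(distinct_values<unsigned char>() >= 256,
              "unsigned char must have at least 256 values");

// True when plain char also has at least 256 distinct values. On a
// hypothetical sign+magnitude machine with an 8-bit signed char, the
// range would be -127..127 and this would be false.
constexpr bool char_holds_256_values = distinct_values<char>() >= 256;
```

On a two's complement implementation with 8-bit char, char_holds_256_values should be true, which is why the problem is not visible on common platforms.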
Does bar really cause undefined behaviour when it tries to read the number 0x80, and if so, is there a workaround that still uses a std::istream?
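To make the case in doubt concrete, here is a minimal sketch that feeds the byte 0x80 through an in-memory stream. read_u32_le repeats the logic of bar, and decode is a hypothetical helper added only for illustration:

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Same logic as bar in the question, repeated so the sketch is self-contained.
std::uint_least32_t read_u32_le(std::istream& in)
{
    char buffer[4];
    in.read(buffer, sizeof(buffer));
    std::uint_least32_t ret = static_cast<unsigned char>(buffer[0]);
    ret |= static_cast<std::uint_least32_t>(static_cast<unsigned char>(buffer[1])) << 8;
    ret |= static_cast<std::uint_least32_t>(static_cast<unsigned char>(buffer[2])) << 16;
    ret |= static_cast<std::uint_least32_t>(static_cast<unsigned char>(buffer[3])) << 24;
    return ret;
}

// Hypothetical helper: decodes four raw bytes held in a std::string.
std::uint_least32_t decode(const std::string& bytes)
{
    std::istringstream in(bytes);
    return read_u32_le(in);
}
```

Calling decode(std::string("\x80\x00\x00\x00", 4)) is the situation the question asks about. On a two's complement platform with 8-bit char I would expect it to return 0x80; whether that is guaranteed everywhere is exactly what is in doubt.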
Edit: The undefined behaviour I am referring to is caused by the istream::read into buffer, not by the cast from buffer to unsigned char. For example, if it is a sign+magnitude machine and char is signed, then 0x80 is negative 0, but negative 0 and positive 0 must always compare equal according to the standard. If that is the case, then there are only 255 different signed chars, and a char cannot represent every possible byte value. The casts will work because the conversion from signed to unsigned always adds UCHAR_MAX + 1 to negative numbers (section 4.7 of the draft C++11 standard, N3242).
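The conversion rule relied on here can be illustrated with a small sketch. to_unsigned is a hypothetical name, and signed char is used so the example does not depend on whether plain char is signed:

```cpp
#include <climits>

// Hypothetical helper: converts a signed char to unsigned char.
// For a negative value v the result is v + UCHAR_MAX + 1, because the
// conversion is defined modulo UCHAR_MAX + 1, independent of how
// negative numbers are represented in memory.
constexpr unsigned char to_unsigned(signed char c)
{
    return static_cast<unsigned char>(c);
}

static_assert(to_unsigned(1) == 1, "non-negative values are unchanged");
static_assert(to_unsigned(-1) == UCHAR_MAX, "-1 + UCHAR_MAX + 1");
static_assert(to_unsigned(-127) == UCHAR_MAX - 126, "-127 + UCHAR_MAX + 1");
```

This only shows that the cast is well defined for any value the char actually holds; it says nothing about what value istream::read stores in the buffer in the first place.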