I have a binary file in big-endian format from which I am retrieving 2-byte and 4-byte integer data. The machine I'm running on is little-endian.

Does anyone have suggestions or a best practice for pulling integer data from a binary file of known format and switching endianness on the fly? I'm not sure that my current solution is even correct:

int myInt;

ifstream dataFile(dataFileLocation, ios::in | ios::binary);
dataFile.seekg(99, ios::beg);  //Pull data starting at byte 100;

//For 4-byte value:
char chunk[4];
dataFile.read(chunk, 4);
myInt = (int)(chunk[0] << 24 | chunk[1] << 16 | chunk[2] << 8 | chunk[3]);

//For 2-byte value:
char chunk[2];
dataFile.read(chunk, 2);
myInt = (int)(chunk[0] << 8 | chunk[1]);

This seems to work fine for 2-byte data but gives what I believe are incorrect values on 4-byte data. I've read about htonl() but from what I've read that's not a smart way to go for flexibility.

TheOx
  • Shifting a `char` over more than 7 bits will make it completely zero, which is what you're doing on every one but `chunk[3]`. I don't know the right way to do this, just pointing one reason out why yours might not be working. – Seth Carnegie Nov 18 '11 at 19:57
  • check this: http://stackoverflow.com/questions/105252/how-do-i-convert-between-big-endian-and-little-endian-values-in-c/105339#105339 – perreal Nov 18 '11 at 19:58
  • Your approach is fine. Note that the endianness of your machine is irrelevant. I would read into an unsigned int and unsigned chars, by the way, to avoid ambiguities in the shifting. – Kerrek SB Nov 18 '11 at 19:59
  • My first comment is incorrect, as Kerrek SB has told me, shifting a char results in an int so like he said, this approach is fine. – Seth Carnegie Nov 18 '11 at 20:04
  • @perreal +1 -- nice reference. DTJohn, if you do a little experiment with a `union unsigned char[4]` with `uint32_t` you may be surprised. Print out the numbers in the char & the int; IIRC, the bytes are not swapped in perfect reverse order. – John Price Nov 18 '11 at 21:22

1 Answer

Use unsigned integral types only and you'll be fine:

unsigned char buf[4];
infile.read(reinterpret_cast<char*>(buf), 4);

unsigned int b4 = (buf[0] << 24) + ... + (buf[3]);
unsigned int b2 = (buf[0] << 8) + (buf[1]);

Shifting promotes each operand to `int`, and because the signedness of plain `char` is implementation-defined, a byte with its high bit set may be sign-extended during that promotion. Basically, you always want everything to be unsigned when manipulating bits.
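
To make the point about unsigned types concrete, here is a minimal sketch rather than a definitive implementation. The names `load_be16`, `load_be32`, `read_be32`, and `sign_extended` are hypothetical and do not appear in the original post. Each byte is widened to an unsigned type before shifting, which also keeps the 24-bit shift out of signed `int`. `sign_extended` shows what a negative `signed char` turns into when it is widened.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>

// Hypothetical helper: combine two big-endian bytes into a 16-bit value.
inline std::uint16_t load_be16(const unsigned char* p) {
    assert(p != nullptr);
    return static_cast<std::uint16_t>((static_cast<unsigned>(p[0]) << 8) | p[1]);
}

// Hypothetical helper: combine four big-endian bytes into a 32-bit value.
// Every byte is widened to uint32_t first, so no shift happens in signed int.
inline std::uint32_t load_be32(const unsigned char* p) {
    assert(p != nullptr);
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8)  |
            static_cast<std::uint32_t>(p[3]);
}

// Hypothetical helper: read four bytes from a binary stream and decode them.
// Returns false if fewer than four bytes could be read.
inline bool read_be32(std::istream& in, std::uint32_t& out) {
    unsigned char buf[4];
    if (!in.read(reinterpret_cast<char*>(buf), sizeof buf)) {
        return false;
    }
    out = load_be32(buf);
    return true;
}

// Illustration of the pitfall: a negative signed byte is promoted to a
// negative int, so its upper bits are all ones once viewed as unsigned.
inline std::uint32_t sign_extended(signed char c) {
    return static_cast<std::uint32_t>(static_cast<int>(c));
}
```

With signed bytes, a sign-extended low byte ORed into the result would overwrite the higher bytes with ones, which may be why some values look wrong. The host's endianness never enters the calculation, because the shifts operate on values, not on memory layout.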

Kerrek SB
  • Does shifting a `char` over convert it to an `int` or something? – Seth Carnegie Nov 18 '11 at 20:02
  • +1. Using the bitwise or operator is more idiomatic than using '+', though both ways work. – bames53 Nov 18 '11 at 20:13
  • @bames53: I like to think of C++ only dealing with *values*, not with *representations*. (I think the authors agree.) Hence I prefer "algebraic" operations to bitwise ones as much as possible. This goes hand in hand with the use of the algebraic char types (`signed char`/`unsigned char`) as opposed to the "platform's unit" `char` type. This viewpoint also emphasizes that endianness is purely a property of an external representation, not of *values*. – Kerrek SB Nov 18 '11 at 20:17
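
As a small illustration of the `+` versus `|` discussion above, the following sketch decodes the same four bytes three ways. The function names are hypothetical. Once every byte is unsigned and shifted into its own bit range, no carries can occur, so addition and bitwise or agree. The third variant uses only multiplication and addition, which is one way to read the "algebraic" view.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: big-endian decode using bitwise or.
inline std::uint32_t combine_or(const unsigned char* p) {
    assert(p != nullptr);
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8)  |
            static_cast<std::uint32_t>(p[3]);
}

// Hypothetical helper: the same decode using addition.
inline std::uint32_t combine_add(const unsigned char* p) {
    assert(p != nullptr);
    return (static_cast<std::uint32_t>(p[0]) << 24) +
           (static_cast<std::uint32_t>(p[1]) << 16) +
           (static_cast<std::uint32_t>(p[2]) << 8)  +
            static_cast<std::uint32_t>(p[3]);
}

// Hypothetical helper: purely arithmetic form, treating the bytes as
// base-256 digits with the most significant digit first.
inline std::uint32_t combine_arith(const unsigned char* p) {
    assert(p != nullptr);
    std::uint32_t value = 0;
    for (int i = 0; i < 4; ++i) {
        value = value * 256u + p[i];
    }
    return value;
}
```

All three return the same value for any input bytes. The choice between them is a matter of style, as the comments above suggest.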