Convert chars to an integer the correct way

Question

Just trying to make sure I got it right. On SO I came across an answer to a question about how to store chars in an int, like this:

unsigned int final = 0;
final |= ( data[0] << 24 );
final |= ( data[1] << 16 );
final |= ( data[2] <<  8 );
final |= ( data[3]       );

But to my understanding this is wrong, isn't it? Here is why: say data holds the integer in little-endian order (e.g., data[0] = LSB of some int). Then if the machine executing the code above is little endian, final will hold the correct value, but if it is big endian, final will hold a wrong value. Is that right?

Just trying to make sure I got this right; I am not going to ask more questions in this direction for now.

@H2CO3: this really is not a duplicate, please remove that link — , Oct 12 '13 at 17:02
Why isn't it a duplicate? You asked basically the very same question. — , Oct 12 '13 at 17:06
No: there, data was always stored in little-endian order. Here, depending on how data is stored (e.g., little or big endian), the result will be either correct or wrong — , Oct 12 '13 at 17:08
Regardless of that, your question is the same: "is this method of storing data good regarding endianness, and if not, how do I fix it?" There isn't really a need for two separate questions for the little-endian and mixed-endian cases. Even the answers to the two questions say the same thing. — , Oct 12 '13 at 17:10
*I am not going to ask more questions in this direction for now* You are free to ask more questions on the subject, provided you are not asking the same questions. — ouah, Oct 12 '13 at 17:11
It's not really the same, plus I was not asking to fix anything here. OK, let's close this topic — , Oct 12 '13 at 17:13
@H2CO3: now, because of that, people started downvoting my other question :( — , Oct 12 '13 at 17:43
@dmcr_code I am sorry about that. There shouldn't be downvotes; that's not the dupe. But next time, make sure not to ask a question twice. — , Oct 12 '13 at 17:44

score 3 · Answer 1 · answered Oct 12 '13 at 16:55

Don't do this when you have functions like htonl, etc.; they take the hassle out of things.
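A minimal sketch of that suggestion, assuming a POSIX system where `<arpa/inet.h>` provides `ntohl` (the receiving-side counterpart of `htonl`) and assuming the four bytes arrive in network (big-endian) order. The helper name `read_be32_ntohl` is hypothetical.

```c
#include <arpa/inet.h>  /* ntohl: POSIX, not ISO C */
#include <stdint.h>
#include <string.h>     /* memcpy */

/* Hypothetical helper: interpret 4 bytes in network (big-endian)
   order as a host-order uint32_t. memcpy avoids the alignment and
   strict-aliasing problems of casting the buffer pointer. */
static uint32_t read_be32_ntohl(const unsigned char bytes[4])
{
    uint32_t net;
    memcpy(&net, bytes, sizeof net);  /* raw copy, still network order */
    return ntohl(net);                /* convert to host order */
}
```

This only helps when the data really is in network order, and it relies on a POSIX header being available.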

Ed Heal

`htonl`, etc. are not functions from the C library, so not all systems provide them. – ouah Oct 12 '13 at 17:31
@ouah - Let's be very pedantic: sometimes they are not even functions. – Ed Heal Oct 12 '13 at 17:35
Sorry, it's not pedantic. A lot (if not most) of C code being written today is code for embedded systems, and I can tell you they usually don't provide the POSIX function implementations. – ouah Oct 12 '13 at 17:39
"A lot (if not most) of C code being written today is code for embedded systems" - Evidence for this assertion? BTW, it is a #define that can be copied – Ed Heal Oct 12 '13 at 17:54
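Where no POSIX headers exist, the writing side can be done portably with plain shifts instead of `htonl`. A sketch with a hypothetical helper name `store_be32`: it puts the most significant byte in `out[0]`, which is exactly the layout the question's code expects to read back.

```c
#include <stdint.h>

/* Hypothetical helper: write v into out[0..3], most significant byte
   first (network order). Only shifts and conversions to unsigned char
   are used, so no POSIX header is needed and the result is the same
   on any host. */
static void store_be32(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)(v >> 24);  /* conversion keeps the low 8 bits */
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}
```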

Joni · Answer 2 · 2013-10-12T17:01:00.143

This code does not depend on the endianness of the platform: data[0] is always stored as the most significant byte of the int, followed by the rest, and data[3] is always the least significant byte.

Whether that's "right" or "wrong" depends on how the integer has been encoded in the data array itself.
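To illustrate, here is a sketch (hypothetical function names) of the question's expression next to its little-endian counterpart. Both work on byte values, never on the memory layout of the result, so each gives the same answer on any host; which one is correct depends solely on how the bytes were written into `data`. The `uint32_t` casts are added so the shifts happen on an unsigned type.

```c
#include <stdint.h>

/* data[0] is the most significant byte (big-endian encoding). */
static uint32_t decode_be32(const unsigned char data[4])
{
    return ((uint32_t)data[0] << 24) |
           ((uint32_t)data[1] << 16) |
           ((uint32_t)data[2] <<  8) |
            (uint32_t)data[3];
}

/* data[0] is the least significant byte (little-endian encoding). */
static uint32_t decode_le32(const unsigned char data[4])
{
    return ((uint32_t)data[3] << 24) |
           ((uint32_t)data[2] << 16) |
           ((uint32_t)data[1] <<  8) |
            (uint32_t)data[0];
}
```

For the bytes {0x78, 0x56, 0x34, 0x12}, `decode_le32` returns 0x12345678 and `decode_be32` returns 0x78563412, whatever the endianness of the machine running the code.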

There is one problem, though: if data has been declared as char rather than unsigned char, and char is signed on your platform, a negative data[i] is promoted to a negative int, and sign extension sets many more bits than you intended.
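A small sketch of the effect, assuming a 32-bit `int` and using `signed char` explicitly so the result doesn't depend on whether plain `char` is signed. It ORs a single byte into an accumulator with no shift, so there is no undefined behavior involved; the function names are hypothetical.

```c
#include <stdint.h>

/* Naive: c is promoted to int (-128 stays -128), then converted to
   unsigned for the OR, giving 0xFFFFFF80: every bit above the byte
   gets set. */
static uint32_t or_byte_naive(uint32_t acc, signed char c)
{
    return acc | c;
}

/* Fixed: reinterpret the byte as unsigned char (0..255) first, so a
   byte of 0x80 contributes exactly 0x80. */
static uint32_t or_byte_fixed(uint32_t acc, signed char c)
{
    return acc | (unsigned char)c;
}
```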

edited Oct 12 '13 at 17:01

answered Oct 12 '13 at 16:55

Joni

I think you are saying something similar to what I said. But imagine someone encoded `data` in little-endian order, so data[0] is the LSB of some integer. Then if the code above runs on a big-endian machine, `final` will not have the desired value, will it? – Oct 12 '13 at 16:58
If `data[0]` is the LSB of some integer the result is incorrect on both little and big endian machines. The only thing that matters is how the input has been encoded, the machine is irrelevant. – Joni Oct 12 '13 at 17:02
Yes, I think you are right; that is what I meant. If data[0] is the LSB of some integer, the code above actually won't work, because it treats data[0] as the MSB, right? OK, I think we are done with it – Oct 12 '13 at 17:06

`data[0]` has to hold the most significant byte of the value when `data` elements are stored. Also the signedness of `char` is implementation defined, so the sign extension will occur only if `char` is signed. – ouah Oct 12 '13 at 17:06

ouah · Answer 3 · 2013-10-12T17:25:49.217

As written, this is wrong on both little-endian and big-endian systems.

If data elements are of type char, you need to cast each element to unsigned char before the bitwise left shift; otherwise you may get sign extension for elements with negative values. The signedness of char is implementation-defined, and char can be a signed type.

Also, data[0] << 24 (or even (unsigned char) data[0] << 24) invokes undefined behavior if data[0] is negative, because the resulting value is then not representable in an int, so you need an extra cast to unsigned int.

The best approach is to declare data as an unsigned char array and then cast each element to unsigned int before the left shift.
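A sketch of that recommendation for the case where the buffer arrives as plain `char` (as in the question), assuming `data[0]` holds the most significant byte. Each element goes through `unsigned char` to drop any sign, then to `uint32_t` so the shift by 24 happens on an unsigned type and cannot overflow `int`. The name `chars_to_u32` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper. Assumes data[0] is the most significant byte.
   A cast binds tighter than <<, so each byte is widened before it is
   shifted. */
static uint32_t chars_to_u32(const char data[4])
{
    uint32_t final = 0;
    final |= (uint32_t)(unsigned char)data[0] << 24;
    final |= (uint32_t)(unsigned char)data[1] << 16;
    final |= (uint32_t)(unsigned char)data[2] <<  8;
    final |= (uint32_t)(unsigned char)data[3];
    return final;
}
```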

Now, assuming you cast correctly, this works only if data[0] holds the most significant byte of your value.

score 0 · Answer 4 · answered Oct 12 '13 at 17:13

Besides the byte-ordering question (which other answers have addressed), you should be careful about promotion of data types.

I'm assuming that data is an array of unsigned char. In that case, the expression

data[0] << 24

is not actually zero: the integer promotions convert data[0] to int before the shift, so the result is an int, not an unsigned char. It is still a trap, though: if data[0] is 0x80 or larger, the shifted value does not fit in a 32-bit int, which is undefined behavior. At best, it leaves too much to interpretation. A safer, more explicit way to do this is to bit-wise OR first, then shift:

unsigned int final = 0;
final |= data[0]; final <<= 8;
final |= data[1]; final <<= 8;
final |= data[2]; final <<= 8;
final |= data[3]; /* no shift after the last byte */

or you could promote explicitly and then shift:

final |= ((unsigned int)data[0]) << 24;
final |= ((unsigned int)data[1]) << 16;
final |= ((unsigned int)data[2]) << 8;
final |= ((unsigned int)data[3]);

Of course, this doesn't deal with the endianness problem at all. But that may or may not be a problem, depending on where data came from.
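Since the right decoding depends on where `data` came from, one option is to make the byte order an explicit parameter. This sketch (hypothetical names `enum byte_order` and `decode_u32`) uses the shift-then-OR loop form, which shifts before each OR and so needs no trailing shift; it assumes `n` is at most 4.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: how the bytes were encoded by whoever produced them. */
enum byte_order { MSB_FIRST, LSB_FIRST };

/* Combine n (<= 4) bytes into a uint32_t according to the order they
   were encoded in. The host's own endianness plays no part. */
static uint32_t decode_u32(const unsigned char *data, size_t n,
                           enum byte_order order)
{
    uint32_t value = 0;
    for (size_t i = 0; i < n; i++) {
        /* Feed the most significant byte first: walk forwards for
           MSB-first input, backwards for LSB-first input. */
        size_t idx = (order == MSB_FIRST) ? i : n - 1 - i;
        value = (value << 8) | data[idx];
    }
    return value;
}
```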
