shifting an unsigned char by more than 8 bits

Question

I'm a bit troubled by this code:

typedef struct _slink{
    struct _slink* next;
    char type;
    void* data;
}

assuming what this describes is a link in a file, where data is 4bytes long representing either an address or an integer(depending on the type of the link)

Now I'm looking at reformatting numbers in the file from little-endian to big-endian, and so what I wanna do is change the order of the bytes before writing back to the file, i.e. for 0x01020304, I wanna convert it to 0x04030201 so when I write it back, its little endian representation is gonna look like the big endian representation of 0x01020304, I do that by multiplying the i'th byte by 2^8*(3-i), where i is between 0 and 3. Now this is one way it was implemented, and what troubles me here is that this is shifting bytes by more than 8 bits.. (L is of type _slink*)

int data = ((unsigned char*)&L->data)[0]<<24) + ((unsigned char*)&L->data)[1]<<16) + 
                    ((unsigned char*)&L->data)[2]<<8) + ((unsigned char*)&L->data)[3]<<0)

Can anyone please explain why this actually works? without having explicitly cast these bytes to integers to begin with(since they're only 1 bytes but are shifted by up to 24 bits) Thanks in advance.

Actually, the code has a bug. `(unsigned char*)&L->data)[0]` is an `unsigned char`. With an 8-bit `unsigned char` and a 32-bit `int`, it will be automatically promoted to `int`. Then `<<24` shifts the value 24 bits. If the high bit (7) of the `unsigned char` is set, it is shifted into the high bit (31) of the `int`. This overflows the signed `int`, and then the behavior is not defined by the C standard. To fix this, `(unsigned char*)&L->data)[0]` should be cast to `unsigned int` before `<<24`. — Eric Postpischil, Jun 30 '20 at 21:41
Could you just use the functions from the `htonl` family? https://linux.die.net/man/3/htonl — bruceg, Jun 30 '20 at 21:44
Does this answer your question? [Does bit shift automatically promote chars to int?](https://stackoverflow.com/questions/18370197/does-bit-shift-automatically-promote-chars-to-int) — phuclv, Jul 01 '20 at 02:41

score 4 · Accepted Answer · answered Jun 30 '20 at 21:22

4

Any integer type smaller than int is promoted to type int when used in an expression.

So the shift is actually applied to an expression of type int instead of type char.

answered Jun 30 '20 at 21:22

dbush

205,898
23
218
273

chux - Reinstate Monica · Answer 2 · 2020-07-01T02:21:46.560

Can anyone please explain why this actually works?

The shift does not occur as an unsigned char but as a type promoted to an int¹. @dbush.

Reasons why code still has issues.

32-bit int

Shifting a int 1 into the the sign's place is undefined behavior UB. See also @Eric Postpischil.

((unsigned char*)&L->data)[0]<<24)  // UB

16-bit int

Shifting by the bit width or more is insufficient precision even if the type was unsigned. As int it is UB like above. Perhaps then OP would have only wanted a 2-byte endian swap?

Alternative

const uint8_t *p = &L->data;
uint32_t data = (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 | //
    (uint32_t)p[2] << 8 | (uint32_t)p[3] << 0;

For the pedantic

Had int used non-2's complement, the addition of a negative value from ((unsigned char*)&L->data)[0]<<24) would have messed up the data pattern. Endian manipulations are best done using unsigned types.

from little-endian to big-endian

This code does not swap between those 2 endians. It is a big endian to native endian swap. When this code is run on a 32-bit unsigned little endian machine, it is effectively a big/little swap. On a 32-bit unsigned big endian machine, it could have been a no-op.

¹ ... or posibly an unsigned on select platforms where UCHAR_MAX > INT_MAX.

shifting an unsigned char by more than 8 bits

2 Answers2

Linked