Interpret int8 as two int4

Question

I have a pointer to (signed) int8_t:

int8_t *data;

It comes from a netCDF file, in which the data is encoded as an int8 array. To limit the file size and use the cheapest possible representation, it is in reality a succession of signed integers of different sizes (4, 8, 16 and 32 bits), whose organization is specified elsewhere and is not relevant to this question.

If I want to interpret the data as int16 or int32 I can just do (updated after StoryTeller comment on undefined behaviour of the use of a reinterpret_cast in this case):

int16_t data_16;
memcpy(&data_16, &data[index], sizeof(int16_t));
int data_16_32 = data_16;

However, if the data is to be interpreted as int4 (two int4 in a single int8 memory space), how can I retrieve the int4 values? The int4 type does not exist in C.

My question in short: how do I interpret an int8 variable as two int4 values?

I guess this topic could be useful, but I do not really understand it: https://codereview.stackexchange.com/questions/30593/split-a-long-integer-into-eight-4-bit-values

4 bits is less than a byte on most machines, so you have to do bit shift yourself. — YiFei, Jul 03 '17 at 12:59
*"If I want to interpret the data as int16, I can just do:"* - Actually you can't just do that. This sort of type punning is undefined behavior. — StoryTeller - Unslander Monica, Jul 03 '17 at 12:59
You must first get the single *byte*, then use bitwise masking and possible shifting to get the nibble. — Some programmer dude, Jul 03 '17 at 12:59
@tilz0R - You can use `int8_t*` (assuming it's an alias for a char type) to pun something that is **originally** `int16_t`. If you *start* with `int8_t[2]`, the converse is not defined. — StoryTeller - Unslander Monica, Jul 03 '17 at 13:03
@tilz0R I'll try to find a standard reference for that in a moment. — HolyBlackCat, Jul 03 '17 at 13:06
@tilz0R Uh. The standard says in some places that if you do such and such, the behaviour is undefined. — HolyBlackCat, Jul 03 '17 at 13:07
@tilz0R Think about it: this is UB even if only because of endianness... — Boiethios, Jul 03 '17 at 13:10
@tilz0R The C and C++ standards both explicitly say that the behavior for the code above is undefined. Look up the strict aliasing rule. — interjay, Jul 03 '17 at 13:14
@tilz0R Here is your reference: https://pastebin.com/81TArZAh — HolyBlackCat, Jul 03 '17 at 13:15
@tilz0R Also worth reading: https://stackoverflow.com/a/7005988/2752075 — HolyBlackCat, Jul 03 '17 at 13:17
If the casting is not safe, what can I do to go from int8 to an int16? I was thinking of retrieving the 2 int8 values, converting them to binary, concatenating the binaries, then converting back to signed int16. Would this even work? It is a bit heavy. — seb007, Jul 03 '17 at 13:24
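
To make the two options from this comment thread concrete, here is a minimal sketch of reading a 16-bit value out of the int8 buffer, once with memcpy (as in the updated question) and once by combining two bytes with shifts. The function names are hypothetical, and the little-endian byte order in the second function is an assumption: the thread does not say how the file stores multi-byte values.

```c
#include <stdint.h>
#include <string.h>

/* Copy two bytes into an int16_t. The result depends on the host byte order. */
static int16_t read_i16_native(const int8_t *data, size_t index) {
    int16_t value;
    memcpy(&value, &data[index], sizeof value);
    return value;
}

/* Combine two bytes by hand. ASSUMPTION: the least significant byte is stored first. */
static int16_t read_i16_le(const int8_t *data, size_t index) {
    uint16_t lo = (uint8_t)data[index];       /* go through uint8_t so no sign bits leak in */
    uint16_t hi = (uint8_t)data[index + 1];
    uint16_t bits = (uint16_t)(lo | (hi << 8));
    int16_t value;
    memcpy(&value, &bits, sizeof value);      /* reinterpret the 16 bits as signed */
    return value;
}
```

The shift version does not depend on the byte order of the machine running the code, only on the byte order assumed for the data.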

Graeme · Accepted Answer · 2017-07-04T10:12:14.903

3

Mask off the two 4-bit integers. If they are signed you need to be careful; if they are unsigned, not so much.

For unsigned integers you can simply do:

uint8_t original = 0xab;  // Source 8 bit data
uint8_t low = original & 0x0f; // Mask off the high bits and leave just the low bits
uint8_t high = original >>4; // Move the high bits to the low bits

If you have 4-bit signed integers, you will need to shift the low one all the way left BEFORE moving it right again, so that the sign is maintained:

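int8_t original = (int8_t)0xef; // Source 8 bit data: -2 (high) and -1 (low)
int8_t low = (int8_t)((uint8_t)original << 4);  // Move left 4 bits first; going through uint8_t avoids left-shifting a negative value
low = low >> 4;  // Move bits right but maintain sign (arithmetic shift on typical compilers, implementation-defined in C)
int8_t high = original >> 4; // Move right 4 bits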
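
As a complement to the shift-based version, here is a minimal sketch that sign-extends each nibble with plain arithmetic, so it does not rely on how `>>` treats negative values. The names `int4_pair`, `sign_extend4` and `split_int4` are hypothetical, and the sketch assumes the int4 values are stored in two's complement.

```c
#include <stdint.h>

/* The two signed 4-bit values packed in one byte. */
struct int4_pair {
    int8_t high;
    int8_t low;
};

/* Sign-extend a 4-bit two's complement value held in the low bits of `nibble`.
   Flipping bit 3 and subtracting 8 maps 0..7 to 0..7 and 8..15 to -8..-1. */
static int8_t sign_extend4(uint8_t nibble) {
    return (int8_t)(((nibble & 0x0F) ^ 0x08) - 0x08);
}

/* Split one byte into its high and low signed nibbles. */
static struct int4_pair split_int4(int8_t byte) {
    uint8_t bits = (uint8_t)byte;  /* work on the raw bit pattern */
    struct int4_pair pair;
    pair.high = sign_extend4((uint8_t)(bits >> 4));
    pair.low = sign_extend4((uint8_t)(bits & 0x0F));
    return pair;
}
```

With the byte 0xef from the example above, `split_int4` yields -2 for the high nibble and -1 for the low nibble.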

Hopefully this helps

edited Jul 04 '17 at 10:12

answered Jul 03 '17 at 13:12

Graeme

Nice answer. I'd suggest two changes: use `int8_t` for signed and use `uint8_t` for unsigned. `int8 high = original >>4;` would fail for unsigned if `original` is signed. – Christoph Diegelmann Jul 03 '17 at 13:13
Approved your edit @Christoph thanks. I guess the use of uint8_t and int8_t would affect >> being logical or arithmetic too (I had read it was usually arithmetic but implementation-dependent). – Graeme Jul 03 '17 at 14:13
When I try your code (unsigned version) and print the results: `int8_t original = 0xef; int8_t low = (original << 4) >> 4; int8_t high = original >> 4; printf("original %" PRIi8 " low %" PRIi8 " high %" PRIi8 "\n", original, low, high);`, I get wrong results for the high integer: `original -17 low -17 high -2` – seb007 Jul 04 '17 at 08:02
It looks like a compiler optimization or something (Intel compiler, windows). If I split the command in two lines it works: `int8_t original = 0xef; int8_t low = original << 4; low = low>> 4; int8_t high = original >> 4; printf("original %" PRIi8 " low %" PRIi8 " high %" PRIi8 "\n", original, low, high); exit(-1);` As a result I get: `original -17 low -1 high -2`. – seb007 Jul 04 '17 at 08:12
Thanks @seb007 I have amended my answer to take that into account. – Graeme Jul 04 '17 at 10:12
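
The difference seb007 reports between the one-line and the two-line version is what C's integer promotions would produce, so it is probably not a compiler optimization: in a single expression the operand is widened to int before the first shift, and nothing cuts the value back to 8 bits before the second shift. The sketch below shows the same effect with unsigned values, where every step is well defined. The function names are hypothetical.

```c
#include <stdint.h>

/* Both shifts in one expression: x is promoted to int, so the bits pushed
   past bit 7 survive and come straight back with the right shift. */
static unsigned shift_in_one_expression(uint8_t x) {
    return (unsigned)((x << 4) >> 4);
}

/* Storing the intermediate result in an 8-bit variable discards the bits
   above bit 7, which is what isolates the low nibble. */
static unsigned shift_with_truncation(uint8_t x) {
    uint8_t shifted = (uint8_t)(x << 4);
    return (unsigned)(shifted >> 4);
}
```

For the input 0xef the first function returns 0xef unchanged, while the second returns 0x0f.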

Jesper Juhl · Answer 2 · 2017-07-03T13:16:40.597

2

One way to do it would be to use shifts and masks, like so:

unsigned high4bits = int8 >> 4; // shift high 4 bits into lower 4
unsigned low4bits = int8 & 0b00001111; // keep only low 4 bits

assuming int8 holds the initial two 4 bit ints and is of type uint8_t and you want 2 unsigned 4bit ints.
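
Since the question was later updated to ask for signed results, here is a minimal sketch that applies the same shift-and-mask idea to a whole buffer and then sign-extends each nibble. The names `sign_extend4` and `unpack_int4` are hypothetical. Two assumptions are not confirmed by the thread: the values are two's complement, and the high nibble of each byte comes first in the sequence.

```c
#include <stddef.h>
#include <stdint.h>

/* Sign-extend a 4-bit two's complement value held in the low bits of `nibble`. */
static int8_t sign_extend4(uint8_t nibble) {
    return (int8_t)(((nibble & 0x0F) ^ 0x08) - 0x08);
}

/* Unpack `count` packed bytes into 2 * count signed values.
   ASSUMPTION: within each byte, the high nibble is the earlier value.
   `out` must have room for 2 * count elements. */
static void unpack_int4(const int8_t *packed, size_t count, int8_t *out) {
    for (size_t i = 0; i < count; ++i) {
        uint8_t bits = (uint8_t)packed[i];              /* unsigned, so >> is a plain logical shift */
        out[2 * i] = sign_extend4((uint8_t)(bits >> 4));
        out[2 * i + 1] = sign_extend4((uint8_t)(bits & 0x0F));
    }
}
```

If the file stores the low nibble first, swapping the two assignments inside the loop is enough.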

edited Jul 03 '17 at 13:16

answered Jul 03 '17 at 13:02

Jesper Juhl

this lacks sign extending for low4bits. – Christoph Diegelmann Jul 03 '17 at 13:03
@tilz0R int4_t would be signed, so no: `((uint8_t)int8) >> 4` would drop exactly the sign extension that is needed. What we need is sign extension of `int8 & 0b00001111`. – Christoph Diegelmann Jul 03 '17 at 13:07
@Christoph sign extending is not guaranteed, it's implementation defined. And we don't know if the result needs to be signed. – interjay Jul 03 '17 at 13:11
I updated the question: results need to be signed indeed. – seb007 Jul 03 '17 at 13:13
You should make it clear that `int8` must be unsigned or cast to unsigned before the shift ! – Christoph Diegelmann Jul 03 '17 at 13:19
@Christoph Isn't that sufficiently clear by "assuming int8 holds the initial two 4 bit ints and is of type uint8_t" ? – Jesper Juhl Jul 03 '17 at 14:40
