Bitmask for exactly one byte in C

Question

My goal is to save a long in four bytes like this:

unsigned char bytes[4];
unsigned long n = 123;

bytes[0] = (n >> 24) & 0xFF;
bytes[1] = (n >> 16) & 0xFF;
bytes[2] = (n >> 8) & 0xFF;
bytes[3] = n & 0xFF;

But I want the code to be portable, so I use CHAR_BIT from <limits.h>:

unsigned char bytes[4];
unsigned long n = 123;

bytes[0] = (n >> (CHAR_BIT * 3)) & 0xFF;
bytes[1] = (n >> (CHAR_BIT * 2)) & 0xFF;
bytes[2] = (n >> CHAR_BIT) & 0xFF;
bytes[3] = n & 0xFF;

The problem is that the bitmask 0xFF only covers eight bits, which is not necessarily one byte. Is there a way to make the code above completely portable across all platforms?

I think I'm missing something, but why is the first not portable? — efox29, Apr 29 '21 at 19:40
@efox29 A [`char`](https://en.wikipedia.org/wiki/C_data_types#Basic_types) is defined to have `CHAR_BIT` bits in C, which is not always `8`. — Andy Sukowski-Bang, Apr 29 '21 at 19:42
@efox29 there are some architectures where byte is larger than 8 bits. — tstanisl, Apr 29 '21 at 19:42
My first thought is an extra step to create a "0xFF bit mask" that's `CHAR_BIT` bits wide: `unsigned long bitmask = 0;`, then loop `CHAR_BIT` times, ORing in a 1 at each successive position, then use `.. & bitmask;` instead of `.. & 0xFF;` — yano, Apr 29 '21 at 19:44
moreover, there is no guarantee that `long` consists of 4 bytes. So the task you try to solve is not portable :) — tstanisl, Apr 29 '21 at 19:45
@tstanisl I know, but 4 bytes are guaranteed to be able to hold the minimum size of a long. — Andy Sukowski-Bang, Apr 29 '21 at 19:47
interesting...but does it even matter? You are storing your values in an unsigned char (which may or may not be 8 bits). Why not use stdint.h to get uint8_t which specifies size ? — efox29, Apr 29 '21 at 19:48
@AndySukowski-Bang, so please explain why my compiler says that `sizeof(long)` is 8? — tstanisl, Apr 29 '21 at 19:50
@tstanisl I said that they are guaranteed to be able to hold the _minimum size_ of a long. But you are right, it's not completely portable. But the numbers stored are not actually that big. — Andy Sukowski-Bang, Apr 29 '21 at 19:52
So maybe the real problem is how to portably encode a number from the range 0..2^32-1 as "bytes"? — tstanisl, Apr 29 '21 at 19:57
The best solution to the problem is `_Static_assert(CHAR_BIT==8, "Don't import C libraries into your horrible exotic DSP project. And consider getting a better CPU.");` — Lundin, Apr 30 '21 at 08:54
@Lundin, unfortunately, it will work only on a C11 compiler, which is not likely to be available for a "horrible exotic DSP" machine. Anyway, I support the idea — tstanisl, Apr 30 '21 at 19:45
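
One way to read the reframing suggested in the comments (encode a value from 0..2^32-1 rather than "a long") is to define the stored format in 8-bit groups and put one group in each `unsigned char`, whatever `CHAR_BIT` happens to be. The following is a minimal sketch under that assumption; the helper names `put_u32_octets` and `get_u32_octets` are hypothetical and do not come from the thread. With this layout the `0xFF` mask is correct by construction, because the format is defined in octets, not in chars.

```c
/* Hypothetical helpers: names and layout are illustrative assumptions.
   Each array element carries exactly 8 bits of the value, most
   significant group first, regardless of CHAR_BIT. */
static void put_u32_octets(unsigned char out[4], unsigned long value)
{
    /* unsigned long has at least 32 value bits, so these shifts are defined. */
    out[0] = (unsigned char)((value >> 24) & 0xFF);
    out[1] = (unsigned char)((value >> 16) & 0xFF);
    out[2] = (unsigned char)((value >> 8) & 0xFF);
    out[3] = (unsigned char)(value & 0xFF);
}

static unsigned long get_u32_octets(const unsigned char in[4])
{
    /* Mask on the way in as well, in case an element holds more than 8 bits. */
    return ((unsigned long)(in[0] & 0xFF) << 24)
         | ((unsigned long)(in[1] & 0xFF) << 16)
         | ((unsigned long)(in[2] & 0xFF) << 8)
         |  (unsigned long)(in[3] & 0xFF);
}
```

If `unsigned long` is wider than 32 bits, anything above bit 31 is silently dropped by this sketch, so the caller has to keep values within 0..2^32-1.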

Zoso · Accepted Answer · 2021-04-29T20:28:19.373

How about something like:

unsigned long mask = 1;
mask<<=CHAR_BIT;
mask-=1;

and then using this as the mask instead of 0xFF?

Test program:

#include <stdio.h>

int main() {
    #define MY_CHAR_BIT_8 8
    #define MY_CHAR_BIT_9 9
    #define MY_CHAR_BIT_10 10
    #define MY_CHAR_BIT_11 11
    #define MY_CHAR_BIT_12 12
    {
        unsigned long mask = 1;
        mask<<=MY_CHAR_BIT_8;
        mask-= 1;
        printf("%lx\n", mask);
    }
    {
        unsigned long mask = 1;
        mask<<=MY_CHAR_BIT_9;
        mask-= 1;
        printf("%lx\n", mask);
    }
    {
        unsigned long mask = 1;
        mask<<=MY_CHAR_BIT_10;
        mask-= 1;
        printf("%lx\n", mask);
    }
    {
        unsigned long mask = 1;
        mask<<=MY_CHAR_BIT_11;
        mask-= 1;
        printf("%lx\n", mask);
    }
    {
        unsigned long mask = 1;
        mask<<=MY_CHAR_BIT_12;
        mask-= 1;
        printf("%lx\n", mask);
    }
}

Output:

ff
1ff
3ff
7ff
fff

I suggest using `unsigned long`, not `uint32_t` as the type of mask should be same as the serialized type. — tstanisl, Apr 29 '21 at 20:14
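
The mask from this answer could be plugged into the split roughly as follows. This is only a sketch: the helper name `ulong_to_chars` is hypothetical, and it writes `sizeof(unsigned long)` chars instead of a fixed four so that it does not have to assume how many chars a long occupies. It also assumes that `unsigned long` is wider than one char, since otherwise the shift by `CHAR_BIT` would be undefined; the `#error` guard makes that assumption explicit.

```c
#include <limits.h>
#include <stddef.h>

/* Assumption: unsigned long is wider than one char, so shifting it by
   CHAR_BIT is well defined. Refuse to compile otherwise. */
#if ULONG_MAX == UCHAR_MAX
#error "unsigned long is one char wide; shifting by CHAR_BIT would be undefined"
#endif

/* Hypothetical helper: fills bytes[0 .. sizeof(unsigned long) - 1],
   most significant char first. */
static void ulong_to_chars(unsigned char *bytes, unsigned long n)
{
    /* Mask with the low CHAR_BIT bits set, built as in the answer. */
    unsigned long mask = 1;
    mask <<= CHAR_BIT;
    mask -= 1;

    size_t i = sizeof n;
    while (i-- > 0) {
        bytes[i] = (unsigned char)(n & mask); /* lowest remaining char */
        n >>= CHAR_BIT;                       /* move on to the next one */
    }
}
```

The caller is expected to provide a buffer of at least `sizeof(unsigned long)` elements.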

score 1 · Answer 2 · answered Apr 30 '21 at 09:25

I work almost exclusively with embedded systems, where I rather often have to provide code that is portable between all manner of more or less exotic systems, for example code that works both on some tiny 8-bit MCU and on an x86_64.

But even for me, bothering with portability to exotic obsolete DSP systems and the like is a huge waste of time. These systems barely exist in the real world - why exactly do you need portability to them? Is there any other reason than "showing off" mostly useless language lawyer knowledge of C? In my experience, 99% of all such useless portability concerns boil down to programmers "showing off", rather than an actual requirement specification.

And even if you for some strange reason do need such portability, this task doesn't make any sense to begin with since neither char nor long are portable! If char is not 8 bits then what makes you think long is 4 bytes? It could be 2 bytes, it could be 8 bytes, or it could be something else.

If portability is an actual concern, then you must use stdint.h. Then, if you truly must support exotic systems, you have to decide which ones. The only real-world computers I know of that actually use different byte sizes are various obsolete exotic TI DSPs from the 1990s, which use 16-bit bytes/char. Let's assume this is your intended target, which you have decided is important to support.

Let's also assume that a standard C compiler (ISO 9899) exists for that exotic target, which is highly unlikely. (More likely you'll get a poorly conforming, mostly broken legacy C90 thing... or even more likely those who use the target write everything in assembler.) A standard C compiler for such a target will not implement uint8_t, since the type is not mandatory if the target doesn't support it. Only uint_least8_t and uint_fast8_t are mandatory.

Then you'd go about it like this:

#include <stdint.h>
#include <limits.h>
#if CHAR_BIT == 8
static void uint32_to_uint8 (uint8_t dst[4], uint32_t u32)
{
  dst[0] = (u32 >> 24) & 0xFF;
  dst[1] = (u32 >> 16) & 0xFF;
  dst[2] = (u32 >>  8) & 0xFF;
  dst[3] = (u32 >>  0) & 0xFF;
}
#endif 

// whatever other conversion functions you need:
static void uint32_to_uint16 (uint16_t dst[2], uint32_t u32){ ... }
static void uint64_to_uint16 (uint16_t dst[4], uint64_t u64){ ... }

The exotic DSP will then use the uint32_to_uint16 function. You could use the same compiler #if CHAR_BIT checks to do #define byte_to_word uint32_to_uint16 etc.
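
The `#if CHAR_BIT` selection described above might look roughly like the sketch below. The names `wire_unit_t`, `UINT32_UNITS` and `uint32_to_units` are hypothetical, and the body of the 16-bit variant is an assumption about what such a function could do; the answer leaves it elided.

```c
#include <limits.h>
#include <stdint.h>

#if CHAR_BIT == 8
/* Mainstream targets: four 8-bit units, most significant first. */
typedef uint8_t wire_unit_t;              /* hypothetical name */
#define UINT32_UNITS 4
static void uint32_to_units(wire_unit_t dst[UINT32_UNITS], uint32_t u32)
{
    dst[0] = (u32 >> 24) & 0xFF;
    dst[1] = (u32 >> 16) & 0xFF;
    dst[2] = (u32 >>  8) & 0xFF;
    dst[3] = (u32 >>  0) & 0xFF;
}
#elif CHAR_BIT == 16
/* Assumed variant for a 16-bit-char target: two 16-bit units. */
typedef uint16_t wire_unit_t;             /* hypothetical name */
#define UINT32_UNITS 2
static void uint32_to_units(wire_unit_t dst[UINT32_UNITS], uint32_t u32)
{
    dst[0] = (u32 >> 16) & 0xFFFF;
    dst[1] = (u32 >>  0) & 0xFFFF;
}
#else
#error "Unsupported CHAR_BIT: decide explicitly which targets to support"
#endif
```

Calling code then only refers to `wire_unit_t`, `UINT32_UNITS` and `uint32_to_units`, and any target that was not explicitly considered fails at compile time instead of misbehaving at run time.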

You should then also immediately notice that endianness will be the next major portability concern. I have no idea which endianness obsolete DSPs typically use, but that's another question.

You are probably right. I should start focusing on more fun aspects of programming than bother with portability for each platform. Thanks for your answer! — Andy Sukowski-Bang, Apr 30 '21 at 10:00

score 0 · Answer 3 · answered Apr 29 '21 at 20:01

What about:

unsigned long mask = (unsigned char)-1;

This will work because the C standard says in 6.3.1.3p2

1 When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.

2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.

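Paragraph 2 makes `(unsigned char)-1` the maximum value of `unsigned char`, and since `unsigned long` can represent all values of `unsigned char`, paragraph 1 guarantees that the value is unchanged when it is stored in `mask`.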
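
A small sketch of how this mask could be used and checked. The function names `char_mask` and `low_char` are hypothetical. The value produced by the cast is the same as `UCHAR_MAX` from `<limits.h>`, which the standard defines as 2^CHAR_BIT - 1, so that macro could be used directly as well.

```c
#include <limits.h>

/* Hypothetical helper: mask covering exactly one char.
   The conversion is defined on values (C11 6.3.1.3p2):
   -1 + (UCHAR_MAX + 1) == UCHAR_MAX, whatever the signed representation. */
static unsigned long char_mask(void)
{
    return (unsigned char)-1;
}

/* Hypothetical helper: lowest char of n, extracted with the mask. */
static unsigned char low_char(unsigned long n)
{
    return (unsigned char)(n & char_mask());
}
```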

tstanisl

Not portable, as it assumes two's complement. He asked for a portable way – 0___________ Apr 29 '21 at 21:25
@0___________, isn't 2-complement used for encoding **signed** integers? – tstanisl Apr 29 '21 at 21:31
@0___________, a cast is *not* a reinterpretation; it will set `mask` to "one more than the maximum value", which is 2^CHAR_BIT, plus `-1`. It is actually the most portable way of setting all value bits in an unsigned integer. – tstanisl Apr 29 '21 at 21:38
on some system yes on another not. **Portable** means that it will work on **any** system. – 0___________ Apr 29 '21 at 21:38
`~0` is the most portable way – 0___________ Apr 29 '21 at 21:39
@0___________, no. `~0` will not work because 0 is `int`. It will fail on non-two-complement machines. On two-complement machines it will be `-1` and will set all bits to 1. – tstanisl Apr 29 '21 at 21:42
@0___________, please read https://stackoverflow.com/questions/809227/is-it-safe-to-use-1-to-set-all-bits-to-true – tstanisl Apr 29 '21 at 21:44
The signed/unsigned integer conversion rules are based on the decimal/mathematical representation of the value, not the underlying binary representation. `~0` will result in the binary representation of "all ones", no matter what decimal value that happens to correspond to. If the purpose is to create a bit mask then that's obviously the correct method. -1 on a one's complement system is 0xFF...FE and on a signed magnitude system it is 0x80...1. Neither of which can be used as an "all ones" bit mask. – Lundin Apr 30 '21 at 09:36
@Lundin, as I said casting to (unsigned char) is not reinterpretation of bits. Thus -1 will become 255 no matter how -1 was originally represented – tstanisl Apr 30 '21 at 10:04
@Lundin, it is actually the `~0` that is *not portable*. It will indeed consist of all `1` bits but after casting to unsigned type the resulting value may not consist of only `1` bits. Please follow the link in the other comment. – tstanisl Apr 30 '21 at 10:25
Yes, we shouldn't use signed integer constants when doing bitwise arithmetic. The correct code is `~0u`. – Lundin Apr 30 '21 at 10:27
@Lundin, `~0u` is good as long as it is cast to a type not wider than `unsigned`. A cast to `unsigned char` will be correct. I'm just saying that a cast from `-1` will work for all widths and is independent of the integer encoding. That is why it is the most *portable*, though a bit confusing at first look. – tstanisl Apr 30 '21 at 10:31
@Lundin Using negative numbers with `unsigned` is perfectly defined by standard, and there will be no UB due to overflows. – tstanisl Apr 30 '21 at 10:33
Anyway this whole discussion is nonsense... if portability to highly exotic systems is actually a requirement, then one needs to specify which ones and how they work. One's complement is extremely rare and I've never even heard of a signed magnitude computer. Also there's rumours about banning everything that isn't two's complement from C, which would clean up the language quite a bit. `stdint.h` already solves this since it requires two's complement format. – Lundin Apr 30 '21 at 10:43
@Lundin, I agree that this whole issue of portability of int encoding is a nonsense. All sane platforms use two-complement. But this answer still correctly addresses OP's question and it should not be downvoted. – tstanisl Apr 30 '21 at 10:46
You only answer the bit masking part though, not how to divide a long into bytes portably. – Lundin Apr 30 '21 at 11:00
@Lundin, the second part is impossible to answer because `long` is itself not portable. Other answers even assume that its size is 4. – tstanisl Apr 30 '21 at 11:12

0___________ · Answer 4 · 2021-04-29T21:37:52.767

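#include <limits.h>
#include <stdio.h>

#define CHARMASK ((1UL << CHAR_BIT) - 1)

int main(void)
{
    /* CHARMASK has type unsigned long, so it needs the %lx conversion. */
    printf("0x%lx\n", CHARMASK);
}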

The mask will always have the width of a char. It is computed at compile time, and no additional variables are needed.

Or

#define CHARMASK    ((unsigned char)(~0))

You can do it without the masks as well

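/* Assumes unsigned int is exactly 4 * CHAR_BIT bits wide: the left shift
   discards the higher chars, the right shift moves the wanted char down. */
void foo(unsigned int n, unsigned char *bytes)
{
    bytes[0] = ((n << (CHAR_BIT * 0)) >> (CHAR_BIT * 3));
    bytes[1] = ((n << (CHAR_BIT * 1)) >> (CHAR_BIT * 3));
    bytes[2] = ((n << (CHAR_BIT * 2)) >> (CHAR_BIT * 3));
    bytes[3] = ((n << (CHAR_BIT * 3)) >> (CHAR_BIT * 3));
}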


int main(void)
{
    unsigned int z = 0xaabbccdd;
    unsigned char bytes[4];
    foo(z, bytes);
    printf("0x%02x 0x%02x 0x%02x 0x%02x\n", bytes[0], bytes[1], bytes[2], bytes[3]);
}
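
A maskless variant that does not hard-code four chars could rely on the fact that converting to `unsigned char` already reduces the value modulo `UCHAR_MAX + 1`. The sketch below is an illustration only, not the code from this answer; the helper name `uint_to_chars` is hypothetical, and it writes `sizeof(unsigned int)` elements, most significant first.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: fills bytes[0 .. sizeof(unsigned int) - 1],
   most significant char first, without an explicit mask. */
static void uint_to_chars(unsigned char *bytes, unsigned int n)
{
    size_t i = sizeof n;
    while (i-- > 0) {
        /* Conversion to unsigned char keeps only the low CHAR_BIT bits. */
        bytes[i] = (unsigned char)n;
        /* Shift by CHAR_BIT in two steps so the operation stays defined
           even if unsigned int were only one char wide. */
        n = (n >> (CHAR_BIT - 1)) >> 1;
    }
}
```

The two-step shift is a defensive choice: a single shift by `CHAR_BIT` would be undefined on a platform where `unsigned int` has exactly `CHAR_BIT` bits.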

edited Apr 29 '21 at 21:37

answered Apr 29 '21 at 21:22

0___________

`((unsigned char)(~0))` is not portable. On machines with a "one's complement" representation of int, `~0` will be `-0`! Casting to an unsigned type will set the mask to 0, which is very wrong. Please read https://en.wikipedia.org/wiki/Ones%27_complement – tstanisl Apr 29 '21 at 22:20
More importantly, the assumption that long/int will be so kind to remain a 4x8 bit integer while char is something else isn't correct. – Lundin Apr 30 '21 at 09:30
