I know C++11 has a type char32_t that is 4 bytes, and I'm wondering if it's possible to implement something similar in C. The program I'm writing needs to have all char arrays be a multiple of 4 bytes.
-
longs are 32 bits in C. Can't this be used? – h7r Feb 13 '15 at 19:18
-
@h7r No. In C, there are very few guarantees with respect to bit width besides `char` being 8-bit, and the fixed-width types in `stdint.h` being what they are advertised as. `long` is at least 32-bit, but can be more. Also, depending on the system, things like `long` and `long long` might not fit in a single register, so certain types of operations involving 32- or 64-bit variables might have to be carried out in software, greatly slowing down the program. Good question though. http://en.wikipedia.org/wiki/C_data_types – Cloud Feb 13 '15 at 19:22
-
It's C. You can push bytes around any way you like. The issue is libraries: things like printf() and strcpy() won't know what to do with 32-bit characters, so you'll have to rewrite all that yourself. – Lee Daniel Crocker Feb 13 '15 at 19:24
-
@Dogbert point taken. Performance issues aside, as you say, a `long` is "at least" 32 bits, so wouldn't my idea fit? – h7r Feb 13 '15 at 19:25
-
Why not use `int_least32_t`? – phuclv Feb 13 '15 at 19:45
-
You can just use an int type; they're all numbers. If the ambiguity of the standards worries you, use a system-dependent type like UINT32 on Windows, Uint32 with SDL, etc. – rsethc Feb 13 '15 at 20:08
-
@Dogbert: `char` is _at least_ 8 bits, but can be more. – ninjalj Feb 21 '15 at 20:21
-
@ninjalj For C99 and onward, a `char` is exactly 8 bits. On some TI DSP chips, it uses 32-bit `char`, but it's therefore non-C99 compliant. http://stackoverflow.com/questions/2215445/are-there-machines-where-sizeofchar-1 – Cloud Feb 21 '15 at 21:58
-
@Dogbert: For C89, C99 and C11, a `char` is a byte, which has at least 8 bits, but may have more. So, on a DSP, a char/byte may perfectly be 32 bits, and the compiler can still be perfectly standard compliant. OTOH, POSIX requires a byte to be an octet, so 8 bits for POSIX. – ninjalj Feb 22 '15 at 00:38
-
@ninjalj As per the references in the link I provided, for C99 onward, a `char` is exactly equal to 8 bits, no more. – Cloud Feb 22 '15 at 01:51
-
@Dogbert: did you actually _read_ that? E.g: http://stackoverflow.com/a/2215596/371250 – ninjalj Feb 22 '15 at 01:54
-
@ninjalj I should ask the same. Look at the top answer: `sizeof(char) ==1` on C99 compliant systems. http://stackoverflow.com/a/2215454/1022889 – Cloud Feb 22 '15 at 21:07
-
@Dogbert: from that same answer: _It is permitted (if wasteful) for an implementation to use 32 bits to represent type char. Regardless of the implementation, the value of sizeof(char) is always 1._ 1 char = 1 byte ≠ 1 octet – ninjalj Feb 23 '15 at 00:58
-
@ninjalj After further reviewing the above and the following post, you are correct. Cheers! http://stackoverflow.com/questions/437470/type-to-use-to-represent-a-byte-in-ansi-c89-90-c – Cloud Feb 23 '15 at 14:38
2 Answers
How do you plan on working with this data? Will you only be using a single byte within the 32-bit variable, or will you be storing actual 32-bit data within it?
One simple solution would be to create your own abstract data type so you can change it later:
#include <stdint.h>
typedef int32_t mChar;
mChar myChar32Array[100]; // Allocates 100x32-bit values
There is a major pitfall in tinkering with `char`-related data types, though: a lot of libraries and code snippets assume that a `char` is a `char` when working with text. If you plan on using string manipulation functions and expect them to work across multiple systems, always declare strings as arrays of `char`, and never as `signed char` or `unsigned char`. The only time you should be using `unsigned char` is if you are working with 8-bit binary data directly and don't want to deal with unexpected oddities like sign extension, which will give you funky values if you aren't careful.
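Because functions such as `strcpy()` and `printf()` only understand `char`, text held in 32-bit elements has to be converted by hand. The following is a minimal sketch of that, assuming `stdint.h` is available and reusing the `mChar` typedef from above. The helper names `mchar_widen` and `mchar_narrow` are hypothetical and not part of any library.

```c
#include <stddef.h>
#include <stdint.h>

typedef int32_t mChar; /* same typedef as in the answer above */

/* Hypothetical helper: copy a NUL-terminated char string into an mChar
 * buffer holding `cap` elements, one character per 32-bit element.
 * Always NUL-terminates when cap > 0. Returns the number of characters
 * copied, not counting the terminator. */
size_t mchar_widen(mChar *dst, size_t cap, const char *src)
{
    size_t i = 0;
    if (cap == 0)
        return 0;
    while (i + 1 < cap && src[i] != '\0') {
        /* Go through unsigned char so bytes above 0x7F are not sign-extended. */
        dst[i] = (mChar)(unsigned char)src[i];
        i++;
    }
    dst[i] = 0;
    return i;
}

/* Hypothetical helper: copy back into a plain char buffer so standard
 * functions such as printf("%s", ...) can be used. Assumes every element
 * holds a value that fits in a single byte. */
size_t mchar_narrow(char *dst, size_t cap, const mChar *src)
{
    size_t i = 0;
    if (cap == 0)
        return 0;
    while (i + 1 < cap && src[i] != 0) {
        dst[i] = (char)(unsigned char)src[i];
        i++;
    }
    dst[i] = '\0';
    return i;
}
```

With these, `mchar_widen(wide, 8, "hi")` fills `wide[0]` and `wide[1]` with `'h'` and `'i'` followed by a zero terminator, and `mchar_narrow` reverses the step before the text is handed to a standard library function.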

-
I had planned on using a single byte within the 32-bit variable. So using your `mChar` I would do something like: `mchar chr = 'i';` The system I'm working with has a constraint where all strings need to be a multiple of 4 bytes, I figured declaring a type would be the best way to go about it, rather than padding all of the char arrays. – Steve Bates Feb 13 '15 at 19:36
-
"`char32_t` which is an unsigned integer type used for 32-bit characters and is the same type as `uint_least32_t`" C11dr §7.28 2. Recommend `uint_least32_t` instead of `int32_t` to closely mimic OP's `char32_t`, or at least use the unsigned `uint32_t`. – chux - Reinstate Monica Feb 13 '15 at 19:49
The C11 standard does support `char32_t`, with strings encoded in UTF-32, for example:
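#include <uchar.h>
int main(void) {
    /* String literals should not be modified, so point to const. */
    const char32_t *str = U"Hello world";
    (void)str; /* silence the unused-variable warning */
    return 0;
}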
The program compiles cleanly with, say, `gcc -std=c11`.
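The standard library has no `strlen()` equivalent for `char32_t`, so length and storage size have to be computed by hand. A minimal sketch, assuming a C11 toolchain that ships `uchar.h`; the helper names `u32_len` and `u32_size_bytes` are made up for illustration.

```c
#include <stddef.h>
#include <uchar.h>

/* Hypothetical helper: number of elements in a zero-terminated
 * char32_t string, not counting the terminator. */
size_t u32_len(const char32_t *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* Hypothetical helper: storage used by the string in bytes, including
 * the terminator. This is always a multiple of sizeof(char32_t). */
size_t u32_size_bytes(const char32_t *s)
{
    return (u32_len(s) + 1) * sizeof(char32_t);
}
```

On a platform where `sizeof(char32_t)` is 4, every such string therefore occupies a multiple of 4 bytes, which is the constraint described in the question.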

-
I'm using an earlier C compiler from 2009, so this won't work for me. Thank you for your input though. – Steve Bates Feb 13 '15 at 20:22
-
It's a proprietary one I'm using at work...a lot of the standard libraries aren't allowed (stdint.h and uchar.h aren't available). – Steve Bates Feb 13 '15 at 20:34
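When neither `stdint.h` nor `uchar.h` is available, a 32-bit character type can still be built from the basic integer types. The sketch below assumes that `limits.h` is usable on the compiler in question, which the thread does not confirm; the names `my_char32`, `my_char32_is_wide_enough` and `my_char32_is_4_bytes` are hypothetical.

```c
#include <limits.h>

/* Prefer unsigned int when it already has 32 bits; otherwise fall back
 * to unsigned long, which the C standard guarantees to be at least
 * 32 bits wide. */
#if UINT_MAX >= 0xFFFFFFFFUL
typedef unsigned int my_char32;
#else
typedef unsigned long my_char32;
#endif

/* C89-style compile-time checks: the array size becomes -1, which is a
 * compile error, if the condition is false. */
typedef char my_char32_is_wide_enough[
    (sizeof(my_char32) * CHAR_BIT >= 32) ? 1 : -1];

/* Optional stricter check for the "multiple of 4 bytes" requirement.
 * Remove it on platforms where the chosen type is legitimately wider. */
typedef char my_char32_is_4_bytes[(sizeof(my_char32) == 4) ? 1 : -1];
```

A declaration such as `my_char32 chr = 'i';` then stores the character in a single element, and any array of `my_char32` has a size that is a multiple of 4 bytes on platforms that pass the second check.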