Any way to create a char of size 32 in ANSI C?

Question

I know C++11 has a type char32_t that is 4 bytes, and I'm wondering if it's possible to implement something similar in C. The program I'm writing needs to have all char arrays be a multiple of 4 bytes.

@h7r No. In C, there are very few guarantees with respect to bit width besides `char` being 8-bit, and the fixed-width types in `stdint.h` being what they are advertised as. `long` is at least 32-bit, but can be more. Also, depending on the system, things like `long` and `long long` might not fit in a single register, so certain types of operations involving 32- or 64-bit variables might have to be carried out in software, greatly slowing down the program. Good question though. http://en.wikipedia.org/wiki/C_data_types — Cloud, Feb 13 '15 at 19:22
It's C. You can push bytes around any way you like. The issue is libraries: things like printf() and strcpy() won't know what to do with 32-bit characters, so you'll have to rewrite all that yourself. — Lee Daniel Crocker, Feb 13 '15 at 19:24
@Dogbert point taken. Performance issues aside, as you say, a `long` is "at least" 32 bits, so wouldn't my idea fit? — h7r, Feb 13 '15 at 19:25
You can just use an int type; they're all numbers. If the ambiguity of the standards worries you, use a system-dependent type like UINT32 on Windows, Uint32 with SDL, etc. — rsethc, Feb 13 '15 at 20:08
@ninjalj For C99 and onward, a `char` is exactly 8 bits. On some TI DSP chips, it uses 32-bit `char`, but it's therefore non-C99 compliant. http://stackoverflow.com/questions/2215445/are-there-machines-where-sizeofchar-1 — Cloud, Feb 21 '15 at 21:58
@Dogbert: For C89, C99 and C11, a `char` is a byte, which has at least 8 bits, but may have more. So, on a DSP, a char/byte may perfectly be 32 bits, and the compiler can still be perfectly standard compliant. OTOH, POSIX requires a byte to be an octet, so 8 bits for POSIX. — ninjalj, Feb 22 '15 at 00:38
@ninjalj As per the references in the link I provided, for C99 onward, a `char` is exactly equal to 8 bits, no more. — Cloud, Feb 22 '15 at 01:51
@Dogbert: did you actually _read_ that? E.g: http://stackoverflow.com/a/2215596/371250 — ninjalj, Feb 22 '15 at 01:54
@ninjalj I should ask the same. Look at the top answer: `sizeof(char) ==1` on C99 compliant systems. http://stackoverflow.com/a/2215454/1022889 — Cloud, Feb 22 '15 at 21:07
@Dogbert: from that same answer: _It is permitted (if wasteful) for an implementation to use 32 bits to represent type char. Regardless of the implementation, the value of sizeof(char) is always 1._ 1 char = 1 byte ≠ 1 octet — ninjalj, Feb 23 '15 at 00:58
@ninjalj After further reviewing the above and the following post, you are correct. Cheers! http://stackoverflow.com/questions/437470/type-to-use-to-represent-a-byte-in-ansi-c89-90-c — Cloud, Feb 23 '15 at 14:38

score 1 · Answer 1 · answered Feb 13 '15 at 19:19

How do you plan on working with this data? Will you only be using a single byte within the 32-bit variable, or will you be storing actual 32-bit data within it?

One simple solution would be to create your own abstract data type so you can change it later:

#include <stdint.h>
typedef int32_t mChar;
mChar myChar32Array[100]; // Allocates 100x32-bit values

There is a major pitfall in tinkering with char-related data types, though: a lot of libraries and code snippets assume that a char is a char when working with text. If you plan on using string manipulation functions and expect them to work across multiple systems, you should always declare strings as arrays of char, never as signed char or unsigned char. The only time you should use unsigned char is when you are working with 8-bit binary data directly and want to avoid surprises like sign extension, which will give you funky values if you aren't careful.
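To make the sign-extension warning concrete, here is a minimal sketch of widening an ordinary `char` string into the `mChar` array from above. The helper name `widen` is hypothetical and not part of any library; the sketch assumes `stdint.h` is available, as in the snippet above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t mChar;

/* Hypothetical helper: copy a narrow string into 32-bit slots,
 * one character per slot, always terminating with a 0 element.
 * Returns the number of characters stored (terminator excluded). */
static size_t widen(mChar *dst, size_t cap, const char *src)
{
    size_t i = 0;

    assert(cap > 0); /* need room for at least the terminator */
    while (src[i] != '\0' && i + 1 < cap) {
        /* Go through unsigned char first. Where plain char is signed,
         * a direct conversion of a byte such as 0xE9 would sign-extend
         * to a negative 32-bit value. */
        dst[i] = (mChar)(unsigned char)src[i];
        i++;
    }
    dst[i] = 0;
    return i;
}
```

Standard functions such as strlen() or printf("%s") will not understand the widened array, so any string handling on `mChar` data has to be written the same way by hand.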

I had planned on using a single byte within the 32-bit variable. So using your `mChar` I would do something like: `mChar chr = 'i';` The system I'm working with has a constraint where all strings need to be a multiple of 4 bytes, so I figured declaring a type would be the best way to go about it, rather than padding all of the char arrays. — Steve Bates, Feb 13 '15 at 19:36
"`char32_t` ... is an unsigned integer type used for 32-bit characters and is the same type as `uint_least32_t`" (C11dr §7.28 2). Recommend `uint_least32_t` instead of `int32_t` to closely mimic OP's `char32_t`, or at least use the unsigned `uint32_t`. — chux - Reinstate Monica, Feb 13 '15 at 19:49
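For comparison, the padding alternative mentioned in the comments can be sketched as follows. This is only an illustration: `PAD4` and `copy_padded4` are hypothetical names, and the sketch assumes "4 bytes" means four `char` elements (8-bit bytes).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Round n up to the next multiple of 4 (hypothetical helper macro). */
#define PAD4(n) (((size_t)(n) + 3u) & ~(size_t)3)

/* Hypothetical helper: copy src, terminator included, into dst and
 * zero-fill up to the next multiple of 4 bytes. Returns the padded
 * size, or 0 if dst cannot hold it. */
static size_t copy_padded4(char *dst, size_t cap, const char *src)
{
    size_t len = strlen(src) + 1; /* characters plus '\0' */
    size_t padded = PAD4(len);

    assert(dst != NULL && src != NULL);
    if (padded > cap)
        return 0;
    memcpy(dst, src, len);
    memset(dst + len, 0, padded - len);
    return padded;
}
```

With this approach the data stays an ordinary `char` string, so the standard string functions keep working; only the buffer sizes are rounded up.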

score 0 · Answer 2 · Antti Haapala -- Слава Україні · 2015-02-13T19:56:03.583

The C11 standard does support `char32_t`, with string literals encoded in UTF-32. For example:

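#include <uchar.h>

int main(void) {
    /* U"..." is a UTF-32 string literal; treat its elements as read-only */
    const char32_t *str = U"Hello world";
    (void)str; /* silence the unused-variable warning */
    return 0;
}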

The program compiles cleanly with, say, `gcc -std=c11`.
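As a small follow-up sketch, assuming a C11 compiler that ships `uchar.h`: the narrow-string functions do not apply to `char32_t` data, so even a length function has to be written by hand. The name `u32_len` is hypothetical, not a standard function.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <uchar.h>

/* char32_t is the same type as uint_least32_t, so it has at least 32 bits. */
_Static_assert(sizeof(char32_t) * CHAR_BIT >= 32, "char32_t too narrow");

/* Hypothetical helper: count elements before the terminating 0. */
static size_t u32_len(const char32_t *s)
{
    size_t n = 0;

    assert(s != NULL);
    while (s[n] != 0)
        n++;
    return n;
}
```

For the literal above, `u32_len(U"Hello world")` counts 11 elements, and the storage used is `sizeof(char32_t)` times the element count plus one for the terminator.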

edited Feb 13 '15 at 19:56

answered Feb 13 '15 at 19:50

I'm using an earlier C compiler from 2009, so this won't work for me. Thank you for your input though. – Steve Bates Feb 13 '15 at 20:22
which compiler is this? – Antti Haapala -- Слава Україні Feb 13 '15 at 20:23
It's a proprietary one I'm using at work...a lot of the standard libraries aren't allowed (stdint.h and uchar.h aren't available). – Steve Bates Feb 13 '15 at 20:34
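For a compiler without `stdint.h` or `uchar.h`, one possible workaround is to pick the type from `limits.h`, which every hosted C89 implementation should provide. This is a sketch only; `my_char32` and the check names are hypothetical, and it has not been tried on the proprietary compiler in question.

```c
#include <assert.h>
#include <limits.h>

/* Prefer unsigned int when it already has 32 value bits; otherwise
 * fall back to unsigned long, which the standard guarantees to hold
 * at least 32 bits. */
#if UINT_MAX >= 0xFFFFFFFFUL
typedef unsigned int my_char32;
#else
typedef unsigned long my_char32;
#endif

/* C89-style compile-time check: the array size becomes -1, which is
 * an error, if my_char32 is narrower than 32 bits. */
typedef char my_char32_is_wide_enough[
    (sizeof(my_char32) * CHAR_BIT >= 32) ? 1 : -1];

/* The same check at run time, in case the typedef trick above is
 * rejected by an unusual compiler. */
static void my_char32_selfcheck(void)
{
    assert(sizeof(my_char32) * CHAR_BIT >= 32);
}
```

Note that this gives "at least 32 bits", not "exactly 4 bytes"; whether that satisfies the multiple-of-4 constraint depends on `sizeof(my_char32)` on the target system.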
