I know C++11 has a type char32_t that is 4 bytes, and I'm wondering if it's possible to implement something similar in C. The program I'm writing needs to have all char arrays be a multiple of 4 bytes.

Deduplicator
  • longs are 32 bits in C. Can't this be used? – h7r Feb 13 '15 at 19:18
  • @h7r No. In C, there are very few guarantees with respect to bit width besides `char` being 8-bit, and the fixed-width types in `stdint.h` being what they are advertised as. `long` is at least 32-bit, but can be more. Also, depending on the system, things like `long` and `long long` might not fit in a single register, so certain types of operations involving 32- or 64-bit variables might have to be carried out in software, greatly slowing down the program. Good question though. http://en.wikipedia.org/wiki/C_data_types – Cloud Feb 13 '15 at 19:22
  • It's C. You can push bytes around any way you like. The issue is libraries: things like printf() and strcpy() won't know what to do with 32-bit characters, so you'll have to rewrite all that yourself. – Lee Daniel Crocker Feb 13 '15 at 19:24
  • @Dogbert point taken. Performance issues aside, as you say, a `long` is "at least" 32 bits, so wouldn't my idea fit? – h7r Feb 13 '15 at 19:25
  • Why not use `int_least32_t`? – phuclv Feb 13 '15 at 19:45
  • You can just use an int type, they're all numbers. If the ambiguity of the standards worries you, use a system-dependent type like UINT32 on Windows, Uint32 with SDL, etc. – rsethc Feb 13 '15 at 20:08
  • @Dogbert: `char` is _at least_ 8 bits, but can be more. – ninjalj Feb 21 '15 at 20:21
  • @ninjalj For C99 and onward, a `char` is exactly 8 bits. On some TI DSP chips, it uses 32-bit `char`, but it's therefore non-C99 compliant. http://stackoverflow.com/questions/2215445/are-there-machines-where-sizeofchar-1 – Cloud Feb 21 '15 at 21:58
  • @Dogbert: For C89, C99 and C11, a `char` is a byte, which has at least 8 bits, but may have more. So, on a DSP, a char/byte may perfectly be 32 bits, and the compiler can still be perfectly standard compliant. OTOH, POSIX requires a byte to be an octet, so 8 bits for POSIX. – ninjalj Feb 22 '15 at 00:38
  • @ninjalj As per the references in the link I provided, for C99 onward, a `char` is exactly equal to 8 bits, no more. – Cloud Feb 22 '15 at 01:51
  • @Dogbert: did you actually _read_ that? E.g: http://stackoverflow.com/a/2215596/371250 – ninjalj Feb 22 '15 at 01:54
  • @ninjalj I should ask the same. Look at the top answer: `sizeof(char) ==1` on C99 compliant systems. http://stackoverflow.com/a/2215454/1022889 – Cloud Feb 22 '15 at 21:07
  • @Dogbert: from that same answer: _It is permitted (if wasteful) for an implementation to use 32 bits to represent type char. Regardless of the implementation, the value of sizeof(char) is always 1._ 1 char = 1 byte ≠ 1 octet – ninjalj Feb 23 '15 at 00:58
  • @ninjalj After further reviewing the above and the following post, you are correct. Cheers! http://stackoverflow.com/questions/437470/type-to-use-to-represent-a-byte-in-ansi-c89-90-c – Cloud Feb 23 '15 at 14:38

2 Answers

How do you plan on working with this data? Will you only be using a single byte within the 32-bit variable, or will you be storing actual 32-bit data within it?

One simple solution would be to create your own abstract data type so you can change it later:

#include <stdint.h>
typedef int32_t mChar;
mChar myChar32Array[100]; // Allocates 100x32-bit values

There is a major pitfall with tinkering with char-related data types, though: a lot of libraries and code snippets assume that a char is a char when working with text. If you plan on using string manipulation functions and expect them to work across multiple systems, always declare strings as arrays of char, never as signed char or unsigned char. The only time you should use unsigned char is when you are working with 8-bit binary data directly and want to avoid oddities like sign extension, which will give you unexpected values if you aren't careful.
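Because the standard string functions only understand arrays of char, a 32-bit character type needs its own helpers. The following is only a minimal sketch, assuming each element holds one plain character in its low byte, as the asker describes. The names `mstrlen` and `mstr_from_ascii` are hypothetical, and `mChar` is declared here as `uint_least32_t` (an assumption, following the suggestion in the comments to prefer an unsigned type) rather than `int32_t`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit character type; unsigned, like C11's char32_t. */
typedef uint_least32_t mChar;

/* Count the elements before the terminating 0, like strlen(). */
size_t mstrlen(const mChar *s)
{
    size_t n = 0;
    assert(s != NULL);
    while (s[n] != 0)
        n++;
    return n;
}

/* Widen a plain char string into dst, which has room for cap elements.
   Truncates if necessary, always writes a terminator, and returns the
   number of elements written, not counting the terminator. */
size_t mstr_from_ascii(mChar *dst, size_t cap, const char *src)
{
    size_t i = 0;
    assert(dst != NULL && src != NULL && cap > 0);
    while (i + 1 < cap && src[i] != '\0') {
        /* Go through unsigned char so negative chars are not sign-extended. */
        dst[i] = (unsigned char)src[i];
        i++;
    }
    dst[i] = 0;
    return i;
}
```

On platforms where `sizeof(mChar)` is 4, which is common but not guaranteed for `uint_least32_t`, every array of `mChar` occupies a multiple of 4 bytes without any manual padding.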

Cloud
  • I had planned on using a single byte within the 32-bit variable. So using your `mChar` I would do something like: `mChar chr = 'i';` The system I'm working with has a constraint where all strings need to be a multiple of 4 bytes, so I figured declaring a type would be the best way to go about it, rather than padding all of the char arrays. – Steve Bates Feb 13 '15 at 19:36
  • "`char32_t` which is an unsigned integer type used for 32-bit characters and is the same type as `uint_least32_t`" C11dr §7.28 2. Recommend `uint_least32_t` instead of `int32_t` to closely mimic OP's `char32_t`, or at least use the unsigned `uint32_t`. – chux - Reinstate Monica Feb 13 '15 at 19:49

The C11 standard does support char32_t, along with U"..." string literals for UTF-32 encoded text. For example:

#include <uchar.h>

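int main(void) {
    /* U"..." yields an array of char32_t; const because literals must not be modified */
    const char32_t *str = U"Hello world";
    (void)str; /* silence the unused-variable warning */
}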

The program compiles cleanly with, say, gcc -std=c11.
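Since `strlen()` and friends do not work on char32_t strings, length and size calculations have to be written by hand. As a rough sketch, the hypothetical helpers `c32len` and `c32bytes` below (not part of `<uchar.h>`) count the code units in a U"..." string and the bytes it occupies, including the terminator:

```c
#include <assert.h>
#include <stddef.h>
#include <uchar.h>

/* Hypothetical helper: number of char32_t units before the terminator. */
size_t c32len(const char32_t *s)
{
    size_t n = 0;
    assert(s != NULL);
    while (s[n] != 0)
        n++;
    return n;
}

/* Hypothetical helper: bytes occupied by the string, terminator included. */
size_t c32bytes(const char32_t *s)
{
    return (c32len(s) + 1) * sizeof(char32_t);
}
```

Where `sizeof(char32_t)` is 4, as on typical desktop platforms, the result of `c32bytes` is always a multiple of 4, which matches the constraint in the question. The standard only guarantees that char32_t is at least 32 bits wide, so that size is worth checking on the target system.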