2

I have searched many sites and found nothing. I know that char8_t has been a keyword in C++ since C++20. What I am trying to find out is whether C23 will typedef unsigned char as char8_t (alongside the introduction of u8 character literals). Can anyone clarify?

Sourav Kannantha B
  • I don't think so. You could do `typedef uint8_t char8_t` to achieve a similar effect (though I wouldn't recommend making your own `_t` types, since POSIX reserves them). – mediocrevegetable1 Mar 29 '21 at 05:40
  • 1
    [It was proposed for C2X](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2231.htm), but the new standard is not yet complete. – paulotorrens Mar 29 '21 at 05:41

2 Answers

4

Support for a char8_t type (or typedef) has not yet been added to C23. I submitted N2231 to add it back in 2018, but was not able to attend any meetings to argue for it then.

EDIT: 2021-06-06: N2653 (char8_t: A type for UTF-8 characters and strings (Revision 1)) has been submitted for C2x. Implementations of that proposal are available for gcc here and for glibc here. Patches submitted to gcc and glibc can be found here and here respectively.

EDIT: 2022-07-06: Implementations of the library portions of N2653 have finally landed in glibc for the 2.36 release expected in August.

EDIT: 2022-12-30: N2653 (char8_t: A type for UTF-8 characters and strings (Revision 1)) was accepted for C23 during the January/February 2022 WG14 virtual meeting and wording is present as indicated in the N3054 C working draft. Compiler support is present in the in-development branch of gcc that will be released as gcc 13 in April or May. Compiler support is also present in the in-development branch of Clang that will be released as Clang 16 in March or April.

N2653 doesn't just propose the char8_t typedef. It also proposes that the type of u8 string literals be changed to char8_t/unsigned char to match u8 character constants, that the mbrtoc8() and c8rtomb() functions from C++20 be added, and that some atomic related macros and types be added.
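
One way to observe the u8 string literal change described above is to probe the element type with `_Generic`. This is only a minimal sketch: the helper name `u8_literal_is_unsigned_char` is hypothetical, and the result depends on the compiler and the language mode it is run in.

```c
#include <assert.h>  /* static_assert (C11 and later) */

/* Whatever its type, an element of a u8 string literal is one byte wide. */
static_assert(sizeof(u8"a"[0]) == 1, "UTF-8 code units are one byte");

/* Hypothetical helper (not part of N2653 or any library):
   returns 1 if elements of u8 string literals have type unsigned char,
   which is the behavior N2653 proposes, and 0 if they are plain char,
   as in C11/C17. */
int u8_literal_is_unsigned_char(void)
{
    return _Generic(u8"a"[0], unsigned char: 1, default: 0);
}
```

A compiler that implements N2653 in its C23 mode should return 1 here, while a C11 or C17 mode should return 0.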

Tom Honermann
1

There is no such thing, but you can get one with a typedef, as char8_t is equivalent to unsigned char:

typedef unsigned char char8_t;

You now have a char8_t.
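
As a rough illustration of how such a typedef might be used, here is a minimal sketch that counts code points in a UTF-8 string. The helper `utf8_code_points` and the compile-time check on `CHAR_BIT` are assumptions added for the example, not part of any standard library, and the helper does not validate its input.

```c
#include <assert.h>   /* static_assert (C11 and later) */
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* size_t */

/* Hand-rolled stand-in for the type. C11 and later allow repeating an
   identical typedef, so this should not clash with a <uchar.h> that
   already declares char8_t as unsigned char. */
typedef unsigned char char8_t;

/* Assumption: UTF-8 code units are stored in 8-bit bytes. */
static_assert(CHAR_BIT == 8, "char8_t is assumed to be an 8-bit byte");

/* Hypothetical helper: counts code points in a NUL-terminated UTF-8
   string by skipping continuation bytes (bit pattern 10xxxxxx). */
size_t utf8_code_points(const char8_t *s)
{
    size_t n = 0;
    for (; *s != 0; ++s) {
        if ((*s & 0xC0) != 0x80) {
            ++n;  /* lead byte or ASCII byte starts a new code point */
        }
    }
    return n;
}
```

Before C23, a u8 string literal has type array of char, so a call needs a cast, for example `utf8_code_points((const char8_t *)u8"h\u00e9llo")`, which should yield 5.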

NB: I originally suggested using a macro (#define), but for data types a typedef is the way to go.

Antonin GAVREL
  • 8
    A `typedef` would be better. – Keith Thompson Mar 29 '21 at 05:41
  • 3
    @RobertHarvey Consistency with other type names (as in `` et al), and a general preference for using core language features rather than the preprocessor. – Keith Thompson Mar 29 '21 at 05:55
  • 1
    After further reading I totally agree, sorry for the blunder. – Antonin GAVREL Mar 29 '21 at 05:59
  • Did you consider other platforms except for Linux? – prehistoricpenguin Mar 29 '21 at 06:04
  • What is wrong with other platforms? It's only some other languages (like Java) that think that char is more than 1 byte (from what I know). – Antonin GAVREL Mar 29 '21 at 06:09
  • 1
    I think @prehistoricpenguin is asking whether a `char` is an octet or not (`sizeof (char)` always gives 1 because `sizeof` counts in bytes, but a byte is not necessarily one octet). Personally I've never seen a platform where a `char` isn't an octet, but I can't rule it out. For Linux though, POSIX guarantees `CHAR_BIT` to be 8. – mediocrevegetable1 Mar 29 '21 at 06:18
  • Yes, I understood it the same way, and I have also never seen a platform where `CHAR_BIT` is not 8. I would be happy to be proved wrong. – Antonin GAVREL Mar 29 '21 at 06:48
  • @AntoninGAVREL Old mainframes and DSP – Acorn Mar 30 '21 at 00:20
  • @mediocrevegetable1 [`sizeof( [[un]signed] char )` is always one](https://port70.net/~nsz/c/c11/n1570.html#6.5.3.4p4): "When sizeof is applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1." – Andrew Henle Mar 30 '21 at 00:24
  • Note that `char8_t` is [POSIX reserved](https://stackoverflow.com/a/231807/2410359), but OK in standard C. – chux - Reinstate Monica Mar 30 '21 at 02:09
  • @AndrewHenle I am aware of this, and I explained this in my comment. I actually got this knowledge from [this helpful question](https://stackoverflow.com/questions/2215445/are-there-machines-where-sizeofchar-1-or-at-least-char-bit-8) and note that in the top answer, there is this quote from Harbison and Steele: "It is permitted (if wasteful) for an implementation to use 32 bits to represent type char. Regardless of the implementation, the value of `sizeof(char)` is always 1." – mediocrevegetable1 Mar 30 '21 at 02:51
  • Looks like there will be a `char8_t`, as per some committee meeting Jan/Feb 2022. See the current working draft https://open-std.org/JTC1/SC22/WG14/www/docs/n3054.pdf. Might want to delete this answer since it's outdated. – Lundin Oct 07 '22 at 10:39