I know that letters and single-digit numbers can be stored in chars:

char letter = 'c';
char number = '1';

but can emojis or foreign letters be stored in a char? If not, how can I store them? Is this possible without strings?

boltz
  • You would store it in a string with utf8 encoding. – Aykhan Hagverdili May 18 '22 at 05:29
  • See "[Confusing sizeof(char) by ISO/IEC in different character set encoding like UTF-16](//stackoverflow.com/q/29338126/90527)", "[std::wstring VS std::string](//stackoverflow.com/q/402283/90527)". – outis May 18 '22 at 06:09
  • Does this answer your question? [char vs wchar\_t when to use which data type](//stackoverflow.com/q/45677774/90527) – outis May 18 '22 at 06:14
  • You may want to read this: [The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)](https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/) – Andreas Wenzel May 18 '22 at 08:47

1 Answer

A char is one byte, which is 8 bits on virtually every platform (the exact number is CHAR_BIT, which is at least 8). Whether plain char is signed or unsigned is up to the compiler, so with 8 bits it can hold any integer value from -128 to 127 (if signed) or from 0 to 255 (if unsigned). If a character can be encoded in that range, then it can be stored in a single char variable.

There are also wide characters (wchar_t), whose size again depends on the compiler and operating system: it is 16 bits on Windows and 32 bits on Linux and macOS.

Then there are the explicit Unicode character types: char8_t for UTF-8 code units (added in C++20, so it might not be available on older compilers), char16_t for 16-bit UTF-16 code units, and char32_t for 32-bit UTF-32 code units (both added in C++11).

For emojis, or Unicode characters in general, a single char is usually not enough. Use either multiple chars/char8_ts in UTF-8 encoding, or possibly multiple char16_ts in UTF-16 encoding (characters outside the Basic Multilingual Plane, which includes most emojis, need a surrogate pair of two char16_ts), or a char32_t. Or, if you're targeting Windows and using the Windows API, it uses 16-bit wchar_t for UTF-16 encoded characters.

Remy Lebeau
Some programmer dude
  • even a single `char32_t` isn't enough because [lots of characters are made up from multiple Unicode code points](https://stackoverflow.com/a/69414798/995714): ❤️ 1️⃣ – phuclv May 18 '22 at 06:29