
I want to display the infinity symbol, Unicode code point U+221E. I am currently using the fmt library, which is supposed to have broad Unicode support and be cross-platform.

fmt::print("{}", fmt::styled("∞ >", fmt::emphasis::bold | fg(fmt::color::aquamarine)));

I get the following output:

? >

I also tried setting setlocale(LC_ALL, "en_US.UTF-8"); it doesn't help. I am on Windows 11 x64.

WARNING:

warning C4566: character represented by universal-character-name '\u221E' cannot be represented in the current code page (1252)

MS Visual Studio 2022 IDE.

Should I change the Character Set, in project properties? Currently set to: Use Unicode Character Set, second option is: Use Multi-Byte Character Set.

    It depends on the shell, in which you are trying to display it. – foragerDev Feb 27 '23 at 11:46
  • @foragerDev, is there a possible work-around? – Alix Blaine Feb 27 '23 at 11:47
  • Run your code in different shell. Use the one which support unicodes. – foragerDev Feb 27 '23 at 11:48
  • 2
    Looks like your file is being compiled to the windows 1252 code space, you'll need to compile it as some form of unicode encoding https://stackoverflow.com/questions/12040539/utf-8-compatibility-in-c – Alan Birtles Feb 27 '23 at 11:50
  • 3
    And please consider this too https://learn.microsoft.com/en-us/cpp/error-messages/compiler-warnings/compiler-warning-level-1-c4566?view=msvc-170 – foragerDev Feb 27 '23 at 11:51
  • @AlanBirtles, how do I do that in the IDE? – Alix Blaine Feb 27 '23 at 11:52
  • Sorry, nothing to do with the shell but the terminal :-) Or is a windows shell always the terminal also? Maybe :-) – Klaus Feb 27 '23 at 12:00
  • There are at least three different issues here, and you have to get all of them right, 1) is your code correct 2) is the output device capable of displaying unicode 3) is your source file capable of storing unicode. The third point is a surprise to some, but if you are using string literals containing unicode characters then either the source file itself must be stored as unicode, or you should use a unicode escape sequence. Since we are talking Windows here, unicode in this context means UTF-16. – john Feb 27 '23 at 12:07
  • you need to save the file as UTF-8 and might also need some other changes [some other changes](https://stackoverflow.com/a/63454192/995714) to use UTF-8 – phuclv Mar 08 '23 at 05:00

4 Answers


For this to work, you need to compile with the /utf-8 flag. Among other things, this sets the string literal encoding to UTF-8. It is detected by {fmt}, which uses Unicode APIs to write to a console. Changing the locale won't help in general, because the problem is in the console codepage. Using wide strings won't help, for the same reason.

vitaut
  • 1
    "Changing the locale won't help in general because the problem is in the console codepage." The first problem is the *compiler's* codepage, but `/utf-8` fixes that. – Mooing Duck Feb 27 '23 at 16:06
  • The compiler codepage is less of a problem because it shouldn't affect the representation of narrow string literals (but it often breaks u8 literals). In principle it is possible to make it work by compiling with `FMT_UNICODE` defined to 1 but `/utf-8` is a cleaner option. – vitaut Feb 27 '23 at 16:19

The problem occurs during compilation, not at runtime. You are writing the character as-is in a string literal, but the compiler is parsing your source code using codepage 1252, which has no representation for U+221E (and the character can't fit in a single char anyway). So the character is lost at compile time; it is replaced with ? before the fmt library ever sees it.

setlocale() has no effect on this issue since it is only processed at runtime.

And the project's Character Set option has no effect on this issue either, because it only affects the compilation of TCHAR-based APIs, nothing else.

So, you have 2 choices:

  • make sure your source code is saved in UTF-8 format, and compile it with the /utf-8 compiler flag specified.

  • use wide strings instead, eg:

fmt::print(L"{}", fmt::styled(L"∞ >", fmt::emphasis::bold | fg(fmt::color::aquamarine)));
Remy Lebeau

First of all, there are several different Unicode encodings. It is likely that you are trying to print the UTF-8 encoding of the infinity sign, but Windows uses UTF-16 by default, so you get garbled text instead. It actually has nothing to do with fmt.

For starters, on C++11 and later, you can change the character set used by cout, cerr, etc. to UTF8 using this code:

#include <iostream>
#include <locale>
#include <codecvt> // for std::codecvt_utf8; note: deprecated since C++17

template <typename StreamT>
void set_utf8_locale(StreamT& stream) {
    std::locale loc(std::locale(), new std::codecvt_utf8<typename StreamT::char_type>());
    stream.imbue(loc);
}

set_utf8_locale(std::cout);

Then you need to make sure your compiler is storing string literals as UTF-8. You can do that by prefixing the string with u8, such as u8"abc".

In C++20 and later, this yields a different type from const char*, so you need to use reinterpret_cast<const char*>(u8"abc") to get the correct type.

Linux and macOS use UTF-8 for output, so you don't need to worry about this on those platforms.

Zenul_Abidin
  • I work with C++ 20. – Alix Blaine Feb 27 '23 at 11:57
  • `reinterpret_cast(u8"λ >")` doesn't work however. – Alix Blaine Feb 27 '23 at 12:01
  • @AlixBlaine I wouldn't expect it to, you are putting a non-ASCII character in a string literal, and hoping that some how the editor and compiler will sort it out. – john Feb 27 '23 at 12:11
  • If you want to use `u8"λ >"` as a UTF-8 string then what you need to do is open your source file in a hex editor and check that the string has indeed been stored as the correct UTF-8 sequence, – john Feb 27 '23 at 12:14
  • @john, so, how can I display it then? – Alix Blaine Feb 27 '23 at 12:15
  • 1
    @AlixBlaine As I said in a comment above there are a variety of things you must get right. Personally on windows I would use UTF-16, it's the native encoding (more or less), but if you want to use UTF-8 then you can eliminate one of the possible problems by using UTF-8 encoding in your string literals `"\xE2\x88\x9E >"` is your string correctly encoded. But as I said, that's only one issue. – john Feb 27 '23 at 12:19
  • @AlixBlaine E2889E is the correct encoding for infinity (as in the original question) but not lambda. Lambda is `\xCE\xBB` – john Feb 27 '23 at 12:24
  • `\xCE\xBB` gives me `I» >` – Alix Blaine Feb 27 '23 at 12:29
  • 1
    @AlixBlaine the reinterpret_cast is only to avoid compile errors when you assign a u8-string `u8"..."` to a `char*`. Did you try imbuing a different `codecvt` facet into the output stream like in my first snippet? – Zenul_Abidin Feb 27 '23 at 12:34
  • 1
    @AlixBlaine Pretty clear from that output that the UTF-8 sequence is not being interpreted as UTF-8. Getting Unicode to work involves getting several pieces to cooperate. Your code, the standard library and the output device. As I said, I think UTF-16 and wide strings (`std::wcout << L"\u221E"` for example) is simpler on Windows. – john Feb 27 '23 at 12:54
  • 1
    @AlixBlaine there are **three** different aspects to this problem, and all have to be correct. First is your source file encoding. Second is the form of your literals and string variables. Third is the output encoding. – Mark Ransom Feb 27 '23 at 14:50
  • @MarkRansom, I want this code to work cross-platform, Linux, Unix, and MacOS. Not only on Windows. – Alix Blaine Feb 27 '23 at 16:23

Your error is C4566. It arises because Unicode handling in C++, especially for characters outside standard ASCII, is easy to get wrong.

This: char inf = '∞' breaks because a single char does not have enough bits to hold the character. The same thing happens with string inf = '∞', string inf = L'∞', and so on.

The most convenient approach is to use wide characters (wchar_t) in an array, which gives the character room to breathe. There is probably another way of doing this, but this is the method I have gotten to work with the infinity symbol and other Unicode symbols.

wchar_t inf[] = L"\u221e"; // ∞ in unicode is U+221E

Narrow strings (one-byte characters) are converted to multi-byte characters, whereas wide strings (two-byte characters) are not. This works because the L prefix makes the literal a wide string, which in layman's terms makes space for more bits so the characters don't get mangled.

  • 1
    `string inf = '∞'` and similar work just fine in MSVC C++, as long as the input code file is UTF-8 with a BOM and `/utf-8` is passed. – Mooing Duck Feb 27 '23 at 16:05