The code below prints the pi symbol (π) in VS2019 instead of the character ã.

#include <iostream>
const char* p = "\u00E3";           // Character ã LATIN SMALL LETTER A WITH TILDE

int main() {
    std::cout << p << '\n';         // This should be accepted by any compiler supporting some ASCII-compatible encoding.
                                    // It does compile in clang and GCC, printing `ã` in both. In VS2019 it prints the
                                    // pi symbol. However, if I debug the code I can see the character `ã` in
                                    // memory at the address given by `p`. What am I missing?
}

See demo in Coliru.

Edit

The question this one is considered a duplicate of doesn't use a universal-character-name, as the code above does. Therefore this should compile and execute correctly in VS2019.

John Kalane
  • Does this answer your question? [C++ Visual Studio character encoding issues](https://stackoverflow.com/questions/1857668/c-visual-studio-character-encoding-issues) – Botje Jun 17 '21 at 15:20
  • @Botje See my edit above. – John Kalane Jun 17 '21 at 15:27
  • U+00E3 converted to UTF-8 is a pair of bytes `c3 a3`. Are you sure you're not seeing the cp1251 interpretation of a c3 byte `Г CYRILLIC CAPITAL LETTER GHE` ? In this case you need to switch your terminal and/or system encoding to UTF-8. – Botje Jun 17 '21 at 15:31
  • Possibly a code page issue. – ChrisBD Jun 17 '21 at 15:34
  • https://en.wikipedia.org/wiki/Code_page_437#Character_set – Hans Passant Jun 17 '21 at 15:34
  • @Botje U+00E3 is already the Unicode code point for the character `ã`, as stated [here](http://eel.is/c++draft/full#nt:universal-character-name) in the most recent C++ draft. – John Kalane Jun 17 '21 at 15:36
  • @JohnKalane -- What is the code page of the terminal? For code page 437, `e3` is a pi symbol. – PaulMcKenzie Jun 17 '21 at 15:39
  • @JohnKalane -- If you're using Windows, go back in your program editor, hold the Alt key down, and on the keypad enter `227`, which is `0xE3` in decimal. Do you see the pi symbol show up? If you do, then that gives you a clue as to what is occurring in the output. – PaulMcKenzie Jun 17 '21 at 15:47
  • @PaulMcKenzie Yes, I do see the pi symbol. I have no clue whatsoever. – John Kalane Jun 17 '21 at 15:54
  • @PaulMcKenzie There should be no issues with code pages. As I said before, `\u00E3` is a universal-character-name for the character `ã`, so this should work unless there is a bug in VS2019. Also, I can see the character `ã` at the address `p` in memory when I debug the code in Visual Studio. – John Kalane Jun 17 '21 at 16:05
  • @JohnKalane -- *There should be no issues with code pages as I said before* -- Well obviously there is an issue, as again `0xE3` is the pi symbol for codepage 437. Coincidence? -- *\u00E3 is a universal-character-name for the character ã* -- Unicode is not "universal" -- if the terminal doesn't support it or understand it, then it's up to you to figure out how to make your terminal support Unicode. Unicode is a specification, not a mandate. – PaulMcKenzie Jun 17 '21 at 16:08
  • @JohnKalane Can you write that character to a file instead of the console? If it contains a single `e3` then C++ is doing its job and your console is to blame. That is why the answer I linked has references to `chcp` and `SetConsoleOutputCP` – Botje Jun 18 '21 at 07:04

1 Answer

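Yes, U+00E3 is the code point for ã, but that is just a number. That number has to be encoded as bytes to be stored anywhere (memory, a file, etc.), so you have an encoding issue. By default MSVC encodes narrow string literals in the system's ANSI code page (typically Windows-1252 on Western European systems), where ã is the single byte 0xe3, which is presumably why the debugger shows ã in memory. GCC and clang default to UTF-8, where the same character is the two bytes 0xc3 0xa3. Your program wrote the byte 0xe3 to the terminal, but the terminal's code page is cp437, where 0xe3 is decoded as π.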

On Windows, using wide strings and switching stdout to UTF-16 text mode will work.

#include <iostream>
#include <io.h>
#include <fcntl.h>

const wchar_t* p = L"\u00E3"; // wide string

int main() {
    _setmode(_fileno(stdout), _O_U16TEXT); // switch stdout to UTF-16 text mode
    std::wcout << p << '\n';
}

Output (compiled with VS2019 and run in Windows 10 console window):

ã

Note that the font the console uses has to support the characters printed; otherwise you get the replacement character U+FFFD (whose appearance also varies with the font).

Mark Tolonen