How to write portable C++ code with Unicode support?

Question

I have the below program, which tries to print Unicode characters on a console by enabling the _O_U16TEXT mode for the console:

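#include <cstdio>   // stdout, _fileno
#include <cwchar>   // wprintf
#include <fcntl.h>  // _O_U16TEXT (Windows-specific)
#include <io.h>     // _setmode (Windows-specific)

int main()
{
   // Switch stdout to UTF-16 mode so wide characters reach the console intact.
   _setmode(_fileno(stdout), _O_U16TEXT);
   wprintf(L"test\n \x263a\x263b Hello from C/C++\n");

   return 0;
}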

What is unclear to me is this: I have seen many C++ projects (with the same code running on Windows and Linux) that use a macro called _UNICODE. I have the following questions:

Under what circumstances do I need to define the _UNICODE macro?
Does enabling the _UNICODE macro mean I need to separate the ASCII-related code by using #ifdef _UNICODE? In the case of the above program, do I need to add an #ifdef UNICODE and put the ASCII processing in the #else branch?
What else do I need to do to give the code Unicode support in C/C++? Do I need to link against any specific libraries on Windows and Linux?
Why does my sample program above not need to define the _UNICODE macro?
When I define the _UNICODE macro, how does the code know whether it uses UTF-8, UTF-16 or UTF-32? How do I decide between these Unicode types?

It is used by the SDK headers; you don't need `#include ` in a console app. wprintf() unambiguously uses wchar_t, which is UTF-16 when you target Windows. — Hans Passant, Dec 12 '21 at 13:15
@hanspassant - Could you please elaborate on your answer in view of the bulleted points in my question? That would really clarify the basics, where I can't seem to connect the dots. — Test, Dec 12 '21 at 13:17
`_UNICODE` macro determines whether e.g. `CreateFile` macro expands to `CreateFileA` (a function taking `char*` arguments) or `CreateFileW` (a function taking `wchar_t*` arguments). Or whether `_T("x")` expands to `"x"` or `L"x"`. It has no effect on "portable C++ code", if by that you mean code that limits itself to C++ core language and standard library, and doesn't use any Windows-specific features. Historically, the macro allowed writing code that could be compiled both for Windows 95 (MBCS-based) and Windows NT (Unicode-based). — Igor Tandetnik, Dec 12 '21 at 14:50
Since you are using `wprintf` in your example - a Windows-specific header `tchar.h` provides a macro [`_tprintf`](https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/printf-printf-l-wprintf-wprintf-l?view=msvc-170#generic-text-routine-mappings) that expands to `printf` or `wprintf` based on whether `_UNICODE` is defined. — Igor Tandetnik, Dec 12 '21 at 14:56
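As a rough illustration of the mapping described in the two comments above, here is a minimal sketch of the mechanism. The names `my_tchar`, `MY_T`, `my_tstring` and `greeting` are hypothetical stand-ins invented for this example; they imitate what `TCHAR` and `_T` do in `tchar.h` but are not the real definitions, so the snippet also compiles where that header does not exist.

```cpp
#include <string>

// Hypothetical imitation of the generic-text mapping in <tchar.h>.
// Defining _UNICODE before this point selects the wide variants.
#ifdef _UNICODE
using my_tchar = wchar_t;
#define MY_T(s) L##s   // MY_T("x") becomes L"x"
#else
using my_tchar = char;
#define MY_T(s) s      // MY_T("x") stays "x"
#endif

// A string type whose character width follows the macro.
using my_tstring = std::basic_string<my_tchar>;

// The same source line yields a narrow or a wide string,
// depending only on whether _UNICODE was defined at compile time.
my_tstring greeting()
{
    return MY_T("Hello");
}
```

Nothing here chooses an encoding at run time: the macro only switches between `char`-based and `wchar_t`-based spellings at compile time, which is why it has no bearing on code that sticks to the standard library.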
@IgorTandetnik - What meaning does UNICODE have on Linux systems? As per your statement, the expansion of CreateFile to CreateFileA or CreateFileW is meaningful only on Windows, isn't it? — Test, Dec 12 '21 at 14:57
I'm reasonably sure `_UNICODE` is Windows-specific and has no special meaning to a typical C++ toolchain used on Linux. I'm not really familiar with non-Windows toolchains, though. What I'm 100% sure about is that `_UNICODE` macro is not mentioned in the C++ standard, and has no effect on "portable C++ code" (defined, again, as code confined within the four corners of the standard). — Igor Tandetnik, Dec 12 '21 at 14:58
@IgorTandetnik - Since you mentioned that you are not familiar with them, are you still able to clarify how we deal with Unicode on Unix systems, i.e. what the relevant C++ code would look like there? — Test, Dec 12 '21 at 15:02
No, I don't know enough about Unix-based systems, sorry. I'm reasonably sure that defining `UNICODE` or `_UNICODE` would neither help nor hurt. Note also that `_setmode` and `_fileno` (used in your example) are Windows-specific. — Igor Tandetnik, Dec 12 '21 at 15:05
If you really want portable code, forget about `wstring`. Use UTF-8 encoded strings and convert them to UTF-16 when passing them to WinAPI functions with the W suffix. AFAIK most modern Unix systems use UTF-8 by default. — Osyotr, Dec 12 '21 at 15:52
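The conversion step mentioned in this comment could look roughly like the sketch below. It is a hand-written decoder, chosen so that it compiles on any platform; on Windows the usual route is the Win32 function `MultiByteToWideChar` with `CP_UTF8` instead. The helper name `utf8_to_utf16` is hypothetical, and throwing on malformed input is an assumption of this example rather than anything the comment prescribes.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into UTF-16 code units.
// Throws std::invalid_argument on malformed input.
std::u16string utf8_to_utf16(const std::string& in)
{
    // Smallest code point that legitimately needs 1, 2, 3 or 4 bytes.
    static const std::uint32_t min_cp[] = {0x0, 0x80, 0x800, 0x10000};

    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // continuation bytes that must follow b0

        if (b0 < 0x80)                { cp = b0;        extra = 0; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; extra = 1; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; extra = 2; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (extra > in.size() - i - 1)
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }
        i += extra + 1;

        // Reject overlong forms, surrogate code points and out-of-range values.
        if (cp < min_cp[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point in UTF-8 input");

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above U+FFFF become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

On Windows `wchar_t` is 16 bits wide, so the resulting code units can be copied into a `std::wstring` before calling a W function. Keeping UTF-8 in `std::string` everywhere else means the rest of the program does not depend on the platform's `wchar_t` width.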
@Genjutsu - Is wstring Windows-specific? If not, then why do you say to avoid it? How would I implement Unicode support on Linux? Can you elaborate, as your response doesn't explain clearly what you mean? — Test, Dec 12 '21 at 15:54
`wstring` is UTF-16 on Windows but UTF-32 on Linux. First, you should define what Unicode support means for your application. On Linux you don't really need to do anything; a simple `cout << "Hello мир!";` will most likely work out of the box. On Windows you may need to call `SetConsoleOutputCP(65001);` first. — Osyotr, Dec 12 '21 at 16:08
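A minimal sketch of what this comment suggests, assuming the terminal (or Windows console font) can display the characters. The function names `prepare_console_for_utf8` and `greeting_utf8` are hypothetical. The Cyrillic word is spelled as explicit UTF-8 bytes so that the example does not depend on how the compiler interprets the source file's encoding, which is an extra precaution added here.

```cpp
#include <string>

#ifdef _WIN32
#include <windows.h>  // SetConsoleOutputCP, CP_UTF8
#endif

// Hypothetical helper: on Windows, ask the console to interpret output
// bytes as UTF-8 (code page 65001). Elsewhere this does nothing, on the
// assumption that the terminal already uses a UTF-8 locale.
void prepare_console_for_utf8()
{
#ifdef _WIN32
    SetConsoleOutputCP(CP_UTF8);
#endif
}

// "Hello мир!" followed by a newline, as UTF-8 bytes:
// м = D0 BC, и = D0 B8, р = D1 80.
std::string greeting_utf8()
{
    return "Hello \xD0\xBC\xD0\xB8\xD1\x80!\n";
}
```

A program would call `prepare_console_for_utf8()` once at the start of `main` and then write `greeting_utf8()` to `std::cout` as ordinary narrow output. The only platform-specific part is the single guarded call.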
Just avoid `std::wstring` and use UTF-8 everywhere with the [UTF-8 locale in Windows](https://stackoverflow.com/a/63454192/995714) — phuclv, Dec 12 '21 at 17:33
`UNICODE` (no underscore) applies to the Win32 API only. `_UNICODE` (with underscore) applies to the C runtime library. They are commonly (un)defined together, but they are technically separate. — Remy Lebeau, Dec 14 '21 at 18:20
Use a third-party library which has already taken care of portability for I/O and string handling. — n. m. could be an AI, Dec 14 '21 at 18:26
