
I have the program below, which tries to print Unicode characters to the console by enabling _O_U16TEXT mode on stdout:

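```cpp
#include <cstdio>   // stdout, _fileno
#include <cwchar>   // wprintf

#include <fcntl.h>  // _O_U16TEXT (Windows CRT only)
#include <io.h>     // _setmode (Windows CRT only)

int main()
{
   // Put stdout into UTF-16 text mode; from now on only wide output
   // functions (wprintf, std::wcout, ...) may be used on stdout.
   _setmode(_fileno(stdout), _O_U16TEXT);
   wprintf(L"test\n \x263a\x263b Hello from C/C++\n");

   return 0;
}
```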

What is unclear to me is this: I have seen many C++ projects (with the same code running on both Windows and Linux) that use a macro called _UNICODE. I have the following questions:

  1. Under what circumstances do I need to define the _UNICODE macro?
  2. Does defining the _UNICODE macro mean I need to separate the ASCII-related code with #ifdef _UNICODE? For the program above, do I need to add an #ifdef UNICODE block and put the ASCII processing code in the #else branch?
  3. What else do I need to do to give the code Unicode support in C/C++? Do I need to link against any specific libraries on Windows and Linux?
  4. Why does my sample program above not need to define the _UNICODE macro?
  5. When I define the _UNICODE macro, how does the code know whether it uses UTF-8, UTF-16 or UTF-32? How do I choose between these encodings?
Remy Lebeau
Test
  • It is used by the SDK headers; you don't need `#include ` in a console app. wprintf() unambiguously uses wchar_t, and when you target Windows that is UTF-16. – Hans Passant Dec 12 '21 at 13:15
  • @hanspassant - Could you please elaborate on your answer in light of the numbered points in my question? That would really clarify the basics, since I can't seem to connect the dots. – Test Dec 12 '21 at 13:17
  • The `_UNICODE` macro determines whether, e.g., the `CreateFile` macro expands to `CreateFileA` (a function taking `char*` arguments) or `CreateFileW` (a function taking `wchar_t*` arguments), or whether `_T("x")` expands to `"x"` or `L"x"`. It has no effect on "portable C++ code", if by that you mean code that limits itself to the C++ core language and standard library and doesn't use any Windows-specific features. Historically, the macro allowed writing code that could be compiled both for Windows 95 (MBCS-based) and Windows NT (Unicode-based). – Igor Tandetnik Dec 12 '21 at 14:50
  • Since you are using `wprintf` in your example: the Windows-specific header `tchar.h` provides a macro [`_tprintf`](https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/printf-printf-l-wprintf-wprintf-l?view=msvc-170#generic-text-routine-mappings) that expands to `printf` or `wprintf` depending on whether `_UNICODE` is defined. – Igor Tandetnik Dec 12 '21 at 14:56
  • @IgorTandetnik - What meaning does UNICODE have on Linux systems? Per your statement, the expansion of CreateFile to CreateFileA or CreateFileW is meaningful only on Windows, isn't it? – Test Dec 12 '21 at 14:57
  • I'm reasonably sure `_UNICODE` is Windows-specific and has no special meaning to a typical C++ toolchain used on Linux. I'm not really familiar with non-Windows toolchains, though. What I'm 100% sure about is that `_UNICODE` macro is not mentioned in the C++ standard, and has no effect on "portable C++ code" (defined, again, as code confined within the four corners of the standard). – Igor Tandetnik Dec 12 '21 at 14:58
  • @IgorTandetnik - I know you mentioned you are not familiar with them, but would you be able to clarify how we deal with Unicode on Unix systems, i.e. how the relevant C++ code would look on a Unix system? – Test Dec 12 '21 at 15:02
  • No, I don't know enough about Unix-based systems, sorry. I'm reasonably sure that defining `UNICODE` or `_UNICODE` would neither help nor hurt. Note also that `_setmode` and `_fileno` (used in your example) are Windows-specific. – Igor Tandetnik Dec 12 '21 at 15:05
  • If you really want portable code, forget about `wstring`. Use UTF-8 encoded strings and convert them to UTF-16 when passing them to WinAPI functions with the W suffix. AFAIK most modern Unix systems use UTF-8 by default. – Osyotr Dec 12 '21 at 15:52
  • @Genjutsu - Is wstring Windows-specific? If not, why do you say to forget about it? How would I implement Unicode support on Linux? Could you elaborate? Your response doesn't make it very clear what you mean. – Test Dec 12 '21 at 15:54
  • `wstring` is UTF-16 on Windows but UTF-32 on Linux. First, you should define what Unicode support means for your application. On Linux you don't really need to do anything; a simple `cout << "Hello мир!";` will most likely work out of the box. On Windows you may need to call `SetConsoleOutputCP(65001);` first. – Osyotr Dec 12 '21 at 16:08
  • Just avoid `std::wstring` and use UTF-8 everywhere with a [UTF-8 locale in Windows](https://stackoverflow.com/a/63454192/995714). – phuclv Dec 12 '21 at 17:33
  • `UNICODE` (no underscore) applies to the Win32 API only. `_UNICODE` (with underscore) applies to the C runtime library. They are commonly (un)defined together, but they are technically separate. – Remy Lebeau Dec 14 '21 at 18:20
  • Use a third-party library that has already taken care of portability for I/O and string handling. – n. m. could be an AI Dec 14 '21 at 18:26
