Re your actual question, “what is [the L prefix] and how to add it to dynamic strings?”
This is very different from the title of the question at the time I’m writing this, namely “How can I make dynamic strings to work with UTF-8 in console?”
In short, UTF-8 is an encoding of Unicode where the basic encoding unit is 8 bits, commonly called a byte (more precisely, an octet), while the L prefix forms a wide character or string literal, where the encoding unit is typically 16 or 32 bits – in Windows it’s 16 bits, as in original Unicode.
A wide character or string literal is based on the wchar_t type instead of char.
In Windows a wide string is encoded as UTF-16. The most common sixty thousand or so Unicode characters are represented with single wchar_t values, but some seldom-used Chinese ideograms etc. require two successive wchar_t values, called a surrogate pair.
The use of a 16-bit encoding unit in Windows was established around 1992. I am not sure exactly when UTF-16 was adopted (as an extension of the then UCS-2 encoding), but it was just a bit later. So this was established long before C99 required that all characters of the wide character set be representable with single wchar_t values. That requirement appears to have been a pure political maneuver, ensuring that no Windows C compiler could be formally conforming, and making a general ISO programming language standard apply only to Unix-land. Unfortunately, since C++11 was based on C99, we now have that also in C++11, ensuring that no Windows C++ compiler can be fully conforming. Pure idiocy, if you ask me.
Errata, re the preceding paragraph: according to Wikipedia’s article on the subject, the wording about a single wchar_t being sufficient for any character in the “extended character set” was there already in C90. That makes the incompatibility between Windows and the C and C++ standards the fault of Microsoft, not the fault of the C committee. It still appears to be political and fairly idiotic, but (I’m now enlightened) with a different party to blame than I maintained at first…
One way to work with wide dynamic strings is to use std::wstring, from the <string> header.
With Visual C++ you can use a wmain function instead of the standard main, as an easy way to get wide command line arguments.
wmain is also supported by MinGW64 (IIRC) g++, although not yet by ordinary MinGW g++, as of g++ 4.8.something. It is, however, easy to implement in terms of the Windows API. That is, unless you require strictly standard-conforming code with the special features of main, such as the ability to declare it with or without arguments, but hey, let's be practical about things.
Example that compiles fine with both Visual C++ 12.0 and g++ 4.8.2:
// Source encoding: UTF-8 with BOM.
#include <io.h>         // _setmode
#include <fcntl.h>      // _O_WTEXT
#include <stdio.h>      // _fileno, stdin, stdout
#include <iostream>     // std::wcout, std::endl
#include <string>       // std::wstring
using namespace std;

auto main()
    -> int
{
    // Put the standard input and output streams in wide (UTF-16) text mode.
    _setmode( _fileno( stdin ), _O_WTEXT );
    _setmode( _fileno( stdout ), _O_WTEXT );
    wcout << L"Hi, what’s your name? ";
    wstring username;
    getline( wcin, username );
    wcout << L"Welcome to Windows C++, " << username << L"!" << endl;
}
Note that if the source is saved with Windows ANSI encoding, this won’t compile with g++ unless you specify the source encoding with the appropriate compiler option, -finput-charset.