C++ parsing a string containing wchars

Question

My C++ application receives a string that contains some wchars (in particular, the string is printed like this: "€¥$₱"). I have to parse it so that it returns a vector of strings where each element of the vector is one of the characters of the string (so the desired output is [€, ¥, $, ₱]). I have tried with this snippet of code:

for(auto letter : originalString)
{
    outputVector.push_back(std::string(1, letter));
}

But, as you can probably guess, the output isn't correct because, as far as I understand, some of the characters in the original string are larger than a single char. How can I correctly parse this string, which contains ordinary characters and what appear to be wchars? Would it be enough to cast the received string to a wstring and then parse the wstring?

You can't simply "cast" a `std::string` to a `std::wstring`. This discussion may help: [C++ Convert string (or char*) to wstring (or wchar_t*)](https://stackoverflow.com/q/2573834/10871073). — Adrian Mole, Feb 29 '20 at 14:27
You need to figure out which unicode encoding that is used first. Is it UTF-8, UTF-16, UTF-16LE or UTF-32? — Ted Lyngmo, Feb 29 '20 at 14:28
You need to read this https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/ — n. m. could be an AI, Feb 29 '20 at 14:28
Thank you for your answers. From the link posted by @n.'pronouns'm. I understand that, given that the file is encoded in UTF-8, the euro and Philippine peso symbols take three bytes each and the yen symbol two, while the dollar takes only one. But how can I parse a string like this dynamically? I mean, is there a way to know how many bytes the first, second, third, etc. symbols occupy? Maybe it would be best to encode the file as UTF-16, so that I know each of these characters occupies two bytes regardless? — Garu94, Feb 29 '20 at 14:57
The simplest way is probably to *convert* (not cast) your string to a wide character string. If you know that your encoding is UTF-8, you can know how many bytes a character has by looking at its first byte, but this is going to be quite a bit more work. — n. m. could be an AI, Feb 29 '20 at 15:11
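
To illustrate the "look at the first byte" approach from the comment above: in UTF-8, the high bits of the lead byte give the length of the sequence (`0xxxxxxx` is 1 byte, `110xxxxx` is 2, `1110xxxx` is 3, `11110xxx` is 4). What follows is only a minimal sketch, assuming the input really is UTF-8. The helper names `utf8SequenceLength` and `splitUtf8` are made up for this example and are not standard library functions, and the code does not validate continuation bytes or reject overlong encodings.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: length in bytes of the UTF-8 sequence that starts
// with `lead`. Returns 1 for ASCII and also for invalid lead bytes, so
// malformed input is passed through byte by byte instead of being dropped.
std::size_t utf8SequenceLength(unsigned char lead)
{
    if ((lead & 0x80) == 0x00) return 1; // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2; // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3; // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4; // 11110xxx
    return 1;                            // stray continuation byte or invalid lead
}

// Hypothetical helper: splits a UTF-8 string into one std::string per
// encoded code point. Assumes (does not check) that continuation bytes
// are well formed.
std::vector<std::string> splitUtf8(const std::string& input)
{
    std::vector<std::string> out;
    std::size_t i = 0;
    while (i < input.size())
    {
        std::size_t len = utf8SequenceLength(static_cast<unsigned char>(input[i]));
        if (i + len > input.size())
            len = input.size() - i;      // truncated sequence at end of input
        out.push_back(input.substr(i, len));
        i += len;
    }
    return out;
}
```

Calling `splitUtf8(originalString)` on the UTF-8 bytes of "€¥$₱" yields four elements of 3, 2, 1 and 3 bytes. Note that this splits by code point, so a character built from several code points (for example a letter followed by a combining accent) would still end up in separate elements.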

