
According to the answer to this question std::wstring can either be u16string or u32string.

According to the first answer to this question one can simply convert to u16string and get std::wstring as a result.

What I wonder is: how do I know whether I have a 16-bit or a 32-bit representation? If I want to convert UTF-8 to std::wstring, it looks like I can't use the solution given, because I don't know which one the target platform will use.

So how do I convert it properly? Or is this irrelevant, and the conversion will always succeed without losing anything, regardless of whether I have a 16-bit or a 32-bit representation?

Can someone please clarify?
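
To make the "16 or 32 bits?" part concrete, here is a minimal sketch of how the width of `wchar_t` can be inspected. The helper names `wchar_bits` and `wchar_holds_any_code_point` are made up for illustration; only `sizeof`, `CHAR_BIT` and `WCHAR_MAX` come from the standard library. It deliberately avoids C++11 features so that it should also build with older compilers.

```cpp
#include <climits>   // CHAR_BIT
#include <cstddef>   // std::size_t
#include <cwchar>    // WCHAR_MAX

// Hypothetical helper: number of bits in one wchar_t on this platform.
inline std::size_t wchar_bits()
{
    return sizeof(wchar_t) * CHAR_BIT;
}

// Hypothetical helper: true when a single wchar_t can hold every Unicode
// code point (up to U+10FFFF), i.e. no surrogate pairs are needed.
inline bool wchar_holds_any_code_point()
{
    return WCHAR_MAX >= 0x10FFFF;
}
```

Both values are fixed when the program is compiled, so the choice between a UTF-16 style and a UTF-32 style conversion does not have to be made at run time.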

EDIT:

All this comes from the fact that on my Windows-based laptop (Win8.1) with MSVC 2010, converting the string "abc" ("abc") fails with the following code:

    #include <codecvt>  // std::codecvt_utf8
    #include <locale>   // std::wstring_convert
    std::wstring_convert<std::codecvt_utf8<wchar_t> > myconv;
    std::wstring table_name = myconv.from_bytes( (const char *) tableName );

I haven't tried it on Linux/Mac yet, but the failure on Windows is not a good sign and suggests I'm doing something wrong.
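
To narrow down a failure like this, the call can be wrapped so that the error is caught rather than propagated. This is only a sketch: `try_utf8_to_wstring` is a made-up name, and it relies on `from_bytes` throwing `std::range_error` when the converter was constructed without an error string. Note that `<codecvt>` and `std::wstring_convert` are deprecated since C++17, so newer compilers may warn about them.

```cpp
#include <codecvt>    // std::codecvt_utf8
#include <locale>     // std::wstring_convert
#include <stdexcept>  // std::range_error
#include <string>

// Hypothetical helper: returns true and fills `out` on success,
// false if from_bytes reports a conversion failure.
inline bool try_utf8_to_wstring(const char* utf8, std::wstring& out)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t> > conv;
    try {
        out = conv.from_bytes(utf8);
        return true;
    } catch (const std::range_error&) {
        // Thrown on malformed input or on characters the facet cannot represent.
        return false;
    }
}
```

One caveat, as far as I understand it: with a 16-bit `wchar_t`, `std::codecvt_utf8<wchar_t>` targets UCS-2 rather than UTF-16, so characters outside the BMP are expected to fail there; `std::codecvt_utf8_utf16<wchar_t>` is the facet meant for UTF-16.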

Igor
  • `std::wstring` will *never* be the same as `u16string` or `u32string`: the character types `wchar_t`, `char16_t`, and `char32_t` are all different types. However, the encoding for `std::wstring` is the same as that of either `std::u16string` or `std::u32string`. – Dietmar Kühl May 28 '16 at 19:54
  • The size of `wchar_t` is 16 bits on a system with 16-bit `wchar_t`, i.e. in Windows. The number of bits per byte is given by `CHAR_BIT` from `<climits>`. The number of bytes per `wchar_t` is given by `sizeof(wchar_t)`. – Cheers and hth. - Alf May 28 '16 at 19:54
  • @DietmarKühl, so then the question remains: how do I convert a UTF-8 "const char *" to std::wstring without losing information, for every possible combination? – Igor May 28 '16 at 19:58
  • It's worth noting that Windows' wide characters (UTF-16) are non-standard with respect to the C and C++ standards, and conversely that those standards are, uhm, a bit political wrt. Windows. This means that e.g. the character classification functions in principle can't work as intended in Windows. But in practice the non-BMP part of Unicode isn't much used; I have the impression that it's mostly old Chinese glyphs. – Cheers and hth. - Alf May 28 '16 at 19:58
  • @Cheersandhth.-Alf, ok. but then what is the proper way to convert? – Igor May 28 '16 at 19:58
  • Regarding utf8 encoding and conversion to string types, [cppreference](http://en.cppreference.com/w/cpp/locale/codecvt) may help. If you want a specific encoding (UTF16 or 32) then use the corresponding `std::uXXstring`, not `std::wstring`. – coyotte508 May 28 '16 at 19:59
  • @Igor: You can select, at compile time, the proper conversion based on the size of `wchar_t`. – Cheers and hth. - Alf May 28 '16 at 20:01
  • @coyotte508, I'm looking to convert a SQLite query result to std::wstring. So it needs to be cross-platform. – Igor May 28 '16 at 20:01
  • If you've got a byte sequence and know its encoding, `std::wstring_convert<std::codecvt<Char, char, std::mbstate_t>, Char>` seems to be the class to convert to and from this encoding. I haven't used that one, though. – Dietmar Kühl May 28 '16 at 20:04
  • For portable Unicode handling, there is ICU for example. C++'s Unicode support is, well, lacking at least. – Baum mit Augen May 28 '16 at 20:05
  • @Cheersandhth.-Alf, please see my edit. I am trying Windows first and it fails. – Igor May 28 '16 at 20:08
  • @Igor Is there a reason why `std::wstring` specifically, why not just `std::u16string`, which is also cross-platform? Do other parts of the code use `std::wstring` and encode it from UTF-8 as well? – coyotte508 May 28 '16 at 20:08
  • @coyotte508, yes, everything is wstring based. The trouble is that SQLite is a C library that works with const char * as UTF-8. Also, please see my edit. – Igor May 28 '16 at 20:10
  • MSVC 2010 is **old**. In particular, 2010 comes before 2011. Use a newer compiler. – n. m. could be an AI May 28 '16 at 20:10
  • @n.m., I got it from my school and it's the only one available that is Windows-based. – Igor May 28 '16 at 20:12
  • Sorry, I don't have a magic way to turn MSVC2010 into something that supports a 2011 C++ language standard. You can download a newer compiler from Microsoft for free (the command line tool, not sure about IDEs). – n. m. could be an AI May 28 '16 at 20:15
  • @n.m., what do you mean? MSVC 2010 supports C++11 just fine. Or do you mean that it's the CRT that is old? Also, is my code going to crash with a newer compiler? – Igor May 28 '16 at 20:21
  • If your compiler cannot compile these lines, it **does not** support C++11. – n. m. could be an AI May 28 '16 at 20:22
  • @n.m., it compiles, but throws an exception – Igor May 28 '16 at 20:23
  • @Igor: Note the years: 2010, 2011. Visual C++ 2010 did not support C++11. You can however use the Windows API for conversion in Windows, and some functionality I think is called iconv in Unix-land, if you want to avoid dependency on 3rd party library. The Windows function is [MultiByteToWideChar](https://msdn.microsoft.com/en-us/library/windows/desktop/dd319072%28v=vs.85%29.aspx). – Cheers and hth. - Alf May 28 '16 at 20:25
  • If you have a reasonably fast network connection you can just download Visual Studio 2015 Community Edition, which is free for personal use, and use C++11 standard library functionality (as Dietmar suggested). – Cheers and hth. - Alf May 28 '16 at 20:26
  • "but throws an exception" — then it still doesn't support C++11. A conforming compiler should not have a problem with this code. – n. m. could be an AI May 28 '16 at 20:28
  • @Cheersandhth.-Alf One mustn't neglect the Pile of Poo emoji, which at U+1F4A9 is also outside the BMP. (And in general new characters are going in outside the BMP.) – Alan Stokes May 28 '16 at 20:47
  • @n.m., I just tried to run this code in Linux with gcc 5.2. There was no exception thrown. So I guess the copy of MSVC I have here is broken. – Igor May 29 '16 at 03:11
  • Something is indeed broken, because my copy compiles and runs this code OK. Can you show your entire program? – n. m. could be an AI May 29 '16 at 07:21
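
As a rough illustration of the suggestion in the comments to pick the conversion based on the size of `wchar_t`, here is a hand-rolled sketch that avoids `<codecvt>` entirely. It is not code from anyone in this thread: the names `decode_utf8_code_point` and `utf8_to_wstring` are made up, and it assumes that a `wchar_t` narrower than 4 bytes means UTF-16 (as on Windows) and that anything wider means UTF-32. It sticks to C++03 so that it should also build with older compilers.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decodes one UTF-8 sequence starting at s[i],
// advances i past it, and returns the code point.
// Throws std::runtime_error on malformed input.
inline unsigned long decode_utf8_code_point(const std::string& s, std::size_t& i)
{
    const unsigned char lead = static_cast<unsigned char>(s[i++]);
    if (lead < 0x80) return lead;                 // single byte (ASCII)

    int extra;             // number of continuation bytes expected
    unsigned long cp;      // code point accumulated so far
    unsigned long min;     // smallest value allowed for this length
    if ((lead & 0xE0) == 0xC0)      { extra = 1; cp = lead & 0x1F; min = 0x80; }
    else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; min = 0x800; }
    else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; min = 0x10000; }
    else throw std::runtime_error("invalid UTF-8 lead byte");

    for (; extra > 0; --extra) {
        if (i >= s.size()) throw std::runtime_error("truncated UTF-8 sequence");
        const unsigned char c = static_cast<unsigned char>(s[i++]);
        if ((c & 0xC0) != 0x80)
            throw std::runtime_error("invalid UTF-8 continuation byte");
        cp = (cp << 6) | (c & 0x3F);
    }
    // Reject overlong forms, values past U+10FFFF, and encoded surrogates.
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        throw std::runtime_error("overlong or out-of-range UTF-8 sequence");
    return cp;
}

// Hypothetical helper: converts UTF-8 to std::wstring, emitting surrogate
// pairs only when wchar_t is too narrow to hold the code point directly.
inline std::wstring utf8_to_wstring(const std::string& utf8)
{
    std::wstring out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned long cp = decode_utf8_code_point(utf8, i);
        if (sizeof(wchar_t) >= 4 || cp < 0x10000) {
            out.push_back(static_cast<wchar_t>(cp));              // one unit
        } else {
            const unsigned long v = cp - 0x10000;                 // 20 bits
            out.push_back(static_cast<wchar_t>(0xD800 + (v >> 10)));    // high surrogate
            out.push_back(static_cast<wchar_t>(0xDC00 + (v & 0x3FF)));  // low surrogate
        }
    }
    return out;
}
```

With this, a SQLite result could be passed as `utf8_to_wstring(reinterpret_cast<const char*>(tableName))`, where `tableName` is the pointer from the question. A character outside the BMP such as U+1F4A9 comes out as one `wchar_t` on platforms with a 32-bit `wchar_t` and as two on Windows, so nothing is lost either way. For production use, a maintained library such as ICU, or `MultiByteToWideChar` on Windows, is probably the safer choice.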
