6

I'm looking for a bit of advice on the best way to convert a std::wstring to std::string - but a quick and dirty conversion for use as a key in an std::map<std::string, int> object.

The map is quite large and already well integrated into the project, and only a handful of keys require this conversion, so I think it would be wasteful to change the map into one that accepts std::wstring as the key.

The output of the conversion doesn't really matter, but it has to be consistent so as to reliably pull the correct values from the map every time.

The application is Windows-only.

Is there any known process to do a rough conversion reliably for this purpose? Or would the best way be via the usual, proper, conversion process (as described in this SO question/answer: How to convert wstring into string?)?

Edit: Please bear in mind - losing information is fine as long as things are consistent. i.e. If I throw in some Japanese characters, and they consistently convert into the same (potentially garbage) std::string, that's fine. This will never be for display, only to be used as a key to pull out values from a map.

Thanks!

Jace
  • An [adapter](http://en.wikipedia.org/wiki/Adapter_pattern)? – krlmlr Mar 11 '13 at 07:35
  • 2
    Perhaps you should convert the `std::wstring` to [UTF-8](http://en.wikipedia.org/wiki/UTF-8), and set the `std::string` to that value. You will avoid spurious '\0' bytes this way. – Brett Hale Mar 11 '13 at 07:55
  • Why do you use `std::wstring` at all? [Use UTF8 everywhere.](http://utf8everywhere.org/) – Arne Mertz Mar 11 '13 at 08:49
  • @Arne Mertz: Thanks for the link to that article. Interesting read. But to answer your question: There's no choice in this case :P – Jace Mar 11 '13 at 09:46

3 Answers

11

As a variant, I would go for

std::wstring w( L"Some" );
std::string s( w.begin(), w.end() );

Maybe the other answer is faster (it depends on the string iterators' implementation), but to me this is the more std/STL way. But yes, this will lose information: each wide character is narrowed to a char, so some distinct characters are no longer unique.
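To make the "lose some unique characters" caveat concrete, here is a minimal sketch of this approach used as a map key. The names narrow_key and lookup, and the -1 "not found" value, are hypothetical and not part of the answer. It assumes the usual behaviour of mainstream compilers, where narrowing a wchar_t to char keeps only the low 8 bits.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper: narrows each wchar_t to a char through the
// iterator-range constructor shown above. Assumption: the compiler
// keeps only the low 8 bits of each code unit when narrowing.
std::string narrow_key(const std::wstring& w) {
    return std::string(w.begin(), w.end());
}

// Hypothetical lookup wrapper around the map type from the question.
// Returns -1 (an arbitrary sentinel) when the key is absent.
int lookup(const std::map<std::string, int>& m, const std::wstring& key) {
    std::map<std::string, int>::const_iterator it = m.find(narrow_key(key));
    return it == m.end() ? -1 : it->second;
}
```

The conversion is consistent, so the same std::wstring always yields the same key. It is not unique, though: under the assumption above, U+0141 (low byte 0x41) narrows to the same key as "A", so two different wide strings can hit the same map entry.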

Roman Kruglov
  • 1
    Yes, losing some characters here and there when doing string conversions usually isn't a big deal! ;) – Andreas Jul 04 '18 at 08:23
  • @mrt If you read the question, it asks for "a quick and dirty conversion" and "the output of the conversion doesn't really matter", only "it has to be consistent". Also there is a whole edit addition which describes exactly that "losing information is fine as long as things are consistent" and "convert into the same (potentially garbage) string", also "this will never be for display". – Roman Kruglov Jul 04 '18 at 08:40
  • 1
    No do not do this. It works only for English. It destroys text in Chinese, Japanese and so on. – Katsutoshi Hayashida Sep 16 '20 at 23:56
10

If you are not interested in the semantics of the content, but only need the content to be comparable, I'd just coerce the inner wchar_t[] into a char[] of double the size (wchar_t is two bytes on Windows) and use it to initialize the string (by specifying address and size in the constructor)

std::wstring ws(L"ABCD€FG");
std::string s((const char*)&ws[0], sizeof(wchar_t)/sizeof(char)*ws.size());

Now s is unprintable (it may contain null chars) but still assignable and comparable.

You can go back with:

// element count = byte count / bytes per wchar_t
std::wstring nws((const wchar_t*)&s[0], s.size()/sizeof(wchar_t));

Now compare

std::cout << (nws==ws);

should print 1.

However, note that this way the order in the map (the result of operator<) is ... fuzzy because of the presence of the 0 bytes, and doesn't reflect any text semantics. Lookup still works, though, since -however fuzzy- it is still an "order".
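The round trip described in this answer could be sketched as a pair of helpers. The names bytes_key and from_bytes_key are hypothetical, and the sketch uses std::memcpy for the way back rather than a pointer cast, so the char buffer is never read through a wchar_t pointer.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: copies the raw bytes of the wide string into a
// std::string. The result may contain embedded '\0' bytes, which
// std::string stores and compares without trouble.
std::string bytes_key(const std::wstring& ws) {
    return std::string(reinterpret_cast<const char*>(ws.data()),
                       ws.size() * sizeof(wchar_t));
}

// Hypothetical inverse: rebuilds the wide string from the raw bytes.
// Assumes s was produced by bytes_key on the same platform.
std::wstring from_bytes_key(const std::string& s) {
    std::wstring ws(s.size() / sizeof(wchar_t), L'\0');
    if (!ws.empty()) {
        std::memcpy(&ws[0], s.data(), ws.size() * sizeof(wchar_t));
    }
    return ws;
}
```

Because every byte of the original is kept, two different wide strings never produce the same key. The key is only meaningful on the platform that produced it, since sizeof(wchar_t) and byte order vary between platforms.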

Dee
Emilio Garavaglia
  • 1
    This will distract the OP: the point is obviously not to have a beautiful print-out, but to check that no loss of information arises during a cycle. Whatever output changes depending on the equality does the same job. I did the shortest and simplest thing, not requiring additional headers. It's up to the OP to find the best "beautifier" for his need (including replacing cout with MessageBox or whatever dialog displaying whatever things he would like) – Emilio Garavaglia Mar 12 '13 at 07:34
  • 2
    If this were an unordered_map, I would be concerned about the null bytes within the string. If the hashing function has a specialization for strings, it may or may not respect the actual std::string size and instead stop at the first null byte. – Adrian McCarthy Mar 27 '14 at 20:17
8

You can convert the std::wstring to UTF-8 (using WideCharToMultiByte or something like this lib: http://utfcpp.sourceforge.net/), that is, to a null-terminated C string, and then construct a std::string from it. This conversion will be reversible.
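As a rough illustration of what such a conversion does, here is a hand-written UTF-8 encoder. It is a portable stand-in for WideCharToMultiByte with CP_UTF8 or the linked library, not a replacement for either. The name to_utf8_key is hypothetical, and the sketch assumes the input is valid UTF-16 (16-bit wchar_t, as on Windows) or UTF-32 (32-bit wchar_t).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: encodes a wide string as UTF-8.
// Assumption: input is valid UTF-16 or UTF-32; an unpaired surrogate
// is simply encoded as a three-byte sequence rather than rejected.
std::string to_utf8_key(const std::wstring& w) {
    std::string out;
    for (std::size_t i = 0; i < w.size(); ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(w[i]);
        // Combine a UTF-16 surrogate pair into a single code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < w.size()) {
            std::uint32_t lo = static_cast<std::uint32_t>(w[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        if (cp < 0x80) {            // 1 byte: plain ASCII
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {    // 2 bytes
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {  // 3 bytes
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {                    // 4 bytes
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

ASCII keys come out unchanged, so wide keys made of ASCII characters would match existing narrow keys in the map. Non-ASCII characters become multi-byte sequences with no zero bytes.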

user2155932