3

I would like to convert a string variable to a wstring because some German characters cause problems when I call substr on it. The start position is wrong whenever one of these special characters appears before it. (For instance, for "ä", size() returns 2 instead of 1.)

I know that the following conversion works:

wstring ws = L"ä";

Since I am trying to convert a variable, I would like to know if there is an alternative way to do it, such as

wstring wstr = L"%s"+str // this is syntactically wrong, but I want something like it

Besides that, I have already tried the following example to convert a string to a wstring:

string foo("ä"); 
wstring_convert<codecvt_utf8<wchar_t>> converter;
wstring wfoo = converter.from_bytes(foo.data());
cout << foo.size() << endl;
cout << wfoo.size() << endl;

but I get errors like

‘wstring_convert’ was not declared in this scope

I am using Ubuntu 14.04 and my main.cpp is compiled with CMake. Thanks for your help!

zuubs
  • 7
    `std::wstring wstr = std::wstring(str.begin(), str.end());` `wstring` has a constructor that takes iterators to the start and end of a `std::string` and will perform the conversion for you. – lcs Aug 05 '14 at 14:51
  • Possible duplicate of [C++ Convert string (or char*) to wstring (or wchar_t*)](http://stackoverflow.com/questions/2573834/c-convert-string-or-char-to-wstring-or-wchar-t) – Antonio Aug 05 '14 at 14:53
  • 5
    @millsj It may have a constructor which will take iterators, but it doesn't do any correct conversion: it simply takes the integral value of each `char`, and converts it to a `wchar_t` (which is _definitely_ not what the OP wants). – James Kanze Aug 05 '14 at 14:59
  • @millsj I just ran your suggestion, but size() returns 2. – zuubs Aug 05 '14 at 14:59
  • @Antonio The first answer there seems to answer the question for C++11. For C++ pre-11, there isn't a solution in the C++ standard, although there may be a platform specific one. – James Kanze Aug 05 '14 at 15:01
  • `libstdc++` still does not implement `wstring_convert`. You need to change your compiler and the standard library (to `clang` and `libc++`), or use a platform-specific conversion library like `iconv`, or wait until `libstdc++` implements the full C++11 standard. – n. m. could be an AI Aug 05 '14 at 15:08
  • @millsj, actually looking at the comments [here](http://stackoverflow.com/a/8969776/2436175) your suggestion might be dangerous. – Antonio Aug 05 '14 at 15:09
  • @millsj this does not do any UTF8-to-anything conversion. – n. m. could be an AI Aug 05 '14 at 15:11
  • @JamesKanze The question doesn't seem to be specific to a C++ version, so I think it's a duplicate of the other one, where, looking at all the answers, the picture appears to be quite complete already. – Antonio Aug 05 '14 at 15:13
  • Well, another possibility is to code a UTF8-to-UCS4 converter yourself, or lift one from any number of sources on the internet. It is actually very easy. – n. m. could be an AI Aug 05 '14 at 15:41
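
As a rough illustration of the hand-written converter mentioned in the last comment, here is a minimal sketch of a UTF-8 decoder. The name `utf8_to_wstring` is made up for this example, and the sketch assumes that `wchar_t` is 32 bits wide (as on Linux, but not on Windows). It checks lead and continuation bytes but does not reject overlong encodings or surrogate code points, so treat it as a starting point rather than a complete validator.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode a UTF-8 byte string into one wchar_t per
// code point. Assumes a 32-bit wchar_t.
std::wstring utf8_to_wstring(const std::string& in) {
    std::wstring out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        unsigned long cp;    // code point being assembled
        std::size_t extra;   // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (extra > in.size() - i - 1)
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Each continuation byte has the form 10xxxxxx and adds 6 bits.
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(static_cast<wchar_t>(cp));
        i += extra + 1;
    }
    return out;
}
```

With this, `substr` positions on the result count characters rather than bytes, which is what the question is after. The conversion does not depend on the current locale.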

3 Answers

3

The solution from "hahakubile" worked for me:

std::wstring s2ws(const std::string& s) {
    std::string curLocale = setlocale(LC_ALL, ""); 
    const char* _Source = s.c_str();
    size_t _Dsize = mbstowcs(NULL, _Source, 0) + 1;
    wchar_t *_Dest = new wchar_t[_Dsize];
    wmemset(_Dest, 0, _Dsize);
    mbstowcs(_Dest,_Source,_Dsize);
    std::wstring result = _Dest;
    delete []_Dest;
    setlocale(LC_ALL, curLocale.c_str());
    return result;
}

But the return value is not 100% correct:

string s = "101446012MaßnStörfall   PAt  #Maßnahme Störfall                      00810000100121000102000020100000000000000";
wstring ws2 = s2ws(s);
cout << ws2.size() << endl; // returns 110 which is correct
wcout << ws2.substr(29,40) << endl; // returns #Ma�nahme St�rfall with symbols

I am wondering why it replaced the German characters with symbols.

Thanks again!

zuubs
  • It is probably correct, but your terminal is not set up to print it. Either that, or using `cout` together with `wcout` breaks things for you. It is technically illegal. Use only `wcout` throughout. It works for me once I replace `cout` with `wcout` in your example. – n. m. could be an AI Aug 05 '14 at 15:36
  • Also be aware that this only works if the user's default locale is "something.UTF-8". It is so for most, but not all, Linux users. – n. m. could be an AI Aug 05 '14 at 15:38
  • @n.m. With only cout it displays the string with a "question mark" symbol, and with only wcout it doesn't show the German characters at all! – zuubs Aug 05 '14 at 15:45
  • What is your default locale? What terminal are you using? Can you see these characters when you just cat a file that contains them? What if you redirect the output to a file and then edit the file? You obviously cannot use only cout with wstrings. – n. m. could be an AI Aug 05 '14 at 15:49
  • LANG=en_US.UTF-8 LANGUAGE=en_US LC_CTYPE="en_US.UTF-8" LC_NUMERIC=de_DE.UTF-8 LC_TIME=de_DE.UTF-8 LC_COLLATE="en_US.UTF-8" LC_MONETARY=de_DE.UTF-8 LC_MESSAGES="en_US.UTF-8" LC_PAPER=de_DE.UTF-8 LC_NAME=de_DE.UTF-8 LC_ADDRESS=de_DE.UTF-8 LC_TELEPHONE=de_DE.UTF-8 LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=de_DE.UTF-8 LC_ALL= I am using ubuntu 14.04's terminal and for redirecting, it just worked! Thanks! – zuubs Aug 05 '14 at 15:54
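
One detail of `s2ws` worth noting: `setlocale(LC_ALL, "")` returns the name of the newly installed locale, not the previous one, so the final `setlocale` call in that function does not restore anything. Below is a hedged sketch of a variant that queries the old locale first and checks the error return of `mbstowcs`. The name `s2ws_checked` is made up for this example. Like the original, it relies on `mbstowcs` accepting a null destination to count characters (POSIX behavior) and on the user's default locale being a UTF-8 one.

```cpp
#include <clocale>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical variant of s2ws: converts using the environment's locale and
// returns an empty string if the input is not valid in that locale.
std::wstring s2ws_checked(const std::string& s) {
    // Passing NULL queries the current locale without changing it.
    const char* current = std::setlocale(LC_ALL, NULL);
    const std::string saved = current ? current : "C";
    std::setlocale(LC_ALL, "");  // switch to the environment's locale

    std::wstring result;
    // First pass: count the wide characters; (size_t)-1 signals invalid input.
    const std::size_t len = std::mbstowcs(NULL, s.c_str(), 0);
    if (len != static_cast<std::size_t>(-1)) {
        std::vector<wchar_t> buf(len + 1, L'\0');
        std::mbstowcs(&buf[0], s.c_str(), len + 1);
        result.assign(&buf[0], len);
    }

    std::setlocale(LC_ALL, saved.c_str());  // restore the caller's locale
    return result;
}
```

Because this version really does restore the previous locale, a program that later prints the result with `wcout` will probably still need to call `setlocale(LC_ALL, "")` itself before writing, and should use only `wcout` on that stream, as the comments above suggest.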
1

If you are using Windows/Visual Studio and need to convert a string to wstring you should use:

#include <AtlBase.h>
#include <atlconv.h>
...
string s = "some string";
CA2W ca2w(s.c_str());
wstring w = ca2w;
printf("%s = %ls", s.c_str(), w.c_str());

Same procedure for converting a wstring to string (sometimes you will need to specify a codepage):

#include <AtlBase.h>
#include <atlconv.h>
...
wstring w = L"some wstring";
CW2A cw2a(w.c_str());
string s = cw2a;
printf("%s = %ls", s.c_str(), w.c_str());

You could specify a codepage and even UTF8 (that's pretty nice when working with JNI/Java).

CA2W ca2w(str, CP_UTF8);

If you want to know more about codepages there is an interesting article on Joel on Software: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets.

The CA2W (convert ANSI to wide, i.e. Unicode) macros are part of the ATL and MFC String Conversion Macros; the documentation includes samples.

Sometimes you will need to disable warning C4995 (a deprecation warning). I don't know of another workaround; it happened to me when I compiled for Windows XP in VS2012.

#pragma warning(push)
#pragma warning(disable: 4995)
#include <AtlBase.h>
#include <atlconv.h>
#pragma warning(pop)

Edit: According to another article, Joel's article is, "while entertaining, ... pretty light on actual technical details". That article is: What Every Programmer Absolutely, Positively Needs To Know About Encoding And Character Sets To Work With Text.

lmiguelmh
-1

The main point is that

string foo("ä")

is already an error. Start from here and read all the answers. And beware, one is very wrong :)
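
One plausible reading of this point is that the bytes stored for a narrow literal such as "ä" depend on how the source file is encoded and on the compiler's settings. A small sketch, assuming UTF-8 is the intended encoding: spelling the bytes out with escapes removes that dependency, and counting only non-continuation bytes gives the character count that `size()` does not. The names `utf8_length` and `a_umlaut` are invented for this example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: count UTF-8 code points by skipping continuation
// bytes, which always have the bit pattern 10xxxxxx.
std::size_t utf8_length(const std::string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if ((static_cast<unsigned char>(s[i]) & 0xC0) != 0x80) ++n;
    }
    return n;
}

// "ä" (U+00E4) written as explicit UTF-8 bytes, so the stored content does
// not depend on the encoding of the source file.
const std::string a_umlaut = "\xC3\xA4";
```

Here `a_umlaut.size()` is 2 (bytes), while `utf8_length(a_umlaut)` is 1 (characters), which is the mismatch described in the question.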

Antonio