3

I am currently using the methods MultiByteToWideChar and WideCharToMultiByte of the Windows API to convert between std::string and std::wstring.

I am 'multiplatforming' my code by removing Windows dependencies, so I would like to know of alternatives to the functions above. Specifically, using Boost would be great. Which functions can I use? Here is the code I am currently using:

const std::wstring Use::stow(const std::string& str)
{
    if (str.empty()) return L"";
    int size_needed = MultiByteToWideChar(CP_UTF8, 0, &str[0], (int)str.size(), NULL, 0);
    std::wstring wstrTo( size_needed, 0 );
    MultiByteToWideChar(CP_UTF8, 0, &str[0], (int)str.size(), &wstrTo[0], size_needed);
    return wstrTo;
}

const std::string Use::wtos(const std::wstring& wstr)
{
    if (wstr.empty()) return "";
    int size_needed = WideCharToMultiByte(CP_UTF8, 0, &wstr[0], (int)wstr.size(), NULL, 0, NULL, NULL);
    std::string strTo( size_needed, 0 );
    WideCharToMultiByte(CP_UTF8, 0, &wstr[0], (int)wstr.size(), &strTo[0], size_needed, NULL, NULL);
    return strTo;
}
Didac Perez Parera

3 Answers

5

Basically, using <cstdlib> you can get away with an implementation similar to what you already have, as mentioned by Joachim Pileborg, as long as you have set the locale to whatever you want it to be (for example: setlocale( LC_ALL, "en_US.utf8" );).

MultiByteToWideChar(CP_UTF8, 0, &str[0], (int)str.size(), NULL, 0) => mbstowcs(nullptr, data(str), size(str))

MultiByteToWideChar(CP_UTF8, 0, &str[0], (int)str.size(), &wstrTo[0], size_needed) => mbstowcs(data(wstrTo), data(str), size(str))

WideCharToMultiByte(CP_UTF8, 0, &wstr[0], (int)wstr.size(), NULL, 0, NULL, NULL) => wcstombs(nullptr, data(wstr), size(wstr))

WideCharToMultiByte(CP_UTF8, 0, &wstr[0], (int)wstr.size(), &strTo[0], size_needed, NULL, NULL) => wcstombs(data(strTo), data(wstr), size(strTo))

EDIT:

C++11 requires strings to be allocated contiguously, which may be important if you are compiling cross-platform, as previous standards did not require this. Before C++11, calling &str[0], &strTo[0], &wstr[0], or &wstrTo[0] could have caused problems.
Since C++11 is now the accepted standard, I've improved my suggested substitutions to use data rather than dereferencing the front of the strings.
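
Put together, the substitutions above might look like the following minimal sketch. The free functions `stow` and `wtos` are hypothetical stand-ins for the question's `Use::stow` and `Use::wtos`, and the error handling is an addition that is not part of the original code. It assumes the locale has already been set to a UTF-8 locale, and it relies on the common (POSIX) behavior that a null destination makes `mbstowcs` and `wcstombs` return the required length.

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical portable replacement for Use::stow.
// Assumes setlocale has already selected a UTF-8 locale.
std::wstring stow(const std::string& str)
{
    if (str.empty()) return std::wstring();
    // Null destination: only count the wide characters needed (POSIX extension).
    const std::size_t size_needed = std::mbstowcs(nullptr, str.c_str(), 0);
    if (size_needed == static_cast<std::size_t>(-1))
        throw std::runtime_error("stow: invalid multibyte sequence");
    std::wstring wstrTo(size_needed, L'\0');
    // &wstrTo[0] is contiguous since C++11; in C++17 wstrTo.data() also works.
    std::mbstowcs(&wstrTo[0], str.c_str(), wstrTo.size());
    return wstrTo;
}

// Hypothetical portable replacement for Use::wtos.
std::string wtos(const std::wstring& wstr)
{
    if (wstr.empty()) return std::string();
    // Null destination: only count the bytes needed (POSIX extension).
    const std::size_t size_needed = std::wcstombs(nullptr, wstr.c_str(), 0);
    if (size_needed == static_cast<std::size_t>(-1))
        throw std::runtime_error("wtos: character not representable in this locale");
    std::string strTo(size_needed, '\0');
    // The third argument is the destination capacity in bytes.
    std::wcstombs(&strTo[0], wstr.c_str(), strTo.size());
    return strTo;
}
```

A program would typically call `std::setlocale(LC_ALL, "en_US.utf8")` (from `<clocale>`) once at startup before using these; which locale names are available varies by platform. Unlike the Windows calls, which take an explicit length, `mbstowcs` and `wcstombs` stop at the first embedded null character.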

Jonathan Mee
  • Hi Jonathan, it has worked! Do you think I can get memory errors using this approach? I am worried about memory allocation. – Didac Perez Parera Dec 04 '13 at 14:39
  • 1
    Why not call `str.c_str()` instead of using `&str[0]`? It will certainly be contiguous. – Julien Dec 04 '13 at 16:32
  • 1
    Yes, I believe @Julien's suggestion will work for the source string, since `c_str()` is also available on `std::wstring`; however, it won't work for the destination string. For a destination string, my initial thought would be that you should temporarily allocate a vector and then copy into a string. So something like this in wtos: `std::vector< char > temp( size_needed, '\0' ); std::wcstombs( &*temp.begin(), wstr.c_str(), temp.size() ); std::string strTo( size_needed, '\0' ); std::copy( temp.begin(), temp.end(), strTo.begin() );` This would probably make a great follow-up question! – Jonathan Mee Dec 04 '13 at 19:55
  • 1
    This is a great way to do this for common code across platforms. If it still needs to build on Windows, you'll need to add the _CRT_SECURE_NO_WARNINGS preprocessor macro. Otherwise the compiler will complain that you aren't using the Windows-specific mbstowcs_s. – spfursich Jun 19 '15 at 23:31
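
The comment suggestions above (`c_str()` for the source, a temporary buffer for the destination, and `_CRT_SECURE_NO_WARNINGS` for Windows builds) could be combined roughly as follows. This is only a sketch: `wtos_via_buffer` is a hypothetical name, and the error check is an assumption added for safety.

```cpp
// Must come before the includes; only meaningful for MSVC, harmless elsewhere.
#define _CRT_SECURE_NO_WARNINGS
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical variant of wtos that converts into a temporary vector first.
// Assumes the locale has already been set to a UTF-8 locale.
std::string wtos_via_buffer(const std::wstring& wstr)
{
    // Null destination: only count the bytes needed (POSIX extension).
    const std::size_t size_needed = std::wcstombs(nullptr, wstr.c_str(), 0);
    if (size_needed == static_cast<std::size_t>(-1))
        throw std::runtime_error("wtos_via_buffer: conversion failed");
    // One extra element leaves room for the terminating null.
    std::vector<char> temp(size_needed + 1, '\0');
    std::wcstombs(temp.data(), wstr.c_str(), temp.size());
    // Copy the converted bytes (without the terminator) into the result.
    return std::string(temp.data(), size_needed);
}
```

Since C++11 guarantees contiguous string storage, the temporary vector is not strictly required; it mainly helps with pre-C++11 compilers.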
0

From your code, it looks like you are using UTF-8 encoding. For working with UTF-8, take a look at UTF8-CPP at http://utfcpp.sourceforge.net/, which is a header-only library.

Look at the utf8to32 function. (Note that on Windows wchar_t is 16 bits, while on other platforms such as Linux it is usually 32 bits.)
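
To illustrate what a UTF-8 to UTF-32 conversion involves, here is a hand-written sketch. It is not UTF8-CPP's implementation or API; `decode_utf8` is a hypothetical name, and for brevity it does not reject overlong encodings or surrogate code points. In real code the library function would be preferable.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Illustration only: decodes UTF-8 bytes into UTF-32 code points.
// Throws on invalid lead bytes, bad continuation bytes, and truncated input.
std::u32string decode_utf8(const std::string& in)
{
    std::u32string out;
    for (std::size_t i = 0; i < in.size();) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::size_t len = 0;
        char32_t cp = 0;
        // The lead byte determines the sequence length and the first payload bits.
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else throw std::invalid_argument("decode_utf8: invalid lead byte");
        if (i + len > in.size())
            throw std::invalid_argument("decode_utf8: truncated sequence");
        // Each continuation byte has the form 10xxxxxx and adds six bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("decode_utf8: invalid continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

On platforms where wchar_t is 32 bits, each resulting code point fits in one wchar_t; on Windows, code points above U+FFFF would need a UTF-16 surrogate pair instead.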

John Bandela
-1
const std::wstring Use::stow(const std::string &s)
{
    return std::wstring(s.begin(), s.end());
}

const std::string Use::wtos(const std::wstring &ws)
{
    return std::string(ws.begin(), ws.end());
}
l33t
  • 5
    Do not do this; it will do horrible things with strings that contain non-ASCII characters and will likely yield invalid Unicode strings. – John Bandela Dec 04 '13 at 13:32
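
A small sketch of why the element-by-element copy goes wrong, using a hypothetical helper `naive_widen` that mirrors the answer's `stow`. Each byte of the UTF-8 input becomes its own wchar_t, so a multibyte character is split rather than decoded.

```cpp
#include <string>

// Hypothetical helper reproducing the iterator-range copy from the answer above.
// It widens each char individually and performs no UTF-8 decoding.
std::wstring naive_widen(const std::string& s)
{
    return std::wstring(s.begin(), s.end());
}
```

For pure ASCII input such as "abc" the result is the expected L"abc". For "\xC3\xA9" (the UTF-8 encoding of U+00E9, 'é') the result has two elements instead of the single wide character U+00E9.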