The std::stringstream can always be converted to std::string, so the question reduces to how to convert std::string to std::wstring.
If the narrow string encoding points are a subset of the wide string encoding points, then you can simply copy the data over:
const std::string s = ...;
const std::wstring ws( s.begin(), s.end() );
This works for original ASCII and its extension Latin-1 when the wide strings are UTF-16 or UTF-32 encoded.
In practice that means that this simple data copying scheme works for:
- Latin-1 in a Western installation of Windows, because Latin-1 is a subset of Windows ANSI Western.
- ASCII only in other Windows installations and in Unix-land, because there the default system narrow encoding is (typically) not an extension of Latin-1.
When the narrow string encoding points are not a subset of the wide string encoding points, some more active conversion must be employed.
The following works when the std::string's encoding is the locale's narrow text encoding, and the string doesn't contain embedded zero bytes:
#include <iostream>
#include <locale>       // std::locale
#include <locale.h>     // setlocale
#include <stdexcept>    // std::runtime_error
#include <stdlib.h>     // mbstowcs, size_t
#include <string>
using namespace std;

auto hopefully( const bool condition ) -> bool { return condition; }
auto fail( const string& message ) -> bool { throw runtime_error( message ); }

// Converts from the narrow encoding of the current C locale (see setlocale in main).
auto widened( const string& s )
    -> wstring
{
    const size_t n = s.length();
    if( n == 0 ) { return L""; }
    // Allow up to two wchar_t values per byte when wchar_t is 16 bits.
    const size_t max_wide_encoding_values = (sizeof( wchar_t ) == 2? 2 : 1);
    wstring ws( max_wide_encoding_values*n, L'\0' );
    const size_t n_characters_stored = mbstowcs( &ws[0], s.c_str(), ws.size() );
    // mbstowcs reports failure as size_t( -1 ).
    hopefully( n_characters_stored != static_cast<size_t>( -1 ) )
        || fail( "mbstowcs failed" );
    ws.resize( n_characters_stored );
    return ws;
}

auto operator<<( wostream& stream, const string& s )
    -> wostream&
{ return stream << s.c_str(); }

auto main() -> int
{
    setlocale( LC_ALL, "" );
    locale::global( locale( "" ) );
    const wstring ws = widened( "Blåbærsyltetøy." );
    for( const wchar_t wc : ws )
    {
        wcout << int( wc ) << ' ';
    }
    wcout << endl;
    wcout << L"Should be 'Blåbærsyltetøy'." << endl;
    wcout << L"Is '" << ws << L"'." << endl;
}
In Ubuntu (in a VirtualBox in Windows) the output is OK:
alf@devubuntu32:~/host/dev/explore/_/so/0244$ g++ foo.cpp -std=c++11
alf@devubuntu32:~/host/dev/explore/_/so/0244$ ./a.out
66 108 229 98 230 114 115 121 108 116 101 116 248 121 46
Should be 'Blåbærsyltetøy'.
Is 'Blåbærsyltetøy.'.
alf@devubuntu32:~/host/dev/explore/_/so/0244$ ▯
In Windows it's necessary¹ to add some fixup to make the wide stream output work:
#include <io.h>
#include <fcntl.h>
#include <stdio.h>
static const bool _ = []() -> bool
{
    const int fd = _fileno( stdout );
    _setmode( fd, _isatty( fd )? _O_WTEXT : _O_U8TEXT );
    return true;
}();
Then with Visual C++ in Windows the output is:
H:\dev\explore\_\so\0244>cl iofix.cpp foo.cpp /Feb
iofix.cpp
foo.cpp
Generating Code...
H:\dev\explore\_\so\0244>b
66 108 229 98 230 114 115 121 108 116 101 116 248 121 46
Should be 'Blåbærsyltetøy'.
Is 'Blåbærsyltetøy.'.
H:\dev\explore\_\so\0244>_
However, with MinGW g++ in Windows the default output is not OK:
H:\dev\explore\_\so\0244>g++ iofix.cpp foo.cpp
H:\dev\explore\_\so\0244>a
66 108 195 165 98 195 166 114 115 121 108 116 101 116 195 184 121 46
Should be 'Blåbærsyltetøy'.
Is 'Blåbærsyltetøy.'.
H:\dev\explore\_\so\0244>_
The reason is that g++'s default C++ execution character set is UTF-8, which is not the narrow text encoding specified by the user's default locale in Windows. A simple fix would be to specify the correct execution character set to g++ (the -fexec-charset option, possibly together with -finput-charset). However, that's only practically possible with distributions of g++ that support those options, and e.g. the Nuwen distribution does not.
1) In Unix-land it worked as-is because the global C++ locale had been set to the user's default locale.