It's rather disappointing that none of the answers given to this old question addresses the problem of converting wide strings into UTF-8 strings, which is important in non-English environments.
Here is a working example that may serve as a starting point for custom converters. It is based on an example from cppreference.com.

    #include <array>
    #include <clocale>
    #include <cstdlib>
    #include <iostream>
    #include <stdexcept> // std::invalid_argument, std::logic_error
    #include <string>

    std::string convert(const std::wstring& wstr)
    {
        const int BUFF_SIZE = 7;
        if (static_cast<int>(MB_CUR_MAX) >= BUFF_SIZE) throw std::invalid_argument("BUFF_SIZE too small");
        std::string result;
        std::wctomb(nullptr, 0); // reset the conversion state
        for (const wchar_t wc : wstr)
        {
            std::array<char, BUFF_SIZE> buffer;
            const int ret = std::wctomb(buffer.data(), wc);
            if (ret < 0) throw std::invalid_argument("inconvertible wide characters in the current locale");
            buffer[ret] = '\0'; // make 'buffer' contain a C-style string
            result += buffer.data(); // append the bytes of this character
        }
        return result;
    }

    int main()
    {
        auto loc = std::setlocale(LC_ALL, "en_US.utf8"); // UTF-8
        if (loc == nullptr) throw std::logic_error("failed to set locale");
        std::wstring wstr = L"aąß水-扫描-€\u00df\u6c34\U0001d10b";
        std::cout << convert(wstr) << "\n";
    }

This prints, as expected:

    aąß水-扫描-€ß水𝄋

Explanation
- 7 seems to be the minimal secure value of the buffer size, BUFF_SIZE. This includes 4 as the maximum number of UTF-8 bytes encoding a single character; 2 for the possible "shift sequence", 1 for the trailing '\0'. MB_CUR_MAX is a run-time variable, so static_assert is not usable here.
- Each wide character is translated into its char representation using std::wctomb.
- This conversion makes sense only if the current locale allows multi-byte representations of a character.
- For this to work, the application needs to set the proper locale. en_US.utf8 seems to be sufficiently universal (available on most machines). In Linux, available locales can be queried in the console via the locale -a command.
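
As a variation on the per-character loop explained above, the whole string can also be converted in one call. The following is only a sketch under my own assumptions: to_multibyte and select_utf8_locale are hypothetical helper names, the locale names tried are guesses that vary between systems, and std::wcsrtombs stops at the first embedded L'\0', so strings containing null characters would be truncated.

```cpp
#include <cassert>
#include <clocale>
#include <cstddef>
#include <cwchar>
#include <initializer_list>
#include <stdexcept>
#include <string>

// Hypothetical helper: tries a few common names for a UTF-8 locale.
// The names are assumptions; check `locale -a` on the target system.
bool select_utf8_locale()
{
    for (const char* name : {"en_US.utf8", "en_US.UTF-8", "C.UTF-8"})
        if (std::setlocale(LC_ALL, name) != nullptr) return true;
    return false;
}

// Hypothetical helper: converts a whole wide string with std::wcsrtombs,
// using the encoding of the current C locale.
std::string to_multibyte(const std::wstring& wstr)
{
    // First pass: a null destination only reports the required byte count.
    std::mbstate_t state = std::mbstate_t();
    const wchar_t* src = wstr.c_str();
    const std::size_t needed = std::wcsrtombs(nullptr, &src, 0, &state);
    if (needed == static_cast<std::size_t>(-1))
        throw std::invalid_argument("inconvertible wide characters in the current locale");

    // Second pass: convert into a buffer with room for the trailing '\0'.
    std::string result(needed + 1, '\0');
    state = std::mbstate_t();
    src = wstr.c_str();
    const std::size_t written = std::wcsrtombs(&result[0], &src, result.size(), &state);
    assert(written == needed);
    result.resize(written); // drop the trailing '\0'
    return result;
}
```

Calling select_utf8_locale() once at program start and then to_multibyte(wstr) should behave like convert(wstr) above for strings without embedded null characters. The explicit std::mbstate_t also avoids the hidden global conversion state that std::wctomb relies on.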
Critique of the most upvoted answer
The most upvoted answer,

    std::wstring ws( args.OptionArg() );
    std::string test( ws.begin(), ws.end() );

works well only when the wide characters represent ASCII characters, but these are not what wide characters were designed for. In this solution, the converted string contains one char per source wide char, so ws.size() == test.size(). Thus, it loses information from the original wstring and produces strings that cannot be interpreted as proper UTF-8 sequences. For example, on my machine the string resulting from this simplistic conversion of "ĄŚĆII" prints as "ZII", even though its size is 5 (and should be 8).
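
To make the information loss concrete, here is a small sketch. The helper names narrow_naively, utf8_length and demonstrate are mine, not taken from any of the answers. It assumes that each wchar_t holds a whole code point (as with a 32-bit wchar_t on Linux) and that narrowing a wchar_t to char keeps only the low byte, which is what mainstream compilers do. Under those assumptions Ś (U+015A) is cut down to 0x5A, which is 'Z', while Ą and Ć become the control characters 0x04 and 0x06; that would be consistent with the "ZII" output mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Element-wise narrowing, as in the critiqued snippet.
std::string narrow_naively(const std::wstring& ws)
{
    return std::string(ws.begin(), ws.end());
}

// Number of bytes a correct UTF-8 encoding would need. Assumes each wchar_t
// holds a whole code point (not true for UTF-16 surrogate pairs).
std::size_t utf8_length(const std::wstring& ws)
{
    std::size_t bytes = 0;
    for (const wchar_t wc : ws)
    {
        const unsigned long cp = static_cast<unsigned long>(wc);
        if (cp < 0x80) bytes += 1;
        else if (cp < 0x800) bytes += 2;
        else if (cp < 0x10000) bytes += 3;
        else bytes += 4;
    }
    return bytes;
}

// The asserts document the claims from the text.
void demonstrate()
{
    const std::wstring ws = L"\u0104\u015a\u0106II"; // "ĄŚĆII"
    const std::string naive = narrow_naively(ws);
    assert(naive.size() == ws.size()); // 5 bytes, one per wide character
    assert(utf8_length(ws) == 8);      // 3 * 2 bytes + 2 * 1 byte
    assert(naive[1] == 'Z');           // U+015A truncated to 0x5A
}
```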