38

I changed my class to use std::string (based on the answer I got here), but a function I have returns wchar_t *. How do I convert it to std::string?

I tried this:

std::string test = args.OptionArg();

but it says error C2440: 'initializing' : cannot convert from 'wchar_t *' to 'std::basic_string<_Elem,_Traits,_Ax>'

Martin
codefrog

7 Answers

53
std::wstring ws( args.OptionArg() );
std::string test( ws.begin(), ws.end() );
somerandomdev49
Ulterior
  • 8
    Provides the actual answer to the question! – Ian Aug 30 '16 at 15:55
  • 2
    I like this solution for its simplicity. However, a little explanation couldn't hurt. It leaves open the question of how the characters are actually converted. Is there an information loss or are the wide characters converted to unicode? – Julian Feb 15 '17 at 12:22
  • 22
    I don't know why this answer got so many upvotes. What it does is equivalent to `char c = static_cast<char>( wideChar )` for each character, so it obviously loses information if the wide-string characters are not in **ASCII range**. – zett42 May 21 '17 at 11:35
  • My hero! Thank you for directly providing the answer for the 99.9% of us. – daparic Apr 19 '20 at 02:41
  • @zett42 isn't that going to be true of any method to convert `wchar_t` to `std::string`, since by definition it's a lossy conversion... – j b Apr 29 '21 at 08:26
  • 4
    @jb Depends on the encoding of the `std::string`. E. g. when using UTF-8 there is no loss of information. – zett42 Apr 29 '21 at 14:50
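To make the point in the comments concrete, here is a minimal sketch of what the iterator-range constructor effectively does. The helper name `narrow_by_truncation` is made up for illustration and is not part of the original answer; it assumes the per-character conversion behaves like a `static_cast<char>`, as zett42 describes.

```cpp
#include <string>

// Hypothetical helper: same effect as std::string(ws.begin(), ws.end()),
// written out so the per-character truncation is visible.
std::string narrow_by_truncation(const std::wstring& ws)
{
    std::string out;
    out.reserve(ws.size());
    for (wchar_t wc : ws)
        out.push_back(static_cast<char>(wc));  // anything outside the char range is cut down
    return out;
}
```

For pure ASCII input the result matches the source text. For anything else the output still has one char per wide character, so it cannot be valid UTF-8 for those characters.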
10

You can convert a wide char string to an ASCII string using the following function:

#include <locale>
#include <sstream>
#include <string>

std::string ToNarrow( const wchar_t *s, char dfault = '?', 
                      const std::locale& loc = std::locale() )
{
  std::ostringstream stm;

  while( *s != L'\0' ) {
    stm << std::use_facet< std::ctype<wchar_t> >( loc ).narrow( *s++, dfault );
  }
  return stm.str();
}

Be aware that this will just replace any wide character for which an equivalent ASCII character doesn't exist with the dfault parameter; it doesn't convert from UTF-16 to UTF-8. If you want to convert to UTF-8 use a library such as ICU.
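As a rough sketch of the same idea without the stream, the range overload of `std::ctype<wchar_t>::narrow` can fill a pre-sized string in one call. The function name `to_narrow_range` is hypothetical; the behavior for characters outside the basic character set depends on the locale passed in.

```cpp
#include <locale>
#include <string>

// Hypothetical variant of ToNarrow: narrows the whole buffer in one call.
// Characters with no narrow equivalent in 'loc' become 'dfault'.
std::string to_narrow_range(const std::wstring& ws, char dfault = '?',
                            const std::locale& loc = std::locale())
{
    std::string out(ws.size(), '\0');
    if (!ws.empty()) {
        std::use_facet< std::ctype<wchar_t> >(loc).narrow(
            ws.data(), ws.data() + ws.size(), dfault, &out[0]);
    }
    return out;
}
```

The output always has the same length as the input, which is another way to see that this is a lossy, one-char-per-character conversion rather than a UTF-8 encoding.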

Praetorian
8

This is an old question, but if you're not really seeking a conversion and are instead using the TCHAR machinery from Microsoft to build for both ASCII and Unicode, recall that std::string is really

typedef std::basic_string<char> string;

So we could define our own typedef, say

#include <string>
#include <tchar.h>  // defines TCHAR (Windows only)
namespace magic {
typedef std::basic_string<TCHAR> string;
}

Then you could use magic::string with TCHAR, LPCTSTR, and so forth.

paulluap
7

It's rather disappointing that none of the answers given to this old question addresses the problem of converting wide strings into UTF-8 strings, which is important in non-English environments.

Here's example code that works and may serve as a starting point for writing custom converters. It is based on the example code at cppreference.com.

#include <iostream>
#include <clocale>
#include <string>
#include <cstdlib>
#include <array>
#include <stdexcept>  // std::invalid_argument, std::logic_error

std::string convert(const std::wstring& wstr)
{
    const int BUFF_SIZE = 7;
    if (MB_CUR_MAX >= BUFF_SIZE) throw std::invalid_argument("BUFF_SIZE too small");
    std::string result;
    std::wctomb(nullptr, 0);  // reset the conversion state
    for (const wchar_t wc : wstr)
    {
        std::array<char, BUFF_SIZE> buffer;
        const int ret = std::wctomb(buffer.data(), wc);
        if (ret < 0) throw std::invalid_argument("inconvertible wide characters in the current locale");
        buffer[ret] = '\0';  // make 'buffer' contain a C-style string
        result += buffer.data();  // append the bytes produced for this character
    }
    return result;
}

int main()
{
    auto loc = std::setlocale(LC_ALL, "en_US.utf8");  // UTF-8
    if (loc == nullptr) throw std::logic_error("failed to set locale");
    std::wstring wstr = L"aąß水-扫描-€\u00df\u6c34\U0001d10b";
    std::cout << convert(wstr) << "\n";
}

This prints, as expected:

[Image: program printout]

Explanation

  • 7 seems to be the minimum safe value of the buffer size, BUFF_SIZE: 4 for the maximum number of UTF-8 bytes encoding a single character, 2 for a possible "shift sequence", and 1 for the trailing '\0'.
  • MB_CUR_MAX is a run-time value, so static_assert cannot be used here.
  • Each wide character is translated into its char representation using std::wctomb.
  • This conversion makes sense only if the current locale allows multi-byte representations of a character.
  • For this to work, the application needs to set a suitable locale. en_US.utf8 seems to be sufficiently universal (available on most machines). On Linux, the available locales can be listed in the console with the locale -a command.
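The per-character loop can also be replaced by a whole-string conversion. The following is a sketch only, using `std::wcsrtombs` from `<cwchar>`; the function name `convert_with_wcsrtombs` is made up for illustration. Like the code above, it depends on the current C locale, and it stops at the first embedded null character.

```cpp
#include <cstddef>
#include <cwchar>
#include <stdexcept>
#include <string>

// Hypothetical alternative to convert(): converts the whole string at once
// using the current C locale (set it with std::setlocale beforehand).
std::string convert_with_wcsrtombs(const std::wstring& wstr)
{
    std::mbstate_t state = std::mbstate_t();
    const wchar_t* src = wstr.c_str();

    // First pass: a null destination only reports the number of bytes needed.
    const std::size_t len = std::wcsrtombs(nullptr, &src, 0, &state);
    if (len == static_cast<std::size_t>(-1))
        throw std::invalid_argument("inconvertible wide characters in the current locale");

    std::string result(len, '\0');
    if (len > 0) {
        src = wstr.c_str();
        state = std::mbstate_t();
        // Second pass: write exactly 'len' bytes into the string's buffer.
        std::wcsrtombs(&result[0], &src, len, &state);
    }
    return result;
}
```

This avoids having to choose a per-character buffer size, at the cost of scanning the input twice.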

Critique of the most upvoted answer

The most upvoted answer,

std::wstring ws( args.OptionArg() );
std::string test( ws.begin(), ws.end() );

works well only when the wide characters represent ASCII characters, but these are not what wide characters were designed for. In this solution, the converted string contains exactly one char per source wide char, so ws.size() == test.size(). Thus, it loses information from the original wstring and produces strings that cannot be interpreted as proper UTF-8 sequences. For example, on my machine the string resulting from this simplistic conversion of "ĄŚĆII" prints as "ZII", even though its size is 5 (and should be 8).
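If depending on the locale is not an option, the UTF-8 bytes can also be produced by hand. This is a minimal sketch, not a vetted implementation: the name `wide_to_utf8` is hypothetical, and it assumes wchar_t holds either UTF-32 code points (typical on Linux) or UTF-16 code units (Windows). Invalid input is replaced with U+FFFD.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: encodes a wide string as UTF-8 without using the locale.
std::string wide_to_utf8(const std::wstring& ws)
{
    std::string out;
    for (std::size_t i = 0; i < ws.size(); ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(ws[i]);
        // Combine a UTF-16 high/low surrogate pair into one code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < ws.size()) {
            const std::uint32_t lo = static_cast<std::uint32_t>(ws[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        // Unpaired surrogate or out-of-range value: use the replacement character.
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            cp = 0xFFFD;
        if (cp < 0x80) {                      // 1 byte
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {              // 2 bytes
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {            // 3 bytes
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {                              // 4 bytes
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

With this sketch, the five-character string "ĄŚĆII" from the example above comes out as 8 bytes, which is the size the critique says it should have.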

zkoza
4

You could just use wstring and keep everything in Unicode.

Steve Townsend
  • 2
    and I'll still get a const char* if I use .c_str()? I have other functions that expect const char* – codefrog Dec 02 '10 at 21:16
  • 1
    I'm going to make a guess that you are building your project in Unicode but really don't want that. If this is correct, you can change your project's properties to not build for Unicode and then you can use `string`. Check this in Project Properties, Configuration Properties, General, Character Set. You need this to say `Use Multibyte Character Set` to get rid of Unicode everywhere. – Steve Townsend Dec 02 '10 at 21:19
  • Originally I planned to use Unicode for some parts but then I decided I'll worry about that later. At this point I'm only bothered to get the program to work. I'm using SimpleINI and SimpleOpt to load options and it uses Unicode. I'm also using the SDK of another software which also uses Unicode. Disabling Unicode altogether might make even those parts of the code stop working. – codefrog Dec 02 '10 at 21:22
  • SimpleIni docs indicate it uses the same conventions as Windows and so will work whichever way you build. For Unicode it uses a W suffix, for multi-byte charset it uses an A suffix, on function and class names. You should use the undecorated names (no A or W) and it will build in the right code depending on your project settings. – Steve Townsend Dec 02 '10 at 21:24
  • 3
    Since you're programming on Windows you probably should be using Unicode. The Windows API and NTFS natively support UTF-16, so building ASCII applications incurs an additional overhead where each function is doing string conversions for you. – Praetorian Dec 02 '10 at 21:24
  • @Praetorian - regardless of the correctness of that advice in the general case, path of least resistance is to use MBCS, since code is using `char*` elsewhere – Steve Townsend Dec 02 '10 at 21:30
  • @Steve: Yes, of course, I wasn't disputing that. If the OP doesn't have access to the source code that uses `char *` then he should convert the entire project to MBCS. – Praetorian Dec 02 '10 at 21:37
  • I'm gonna try using wstring and see how it goes. Thanks for the answers. – codefrog Dec 03 '10 at 05:40
  • 1
    Many applications use UTF-8 internally. Windows is a right pain because wchar_t isn't big enough and it doesn't really support UTF-8 properly. This makes life difficult when you have (like me) a large codebase application which uses UTF-8 internally. Mostly this works fine, but it's the interaction with some of the OS-level functions that becomes annoying. – Stephen Apr 18 '14 at 18:06
  • 41
    How is it an accepted answer if it doesn't even answer the question? – riv Aug 24 '15 at 17:00
3

just for fun :-):

const wchar_t* val = L"hello mfc";
std::string test((LPCSTR)CStringA(val));  // CStringA narrows to the ANSI code page; LPCTSTR/CString would be wide in a Unicode build
Danil
3

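The following code is more concise:

#include <cstdio>

wchar_t wstr[500] = L"hello";  // wide source string (initialized so the read is well-defined)
char str[500];
std::snprintf(str, sizeof str, "%ls", wstr);  // converts using the current C locale, bounded by the buffer size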
Stephen Rauch