Convert const char* to wstring

Question

I'm working on a native extension for a Zinc-based Flash application and I need to convert a const char* to a wstring.

This is my code:

mdmVariant_t* appendHexDataToFile(const zinc4CallInfo_t *pCallInfo, int paramCount, mdmVariant_t **params) {

    if(paramCount >= 2) {
        const char *file    = mdmVariantGetString(params[0]);
        const char *data    = mdmVariantGetString(params[1]);

        return mdmVariantNewInt(native.AppendHexDataToFile(file, data));
    }
    else {
        return mdmVariantNewBoolean(FALSE);
    }
}

But native.AppendHexDataToFile() needs two wstrings. I'm not very good with C++, I find all the different string types confusing, and I didn't find anything useful on the net. So I'm asking you how to do it.

Edit: The strings are UTF-8, and I'm using OS X and Windows XP/Vista/7.

Before you try to deal with chars and wide chars, you should be able to answer the following question: **How are your strings encoded**, and what conversion do you intend to do? – ereOn, May 24 '12 at 12:38

score 25 · Accepted Answer · edited Aug 16 '18 at 15:57

I recommend using std::string instead of C-style strings (char*) wherever possible. You can create a std::string object from a const char* simply by passing it to the constructor.
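A minimal sketch of that first step, assuming the C API may hand back a null pointer (the helper name `toStdString` is hypothetical). Constructing a std::string directly from a null `const char*` is undefined behavior, so the guard maps it to an empty string:

```cpp
#include <string>

// Hypothetical helper: wrap a C string in a std::string, treating NULL as "".
// Passing a null pointer straight to std::string's constructor is undefined behavior.
std::string toStdString(const char* cstr)
{
    return cstr ? std::string(cstr) : std::string();
}
```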

Once you have a std::string, you can write a simple function that converts a std::string containing multi-byte UTF-8 characters to a std::wstring containing UTF-16 code units (on Windows, wchar_t is 16 bits, so each character becomes one 16-bit unit, or two for characters outside the Basic Multilingual Plane).

There are several ways to do that; here's one using the MultiByteToWideChar function:

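#include <string>
#include <windows.h>

// Converts a UTF-8 std::string to a UTF-16 std::wstring (Windows only).
std::wstring s2ws(const std::string& str)
{
    if (str.empty()) return std::wstring();  // MultiByteToWideChar fails on zero-length input
    int size_needed = MultiByteToWideChar(CP_UTF8, 0, &str[0], (int)str.size(), NULL, 0);
    std::wstring wstrTo( size_needed, 0 );
    MultiByteToWideChar(CP_UTF8, 0, &str[0], (int)str.size(), &wstrTo[0], size_needed);
    return wstrTo;
}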
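MultiByteToWideChar only exists on Windows. To show what the CP_UTF8 conversion actually does, here is a portable sketch (the name `utf8ToUtf16` is hypothetical, and it writes to std::u16string rather than std::wstring so the result is UTF-16 on every platform). It throws on bad lead bytes, truncated sequences, bad continuation bytes and values above U+10FFFF, but it does not reject overlong encodings or encoded surrogates, so treat it as an illustration rather than a validator:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 and re-encode it as UTF-16 code units.
std::u16string utf8ToUtf16(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        // The lead byte tells how many bytes the sequence has.
        if (c < 0x80)              { cp = c;        len = 1; }
        else if ((c >> 5) == 0x6)  { cp = c & 0x1F; len = 2; }
        else if ((c >> 4) == 0xE)  { cp = c & 0x0F; len = 3; }
        else if ((c >> 3) == 0x1E) { cp = c & 0x07; len = 4; }
        else throw std::runtime_error("invalid UTF-8 lead byte");
        if (i + len > in.size()) throw std::runtime_error("truncated UTF-8 sequence");
        // Each continuation byte (10xxxxxx) contributes 6 more bits.
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cc = static_cast<unsigned char>(in[i + k]);
            if ((cc & 0xC0) != 0x80) throw std::runtime_error("invalid continuation byte");
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp > 0x10FFFF) throw std::runtime_error("code point out of range");
        i += len;
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));  // BMP: one 16-bit unit
        } else {
            cp -= 0x10000;                              // outside the BMP: surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

Characters up to U+FFFF become a single 16-bit unit; anything above becomes a surrogate pair, so U+1F600 maps to 0xD83D 0xDE00. On Windows, where wchar_t is also 16 bits, the result could be copied into a std::wstring with its iterator-range constructor.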

Check these questions too:
Mapping multibyte characters to their unicode point representation
Why use MultiByteToWideCharArray to convert std::string to std::wstring?

answered May 24 '12 at 13:09 by LihO, edited Aug 16 '18 at 15:57 by Jossef Harush Kadouri
**Disclaimer:** `MultiByteToWideChar` is a Windows-only function. (OP is using Windows but question is tagged just `c++`) – Lightness Races in Orbit Jan 28 '13 at 10:25
It would be best if your solution is cross platform. – huahsin68 Jun 14 '16 at 14:08
@huahsin68 This is still the best conversion on Windows. As MS says [here](https://msdn.microsoft.com/en-us/library/windows/desktop/dd317752(v=vs.85).aspx), _"Your application can convert between Windows code pages and OEM code pages using the standard C runtime library functions. However, use of these functions presents a risk of data loss because the characters that can be represented by each code page do not match exactly"_ However, I wonder if one should use the CP_ACP codepage when converting strings obtained through WINAPI ...A() functions (for example, from GetWindowTextA) – gog Jan 09 '18 at 13:27

score 21 · Answer 2 · answered Sep 15 '16 at 21:35

You can convert a char string to a wstring directly, as in the following code:

#include <string>
char buf1[] = "12345678901234567890";
std::wstring ws(&buf1[0], &buf1[20]);  // widens each char as-is; correct only for ASCII

answered Sep 15 '16 at 21:35 by SaeidMo7

Or more generic: `wstring ws(buf1, buf1 + strlen(buf1));` if you don't know buf1's length at design time. – Jac Goudsmit Feb 03 '23 at 07:54
Note that it doesn't solve the UTF-8 problem mentioned in the question. But for a string known to be pure ASCII (all bytes < 128), if that's your prerequisite, it works just fine. – Sandburg Jul 12 '23 at 08:35
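As the comments note, widening byte by byte is only safe for ASCII. A minimal sketch that makes that precondition explicit, combining the strlen variant from the comment above with a check on each byte (the helper name `asciiToWstring` and the choice to throw are assumptions, not part of the original answer):

```cpp
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical helper: widen a NUL-terminated C string with the iterator-range
// constructor, but refuse any byte >= 0x80, since such bytes belong to
// multi-byte UTF-8 sequences that this simple copy would corrupt.
std::wstring asciiToWstring(const char* s)
{
    const char* end = s + std::strlen(s);
    for (const char* p = s; p != end; ++p) {
        if (static_cast<unsigned char>(*p) >= 0x80)
            throw std::invalid_argument("non-ASCII byte in input");
    }
    return std::wstring(s, end);
}
```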

score 16 · Answer 3 · edited Jan 03 '22 at 07:03

AFAIK this works only in C++11 and later:

#include <codecvt>
#include <locale>   // std::wstring_convert
#include <string>

// ...

std::wstring stringToWstring(const std::string& t_str)
{
    //setup converter
    typedef std::codecvt_utf8<wchar_t> convert_type;
    std::wstring_convert<convert_type, wchar_t> converter;

    //use converter (.to_bytes: wstr->str, .from_bytes: str->wstr)
    return converter.from_bytes(t_str);
}

Reference answer

Update

As indicated in the comments, <codecvt> (including std::codecvt_utf8) and std::wstring_convert are deprecated as of C++17. See here: Deprecated header <codecvt> replacement
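One detail worth knowing about std::wstring_convert: by default, from_bytes() throws std::range_error when the input is not valid UTF-8, while a converter constructed with error strings returns the wide error string instead. A sketch of both behaviors follows (the function names are hypothetical; expect deprecation warnings from C++17 on). Note that codecvt_utf8<wchar_t> produces UCS-2 where wchar_t is 16 bits, as on Windows, so characters outside the BMP cannot be represented there; std::codecvt_utf8_utf16<wchar_t> is the UTF-16 variant.

```cpp
#include <codecvt>
#include <locale>
#include <stdexcept>
#include <string>

// Strict: a default-constructed converter throws std::range_error on bad input.
std::wstring utf8ToWstringStrict(const std::string& utf8)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>, wchar_t> converter;
    return converter.from_bytes(utf8);
}

// Lenient: constructed with (byte_err, wide_err) strings, so from_bytes()
// returns `fallback` instead of throwing when the input is invalid.
std::wstring utf8ToWstringOr(const std::string& utf8, const std::wstring& fallback)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>, wchar_t> converter(std::string(), fallback);
    return converter.from_bytes(utf8);
}
```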

answered Sep 21 '15 at 14:00 by anhoppe, edited Jan 03 '22 at 07:03

Deprecated in C++17 – Enrico Detoma Mar 28 '19 at 09:03
@EnricoDetoma what exactly is deprecated? – Apr 09 '19 at 09:36
std::wstring_convert is not available without including #include at least in my Ubuntu box – Melardev Aug 07 '19 at 10:54

score 2 · Answer 4 · edited Oct 19 '15 at 07:15

You need a library that can encode/decode UTF-8. Unfortunately, this functionality isn't reliably available in the standard C++ library (C++11's <codecvt> was not yet widely implemented, and it was deprecated in C++17). Here's one library you might use: http://utfcpp.sourceforge.net/

Here's an example of its use:

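// bytes: UTF-8 std::string; wstr: std::wstring (32-bit wchar_t only; on Windows use utf8::utf8to16)
utf8::utf8to32(bytes.begin(), bytes.end(), std::back_inserter(wstr));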

answered May 24 '12 at 13:14 by Edward Loper, edited Oct 19 '15 at 07:15 by m.s.

score 0 · Answer 5 · answered May 24 '12 at 14:12

On OS X, wchar_t is 32 bits wide, so std::wstring holds UTF-32 rather than UTF-16. You can do the conversion like this:

#include <codecvt>
#include <locale>   // std::wstring_convert
#include <string>
#include <utility>  // std::forward

// make facets usable by giving them a public destructor
template <class Facet>
class usable_facet
    : public Facet
{
public:
    template <class ...Args>
        usable_facet(Args&& ...args)
            : Facet(std::forward<Args>(args)...) {}
    ~usable_facet() {}
};

std::wstring s2ws(std::string const &s) {
    std::wstring_convert<
        usable_facet<std::codecvt<char32_t,char,std::mbstate_t>>
        ,char32_t> convert;
    std::u32string utf32 = convert.from_bytes(s);
    static_assert(sizeof(wchar_t)==sizeof(char32_t),"char32_t and wchar_t must have same size");
    return {begin(utf32),end(utf32)};
}
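The static_assert above means this code stops compiling wherever wchar_t is 16 bits, such as Windows. Since the question targets both OS X and Windows, here is a hedged sketch of a helper (hypothetical name `u32ToWstring`) that could replace the last two lines of s2ws: it copies code points directly when wchar_t is 32 bits and emits UTF-16 surrogate pairs when it is 16 bits. It assumes the input holds valid Unicode scalar values (no surrogates, nothing above U+10FFFF):

```cpp
#include <string>

// Hypothetical helper: turn UTF-32 code points into a std::wstring whose
// encoding matches the platform: UTF-32 when wchar_t is 32 bits (OS X, Linux),
// UTF-16 with surrogate pairs when wchar_t is 16 bits (Windows).
std::wstring u32ToWstring(const std::u32string& utf32)
{
    std::wstring out;
    for (char32_t cp : utf32) {
        if (sizeof(wchar_t) >= 4 || cp < 0x10000) {
            out.push_back(static_cast<wchar_t>(cp));   // fits in one wchar_t
        } else {
            cp -= 0x10000;                             // split into a surrogate pair
            out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

Inside s2ws this would be used as `return u32ToWstring(utf32);` in place of the static_assert and the final return.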

Violet Giraffe · Answer 6 · 2019-08-07T14:46:38.760

An addition to the answer from @anhoppe: here's how to convert a char* directly:

#include <codecvt>
#include <locale>
#include <string>

// ...

std::wstring stringToWstring(const char* utf8Bytes)
{
    //setup converter
    using convert_type = std::codecvt_utf8<typename std::wstring::value_type>;
    std::wstring_convert<convert_type, typename std::wstring::value_type> converter;

    //use converter (.to_bytes: wstr->str, .from_bytes: str->wstr)
    return converter.from_bytes(utf8Bytes);
}

And here's how to convert a char* when you already know the length of the buffer:

#include <codecvt>
#include <locale>   // std::wstring_convert
#include <string>

// ...

std::wstring stringToWstring(const char* utf8Bytes, const size_t numBytes)
{
    //setup converter
    using convert_type = std::codecvt_utf8<typename std::wstring::value_type>;
    std::wstring_convert<convert_type, typename std::wstring::value_type> converter;

    //use converter (.to_bytes: wstr->str, .from_bytes: str->wstr)
    return converter.from_bytes(utf8Bytes, utf8Bytes + numBytes);
}

score -2 · Answer 7 · edited May 24 '12 at 12:55

Here's some code I found:

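#include <algorithm>
#include <string>

// Copies each char into a wchar_t one-to-one; this is only correct for ASCII input.
std::wstring StringToWString(const std::string& s)
{
    std::wstring temp(s.length(), L' ');
    std::copy(s.begin(), s.end(), temp.begin());
    return temp;
}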

And here's the original forum post, with a possible second solution using the Windows API function MultiByteToWideChar:

http://forums.codeguru.com/archive/index.php/t-193852.html

answered May 24 '12 at 12:43 by Samy Arous, edited May 24 '12 at 12:55

What if std::string passed to this function contains multi-byte characters that need to be converted to UTF-16 encoded wide chars equivalents? – LihO May 24 '12 at 12:50
Why not ' '? it will be erased anyway by the copy function. It's just an arbitrary char to make room for the string. – Samy Arous May 24 '12 at 12:53
Then why would you initialize these characters to `' '` when you know that they are going to be rewritten? – LihO May 24 '12 at 12:55
because the copy function does not create a buffer. The buffer needs to be created first, which is done by the constructor. – Samy Arous May 24 '12 at 12:59
Ah, sorry I see now. I thought that [basic_string](http://en.cppreference.com/w/cpp/string/basic_string/basic_string) has also constructor that takes only `size_type count`, just like `std::vector` has. – LihO May 24 '12 at 13:03
This will not convert UTF-8 encoded characters outside the ASCII range to whatever encoding `std::wstring` uses. It simply copies, truncates, and destroys data. This is not a solution. – IInspectable Aug 04 '15 at 07:23
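To make the objection in the last comment concrete: the byte-by-byte copy turns each UTF-8 byte into its own wchar_t. The sketch below repeats the answer's logic under a hypothetical name, `naiveWiden`, so it can be checked on its own:

```cpp
#include <algorithm>
#include <string>

// Same logic as StringToWString above: one wchar_t per input byte, no decoding.
std::wstring naiveWiden(const std::string& s)
{
    std::wstring temp(s.length(), L' ');
    std::copy(s.begin(), s.end(), temp.begin());
    return temp;
}
```

For the two-byte UTF-8 sequence of é ("\xC3\xA9"), naiveWiden returns a two-element std::wstring, whereas a real UTF-8 decoder yields the single character U+00E9. Plain ASCII input comes through unchanged.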
