224
string s = "おはよう";
wstring ws = FUNCTION(s, ws);

How would I assign the contents of s to ws?

I searched Google and tried some techniques, but they cannot assign the exact content; the content gets distorted.

kennytm
Samir

  • I don't think `string` accepts >8-bit characters. Is it already encoded in UTF-8? – kennytm Apr 04 '10 at 07:36
  • What's your system encoding that it would make `"おはよう"` a system-encoded string? – sbi Apr 04 '10 at 07:42
  • I believe MSVC will accept that and make it some multibyte encoding, maybe UTF-8. – Potatoswatter Apr 04 '10 at 07:47
  • There is no problem with string s = "おはよう" in debug Visual Studio; I checked s = "おはよう" after that assignment. But I'm not very familiar with this system-encoding thing... how/where do I check? – Samir Apr 04 '10 at 08:26
  • @Potatoswatter: MSVC doesn't use UTF-8 by default for ANYTHING. If you enter those characters, it asks which encoding to convert the file to, and defaults to codepage 1252. – Mooing Duck Sep 03 '13 at 16:58
  • @Samir: more important is what the encoding of the _file_ is. Can you move that string to the beginning of the file and show a hexdump of that part? We can probably identify it from that. – Mooing Duck Sep 03 '13 at 16:59

20 Answers

294

Assuming that the input string in your example (おはよう) is a UTF-8 encoded representation (which it isn't, by the looks of it, but let's assume it is for the sake of this explanation :-)) of the Unicode string you're interested in, your problem can be solved entirely with the standard library (C++11 and newer) alone.

The TL;DR version:

#include <locale>
#include <codecvt>
#include <string>

std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
std::string narrow = converter.to_bytes(wide_utf16_source_string);
std::wstring wide = converter.from_bytes(narrow_utf8_source_string);

Longer online compilable and runnable examples:

(They all show the same example. There are just many for redundancy...)
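For offline use, the same thing as a self-contained translation unit (a sketch; the UTF-8 input is spelled with explicit byte escapes so it doesn't depend on the source file's encoding, and the `utf8_to_wide`/`wide_to_utf8` names are just illustrative):

```cpp
#include <codecvt>
#include <locale>
#include <string>

// UTF-8 bytes -> wstring holding UTF-16 code units, and back.
// Deprecated in C++17 but still shipped by the major standard libraries.
inline std::wstring utf8_to_wide(const std::string& narrow) {
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
    return converter.from_bytes(narrow);
}

inline std::string wide_to_utf8(const std::wstring& wide) {
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
    return converter.to_bytes(wide);
}

// Example: "おはよう" is 12 UTF-8 bytes but only 4 code points, all in the
// BMP, so the resulting wstring has 4 elements and round-trips losslessly.
```

Under C++17 this compiles with deprecation warnings; see the notes below.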

Note (old):

As pointed out in the comments and explained in https://stackoverflow.com/a/17106065/6345 there are cases when using the standard library to convert between UTF-8 and UTF-16 might give unexpected differences in the results on different platforms. For a better conversion, consider std::codecvt_utf8 as described on http://en.cppreference.com/w/cpp/locale/codecvt_utf8

Note (new):

Since the codecvt header is deprecated in C++17, some worries were raised about the solution presented in this answer. However, the C++ standards committee added an important statement in http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0618r0.html saying

this library component should be retired to Annex D, along side `<strstream>`, until a suitable replacement is standardized.

So in the foreseeable future, the codecvt solution in this answer is safe and portable.

Johann Gerell
  • Thank you! I do like the explicit references to UTF-8 and UTF-16 encodings here. And for using the standard library alone: I wish I could upvote to make this the highest ranking answer all by myself. – DLRdave Oct 11 '13 at 20:42
  • Additionally, if you want to make sure that your initial string has the right encoding type, you should be using a UTF-8 encoded string literal to initialize it, like follows: string s = u8"おはよう"; – Martin J. Oct 12 '13 at 13:08
  • I don't understand why, but this works. If I type the character 0x00c3, which is L'Ã' (not sure if you will see it, it's an A with a tilde accent) in VS, everything is fine. If I load that character from Notepad++ (encoded as UTF-8), it loads as 2 chars: 0xc3 and 0x83. Why does it come as 2 chars? Using the wstring ctor to convert generates 0xffc3 and 0xff83, which show as 2 Japanese-looking kanji. Using your method, it converts the 2 chars to 0x00c3! =) I have no clue what's happening, but it works. (That stuff is really confusing.) – Icebone1000 Nov 07 '13 at 21:56
  • Check what encoding you save VS files with – Johann Gerell Nov 08 '13 at 10:39
  • Be aware that this is C++11-only! – bk138 Jan 15 '14 at 13:58
  • In MinGW (gcc/g++ 4.8.1 and -std=c++11) the codecvt header does not exist. Is there an alternative? – Brian Jack Dec 11 '14 at 19:34
  • This makes an unportable assumption that wchar_t is 16 bit. The resulting wstring would be very surprising outside Windows. – Cubbi Aug 26 '15 at 15:35
  • @Cubbi - please clarify where that assumption is made, thanks! – Johann Gerell Aug 27 '15 at 07:31
  • @JohannGerell give it a char string `\xF0\x9F\x8D\x8C` and it will produce two wchar_ts, one holding `0xD83C` and the other holding `0xDF4C`. While it makes some (albeit perverse) sense to put UTF-16 into wstrings on Windows, everyone else would expect a single wchar_t with the value `0x1F34C`. That's what `std::codecvt_utf8` would produce. – Cubbi Aug 27 '15 at 10:20
  • @Cubbi: _"everyone else would expect a single wchar_t with the value 0x1F34C"_ - anyone can see it's a banana ;) – Johann Gerell Aug 27 '15 at 11:56
  • @Cubbi: So, to be clear, for any readers of this answer and your comment, when you say _"This makes an unportable assumption that wchar_t is 16 bit"_, by _"This"_ you actually mean that the library call `converter.from_bytes` above might return a 2-character `wchar_t` string instead of a 1-character `wchar_t` string, because the library call assumes `wchar_t` is 16 bit on all platforms? – Johann Gerell Aug 27 '15 at 12:11
  • The library call does what it's supposed to do: it produces a UTF-16 encoded string and stores it in whatever target type was given (even if it's mismatched, as when this runs on coliru's Linux). The assumption is in choosing codecvt_utf8_utf16 instead of codecvt_utf8. – Cubbi Aug 27 '15 at 12:21
  • Added a **Note** section to the answer mentioning the deficiency pointed out by @Cubbi. – Johann Gerell Aug 27 '15 at 12:21
  • Works on Linux, but only after I changed codecvt_utf8_utf16 to codecvt_utf8. Then string -> wstring conversion works just fine! – Viktor Nov 08 '15 at 16:55
  • Could you please provide an example of `std::codecvt_utf8` for beginners – Noitidart Feb 23 '17 at 00:22
  • Please note that `<codecvt>` is deprecated since C++17. – tambre Apr 09 '17 at 11:01
  • @tambre - thanks for pointing this out, I added the **Note (new)** paragraph to address this. – Johann Gerell Jul 28 '17 at 07:32
  • If I were the sole and all-powerful ruler of this world, I would decree that UTF-16 is outlawed and only UTF-8 and UTF-32 are legal and usable without the danger of severe punishment. ;) I mean seriously - what is UTF-16 good for if it is still multi-code-unit?! All this hassle with conversions has the root cause that UTF-16 is flawed, IMHO. – BitTickler Mar 07 '20 at 20:23
  • Note that in MSVC, codecvt_utf8 fails to convert non-BMP codepoints properly, like U+10CFA. Passing UTF-8 text with such codepoints would not result in a proper UTF-16 surrogate pair (like `03 D8 FA DC` for UTF-16-LE), but in a single UTF-16 unit in the range U+0C80 .. U+0CFF. MultiByteToWideChar is more robust for those cases. – Mike Kaganski Jun 18 '20 at 13:30
  • That's the theory, but it's not portable, because g++ doesn't implement codecvt, unfortunately. I think it would be easier to get the entire world to only use the expanded Latin alphabet for text than it would be for the computer science standards and compiler folks to come up with one simple solution to handle wide strings and multibyte encodings. – Minok Jun 16 '21 at 21:25
  • "If I were the sole and all powerful ruler of this world..." I'm glad you aren't, just because we have a nauseating surplus of orthodox autocratic leaders already. ;) (OTOH, now that even Microsoft prefers UTF-8 at long last, UTF-16 is fading into oblivion, and that's OK. It still does have some valid use cases, in certain well-defined problem spaces with a closed set of text inputs, I guess. I wouldn't punish those working in those niches, that's enough punishment for them already, I suppose. :) ) – Sz. Jan 07 '22 at 15:18
  • @Sz. A number of widely used file systems use UTF-16. Also backward compatibility: you'd make a great amount of documentation and data inaccessible by phasing out UTF-16 (UTF-8 clearly not enough to represent international text). – Swift - Friday Pie Jun 20 '23 at 12:15
  • @Swift-FridayPie: "UTF-8 clearly not enough to represent international text" - what do you mean? – Johann Gerell Jun 21 '23 at 08:07
  • @JohannGerell its main focus is representation. It's more efficient for representing or inspecting long texts or large amounts of data than UTF-8. Yeah, big computer systems are now powerful enough, but we have classes of small-scale and realtime platforms encroaching into everyday applications. – Swift - Friday Pie Jun 21 '23 at 16:42
  • @Swift-FridayPie: "*its* main focus is representation" - what is "*its*" in that claim? I asked about your claim that "UTF-8 clearly not enough to represent international text", so I wonder what *its* is in your new claim. Sorry, can't really follow what you mean. – Johann Gerell Jun 22 '23 at 13:06
  • @Swift-FridayPie I think you meant to reply to BitTickler's comment, and misunderstood mine, as I agree with your point ("...widely used file systems use UTF-16. Also backward compatibility..."), and that's exactly why I said that "[UTF-16] still does have some valid use cases", and "I wouldn't punish those". – Sz. Jun 25 '23 at 15:26
63
int StringToWString(std::wstring &ws, const std::string &s)
{
    std::wstring wsTmp(s.begin(), s.end());

    ws = wsTmp;

    return 0;
}
Pietro M
  • Very interesting constructor. Also note that it can be used to convert wstring to string in the same way. – jyz Jul 30 '13 at 01:45
  • This only works if all the characters are single byte, i.e. ASCII or [ISO-8859-1](http://en.wikipedia.org/wiki/ISO-8859-1). Anything multi-byte will fail miserably, including UTF-8. The question clearly contains multi-byte characters. – Mark Ransom Sep 03 '13 at 16:22
  • This answer is clearly insufficient and does nothing but copy narrow characters as-is into wide characters. See the other answers, particularly the one by Johann Gerell, for how to properly go from a multi-byte or UTF-8 encoded string to a UTF-16 wstring. – DLRdave Oct 13 '13 at 11:29
  • This answer is dangerous and will probably break on non-ASCII systems, i.e. an Arabic filename will get mangled by this hack. – Stephen Apr 18 '14 at 19:50
  • Unfortunately MinGW doesn't have the header to do it the 'right' way, so only this 'incorrect' way is possible. :( – Brian Jack Dec 11 '14 at 19:51
  • @BrianJack: The more correct way to do it then is to use a 3rd party lib or function. – Johann Gerell Dec 19 '14 at 13:31
  • This answer is useful if you ignore the nuance of the question's body and focus on the question title, which is what brought me here from Google. As is, the question's title is *extremely* misleading and should be altered to reflect the true question being asked. – Anne Quinn Dec 17 '15 at 07:37
  • This answer is wrong for many reasons. As pointed out by others, this will work for a very limited subset of character encodings in the source string. The character encoding used by the question is not part of that subset. It's also completely unclear to me why this function returns a useless integer value in place of the constructed wide character string. Plus, throwing code at the reader without providing an explanation of what it does and why it works is not very useful. – IInspectable Jan 12 '16 at 00:47
  • This works only for 7-bit ASCII characters. For Latin-1, it works only if char is configured as unsigned. If the type char is signed (which is most of the time the case), characters > 127 will give wrong results. – huyc May 16 '16 at 18:32
  • If you want *this kind of* conversion to work **reliably**, do `std::wstring c(const std::string& i){size_t n =i.length();std::wstring o;o.reserve(n);for(size_t p=0;p – Matthias Ronge Aug 17 '16 at 10:17
41

Your question is underspecified. Strictly, that example is a syntax error. However, std::mbstowcs is probably what you're looking for.

It is a C-library function and operates on buffers, but here's an easy-to-use idiom, courtesy of Mooing Duck:

std::wstring ws(s.size(), L' '); // Overestimate number of code points.
ws.resize(std::mbstowcs(&ws[0], s.c_str(), s.size())); // Shrink to fit.
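Wrapped into a function with error handling, the idiom above might look like this (a sketch; the test sticks to plain ASCII, which converts identically under any locale including the default "C" locale - for the question's UTF-8 input you would first need a matching `setlocale` call such as `setlocale(LC_ALL, "en_US.UTF-8")`, whose availability varies by system, and the name `to_wide` is just illustrative):

```cpp
#include <clocale>
#include <cstdlib>
#include <string>

// Convert a multibyte string to a wide string using the current C locale.
// mbstowcs() reports an invalid multibyte sequence by returning (size_t)-1.
inline std::wstring to_wide(const std::string& s) {
    if (s.empty())
        return std::wstring();
    std::wstring ws(s.size(), L' ');  // overestimate: one wchar_t per byte
    std::size_t n = std::mbstowcs(&ws[0], s.c_str(), s.size());
    if (n == static_cast<std::size_t>(-1))
        return std::wstring();        // invalid sequence in the current locale
    ws.resize(n);                     // shrink to the converted length
    return ws;
}
```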
tom
Potatoswatter
  • string s = "おはよう"; wchar_t* buf = new wchar_t[ s.size() ]; size_t num_chars = mbstowcs( buf, s.c_str(), s.size() ); wstring ws( buf, num_chars ); // ws = distorted – Samir Apr 04 '10 at 08:23
  • @Samir: You have to make sure the runtime encoding is the same as the compile-time encoding. You might need to `setlocale` or adjust compiler flags. I don't know because I don't use Windows, but this is why it's not a common feature. Consider the other answer if possible. – Potatoswatter Apr 04 '10 at 09:30
  • `std::wstring ws(s.size()); ws.resize(mbstowcs(&ws[0], s.c_str(), s.size()));` RAII FTW – Mooing Duck Sep 03 '13 at 17:01
  • Also... it wouldn't compile for me as is: I had to make it "std::wstring ws(s.size(), 0);" to get it to compile with Visual Studio 2012. In the end, I opted to go with Johann Gerell's answer using "std::codecvt_utf8_utf16" anyhow. Thanks. – DLRdave Oct 12 '13 at 13:33
  • @DLRdave OK. This is an interesting page. It's been getting high traffic consistently for years and the answers reflect different time periods. Support for those `std::codecvt_*_*` classes is fairly new; they didn't exist when I wrote this and I haven't yet verified that they work in GCC. The highest voted answer is clearly incorrect so at least you didn't go that way :v) . – Potatoswatter Oct 13 '13 at 00:30
  • Writing over the wstring's internal buffer (accessed via &ws[0] above) potentially may break the string interface by relying on a contiguous implementation. For example see this SO answer http://stackoverflow.com/a/1043318/406859 – WaffleSouffle Sep 22 '14 at 15:50
  • @WaffleSouffle That's out of date. Contiguous implementations have been required since 2011, and implementations quit such tricks long before that. – Potatoswatter Sep 22 '14 at 23:53
  • And some environments like MinGW still don't have the codecvt header, so some of the 'better' solutions further up don't work, meaning this problem still has no good solution in MinGW even as of Dec 2014. – Brian Jack Dec 11 '14 at 19:54
  • @Potatoswatter `error C4996: 'mbstowcs': This function or variable may be unsafe. Consider using mbstowcs_s instead.` Maybe you should update something? – Simple Jan 28 '21 at 08:02
  • @Simple The `_s` or safe family of functions take an extra size parameter. Updated. – Potatoswatter Jan 28 '21 at 08:38
  • @Potatoswatter I included the `stdlib.h` but it tells me: `namespace "std" has no member "mbstowcs_s"` why? – Simple Jan 28 '21 at 10:29
  • @Simple Well that’s sad. C++ hasn’t gotten around to adopting the newer C function even as the old one generates warnings. I updated the answer again. – Potatoswatter Jan 28 '21 at 14:44
  • @Potatoswatter Shouldn't you delete your answer then? I mean your initial answer was C++ but the compiler error forced you to update it but then it turned to be a C answer however the question is C++. – Simple Jan 29 '21 at 03:32
  • @Simple There’s nothing wrong with calling `mbstowcs_s` from C++, and this shows how to use it properly with standard strings. – Potatoswatter Jan 29 '21 at 17:26
  • @Simple: The function `mbstowcs` can be hard to use safely, but the way it was used in this answer ("overestimate number of code points") is perfectly safe. Microsoft decided to aggressively discourage people from using such functions (Visual Studio treats them as errors by default). But as their [documentation says](https://learn.microsoft.com/en-us/cpp/c-runtime-library/security-features-in-the-crt?view=msvc-170), they aren't planning to remove the functions – they are just recommending not to use them. [...] – tom Jun 12 '22 at 06:33
  • Unfortunately, in many cases the secure alternatives are not widely supported yet. My suggestions are: understand and be aware of the dangers; try to use the more secure versions if possible; or else disable Microsoft's warnings by inserting `#define _CRT_SECURE_NO_WARNINGS` at the top of the file before the #include's, or by compiling with `/D_CRT_SECURE_NO_WARNINGS`. – tom Jun 12 '22 at 06:34
25

If you are using Windows/Visual Studio and need to convert a string to wstring you could use:

#include <AtlBase.h>
#include <atlconv.h>
...
string s = "some string";
CA2W ca2w(s.c_str());
wstring w = ca2w;
printf("%s = %ls", s.c_str(), w.c_str());

Same procedure for converting a wstring to string (sometimes you will need to specify a codepage):

#include <AtlBase.h>
#include <atlconv.h>
...
wstring w = L"some wstring";
CW2A cw2a(w.c_str());
string s = cw2a;
printf("%s = %ls", s.c_str(), w.c_str());

You can specify a codepage and even UTF-8 (that's pretty nice when working with JNI/Java). A standard way of converting a std::wstring to a UTF-8 std::string is shown in this answer:

// using ATL
CA2W ca2w(str, CP_UTF8);

// or the standard way taken from the answer above
#include <codecvt>
#include <string>

// convert UTF-8 string to wstring
std::wstring utf8_to_wstring (const std::string& str) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> myconv;
    return myconv.from_bytes(str);
}

// convert wstring to UTF-8 string
std::string wstring_to_utf8 (const std::wstring& str) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> myconv;
    return myconv.to_bytes(str);
}

If you want to know more about codepages there is an interesting article on Joel on Software: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets.

These CA2W (Convert ANSI to Wide = Unicode) macros are part of the ATL and MFC String Conversion Macros, samples included.

Sometimes you will need to disable security warning #4995. I don't know of another workaround (for me it happened when I compiled for Windows XP in VS2012).

#pragma warning(push)
#pragma warning(disable: 4995)
#include <AtlBase.h>
#include <atlconv.h>
#pragma warning(pop)

Edit: Well, according to this article, the article by Joel, "while entertaining, is pretty light on actual technical details". Article: What Every Programmer Absolutely, Positively Needs To Know About Encoding And Character Sets To Work With Text.

lmiguelmh
  • Probably the fact that it promotes non-portable code. – Pavel Minaev Aug 24 '15 at 23:11
  • Yes, that's why I stated that this works only in Windows/Visual Studio. But at least this solution is correct, and not this one: `char* str = "hello worlddd"; wstring wstr (str, str+strlen(str));` – lmiguelmh Aug 25 '15 at 23:07
  • Additional note: CA2W is under namespace of ATL. (ATL::CA2W) – Val Mar 22 '16 at 08:55
23

Windows API only, pre-C++11 implementation, in case someone needs it:

#include <stdexcept>
#include <vector>
#include <windows.h>

using std::runtime_error;
using std::string;
using std::vector;
using std::wstring;

wstring utf8toUtf16(const string & str)
{
   if (str.empty())
      return wstring();

   size_t charsNeeded = ::MultiByteToWideChar(CP_UTF8, 0, 
      str.data(), (int)str.size(), NULL, 0);
   if (charsNeeded == 0)
      throw runtime_error("Failed converting UTF-8 string to UTF-16");

   vector<wchar_t> buffer(charsNeeded);
   int charsConverted = ::MultiByteToWideChar(CP_UTF8, 0, 
      str.data(), (int)str.size(), &buffer[0], (int)buffer.size());
   if (charsConverted == 0)
      throw runtime_error("Failed converting UTF-8 string to UTF-16");

   return wstring(&buffer[0], charsConverted);
}
Alex Che
  • You can optimize it. There's no need to do double copy of the string by using a `vector`. Simply reserve the characters in the string by doing `wstring strW(charsNeeded + 1);` and then use it as buffer for conversion: `&strW[0]`. Lastly ensure last null is present after conversion by doing `strW[charsNeeded] = 0;` – c00000fd Feb 06 '17 at 03:35
  • @c00000fd, as far as I know, the std::basic_string internal buffer is required to be contiguous only since the C++11 standard. My code is pre-C++11, as noted at the top of the post. Therefore, the &strW[0] code would not be standard compliant and might legitimately crash at runtime. – Alex Che Feb 06 '17 at 07:03
22

Here's a way of combining string, wstring and mixed string constants into a wstring: use the wstringstream class.

This does NOT work for multi-byte character encodings. It is just a dumb way of throwing away type safety and expanding 7-bit characters from std::string into the lower 7 bits of each character of std::wstring. It is only useful if you have 7-bit ASCII strings and you need to call an API that requires wide strings.

#include <sstream>

std::string narrow = "narrow";
std::wstring wide = L"wide";

std::wstringstream cls;
cls << " abc " << narrow.c_str() << L" def " << wide.c_str();
std::wstring total= cls.str();
Mark Lakata
  • The answer seems interesting. Could you please explain a bit: will this work for multi-byte encodings, and why/how? – wh1t3cat1k Nov 14 '15 at 08:23
  • Encoding schemes are orthogonal to the storage class. `string` stores 1-byte characters and `wstring` stores 2-byte characters. Something like UTF-8 stores multibyte characters as a series of 1-byte values, i.e. in a `string`. The string classes don't help with encoding. I'm not an expert on encoding classes in C++. – Mark Lakata Nov 14 '15 at 16:40
  • Any reason why this one is not the best answer, given how short and simple it is? Any cases that it does not cover? – Ryuu May 04 '18 at 09:56
  • @MarkLakata, I read your answer to the first comment but am still not sure. Will it work for multi-byte characters? In other words, is it not prone to the same pitfall as [this answer](https://stackoverflow.com/a/8969776/3258851)? – Marc.2377 Sep 10 '19 at 06:42
  • @Marc.2377 This does NOT work for multi-byte character encodings. This is just a dumb way of throwing away type safety and expanding 7-bit characters from `std::string` into the lower 7 bits of each character of `std::wstring`. This is only useful if you have 7-bit ASCII strings and you need to call an API that requires wide strings. Look at https://stackoverflow.com/a/8969776/3258851 if you need something more sophisticated. – Mark Lakata Sep 10 '19 at 22:42
  • @MarkLakata, interestingly, I just tested both your answer and the one you linked to. Your answer does work, the other one does not. I used `"おはよう"` as a test string. – Marc.2377 Sep 10 '19 at 23:48
19

From char* to wstring:

const char* str = "hello worlddd";
wstring wstr (str, str+strlen(str));

From string to wstring:

string str = "hello worlddd";
wstring wstr (str.begin(), str.end());

Note this only works well if the string being converted contains only ASCII characters.
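To see why, note that the iterator-range constructor widens each byte individually, so a multibyte UTF-8 sequence turns into one (wrong) wchar_t per byte. A minimal sketch demonstrating the limitation (the name `widen_bytes` is just illustrative):

```cpp
#include <string>

// Byte-wise widening: correct for ASCII, wrong for multibyte encodings.
inline std::wstring widen_bytes(const std::string& s) {
    return std::wstring(s.begin(), s.end());
}

// "おはよう" is 4 code points but 12 UTF-8 bytes; widen_bytes() yields a
// 12-element wstring of mojibake instead of the expected 4 characters.
```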

rubenvb
Ghominejad
12

This variant is my favourite in real life. It converts the input, if it is valid UTF-8, to the corresponding wstring. If the input is corrupted, the wstring is constructed from the individual bytes instead. This is extremely helpful if you cannot really be sure about the quality of your input data.

std::wstring convert(const std::string& input)
{
    try
    {
        std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
        return converter.from_bytes(input);
    }
    catch(std::range_error& e)
    {
        size_t length = input.length();
        std::wstring result;
        result.reserve(length);
        for(size_t i = 0; i < length; i++)
        {
            result.push_back(input[i] & 0xFF);
        }
        return result;
    }
}
Matthias Ronge
  • I just launched this question based on your answer https://stackoverflow.com/questions/49669048/c-converting-to-a-wstring-why-would-a-character-be-anded-to-a-byte#49669048 - can you kindly take a look – MistyD Apr 05 '18 at 09:51
  • This really works for me :) – Ivan Cachicatari Jul 05 '23 at 03:50
9

Using Boost.Locale:

#include <boost/locale.hpp>

ws = boost::locale::conv::utf_to_utf<wchar_t>(s);
vladon
7

You can use boost path or std path, which is a lot easier. boost path is more convenient for cross-platform applications:

#include <boost/filesystem/path.hpp>

namespace fs = boost::filesystem;

//s to w
std::string s = "xxx";
auto w = fs::path(s).wstring();

//w to s
std::wstring w = L"xxx";
auto s = fs::path(w).string();

If you prefer to use std:

#include <filesystem>
namespace fs = std::filesystem;

//The same

For older C++ versions:

#include <experimental/filesystem>
namespace fs = std::experimental::filesystem;

//The same

The code within still implements a converter, whose details you don't have to unravel.
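Put together as small helper functions (a sketch with std::filesystem, so C++17; note that for non-ASCII content the result depends on the platform and the path's native encoding, so this is only reliable for ASCII, and the helper names are just illustrative):

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Narrow -> wide and back via the path's converting accessors.
inline std::wstring to_wide(const std::string& s) {
    return fs::path(s).wstring();
}

inline std::string to_narrow(const std::wstring& w) {
    return fs::path(w).string();
}
```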

Alen Wesker
5

For me the most uncomplicated option without big overhead is:

Include:

#include <atlbase.h>
#include <atlconv.h>

Convert:

const char* whatever = "test1234";
std::wstring lwhatever = std::wstring(CA2W(std::string(whatever).c_str()));

If needed:

lwhatever.c_str();
Michael Santos
4

String to wstring

std::wstring Str2Wstr(const std::string& str)
{
    int size_needed = MultiByteToWideChar(CP_UTF8, 0, &str[0], (int)str.size(), NULL, 0);
    std::wstring wstrTo(size_needed, 0);
    MultiByteToWideChar(CP_UTF8, 0, &str[0], (int)str.size(), &wstrTo[0], size_needed);
    return wstrTo;
}

wstring to String

std::string Wstr2Str(const std::wstring& wstr)
{
    typedef std::codecvt_utf8<wchar_t> convert_typeX;
    std::wstring_convert<convert_typeX, wchar_t> converterX;
    return converterX.to_bytes(wstr);
}
Isma Rekathakusuma
  • This Str2Wstr has a problem with 0-termination. It is not possible to concatenate the generated wstrings anymore via "+" (like in wstring s3 = s1 + s2). I will post an answer soon solving this problem; I have to do some testing for memory leaks first. – thewhiteambit Jan 06 '20 at 18:51
3

If you have Qt and you are too lazy to implement a function, you can use:

std::string str;
QString::fromStdString(str).toStdWString()
Soleil
  • 6,404
  • 5
  • 41
  • 61
Kadir Erdem Demir
  • 3,531
  • 3
  • 28
  • 39
2

Here is my super basic solution that might not work for everyone, but would work for a lot of people.

It requires usage of the Guideline Support Library, a fairly official C++ library that was designed by many C++ committee authors:

    #include <gsl/narrow> // header name varies by GSL version; older ones use <gsl/gsl>
    #include <string>

    std::string to_string(std::wstring const & wStr)
    {
        std::string temp = {};

        for (wchar_t const & wCh : wStr)
        {
            // If the string can't be converted gsl::narrow will throw
            temp.push_back(gsl::narrow<char>(wCh));
        }

        return temp;
    }

All my function does is allow the conversion if possible, and otherwise throw an exception.

Via the usage of gsl::narrow (https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#es49-if-you-must-use-a-cast-use-a-named-cast)
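For readers without GSL at hand, gsl::narrow is, in essence, a checked cast. A rough sketch of the idea (not GSL's exact implementation, which also handles signedness mismatches; the name `narrow_checked` is just illustrative):

```cpp
#include <stdexcept>

// Checked narrowing cast: convert, then verify the value survived the
// round trip; throw if information was lost (roughly what gsl::narrow does).
template <class To, class From>
To narrow_checked(From value) {
    To result = static_cast<To>(value);
    if (static_cast<From>(result) != value)
        throw std::runtime_error("narrowing changed the value");
    return result;
}
```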

1

The method s2ws works well. Hope it helps.

std::wstring s2ws(const std::string& s) {
    std::string curLocale = setlocale(LC_ALL, ""); 
    const char* _Source = s.c_str();
    size_t _Dsize = mbstowcs(NULL, _Source, 0) + 1;
    wchar_t *_Dest = new wchar_t[_Dsize];
    wmemset(_Dest, 0, _Dsize);
    mbstowcs(_Dest,_Source,_Dsize);
    std::wstring result = _Dest;
    delete []_Dest;
    setlocale(LC_ALL, curLocale.c_str());
    return result;
}
hahakubile
  • What is with all of these answers allocating dynamic memory in an unsafe way, and then copying the data from the buffer to the string? Why does nobody get rid of the unsafe middleman? – Mooing Duck Sep 04 '13 at 16:56
  • hahakubile, can you please help with something similar for ws2s? – cristian Jun 09 '16 at 15:32
1

Based upon my own testing (on Windows 8, VS2010), mbstowcs can actually damage the original string; it works only with the ANSI code page. MultiByteToWideChar/WideCharToMultiByte can also cause string corruption, but they tend to replace characters they don't know with '?' question marks, whereas mbstowcs tends to stop when it encounters an unknown character and cut the string at that very point. (I tested Vietnamese characters on Finnish Windows.)

So prefer the Multi* Windows API functions over the analogous ANSI C functions.

Also, what I've noticed: the shortest way to encode a string from one codepage to another is not to use the MultiByteToWideChar/WideCharToMultiByte API calls but their analogous ATL macros: W2A / A2W.

So the analogous function to the one mentioned above would look like:

wstring utf8toUtf16(const string & str)
{
   USES_CONVERSION;
   _acp = CP_UTF8;
   return A2W( str.c_str() );
}

_acp is declared in the USES_CONVERSION macro.

Or also a function which I often miss when converting old data to the new format:

string ansi2utf8( const string& s )
{
   USES_CONVERSION;
   _acp = CP_ACP;
   wchar_t* pw = A2W( s.c_str() );

   _acp = CP_UTF8;
   return W2A( pw );
}

But please note that those macros use the stack heavily. Don't use them in for loops or recursive functions; after using the W2A or A2W macro, it's better to return ASAP, so the stack is freed from the temporary conversion.

TarmoPikaro
0

std::string -> wchar_t[] with the safe mbstowcs_s function:

auto ws = std::make_unique<wchar_t[]>(s.size() + 1);
mbstowcs_s(nullptr, ws.get(), s.size() + 1, s.c_str(), s.size());

This is from my sample code

vSzemkel
0

UTF-8 implementation

Assuming that your std::string is UTF-8 encoded, this is a platform-independent implementation of wstring/string conversion functions:

#include <codecvt>
#include <string>
#include <type_traits>

std::string wstring_to_utf8(std::wstring const& str)
{
  std::wstring_convert<std::conditional_t<
        sizeof(wchar_t) == 4,
        std::codecvt_utf8<wchar_t>,
        std::codecvt_utf8_utf16<wchar_t>>> converter;
  return converter.to_bytes(str);
}

std::wstring utf8_to_wstring(std::string const& str)
{
  std::wstring_convert<std::conditional_t<
        sizeof(wchar_t) == 4,
        std::codecvt_utf8<wchar_t>,
        std::codecvt_utf8_utf16<wchar_t>>> converter;
  return converter.from_bytes(str);
}

The currently most upvoted answer looks similar, but produces incorrect results for non-BMP characters (e.g. emoji) on non-Windows platforms. wchar_t is UTF-16 on Windows, but UTF-32 everywhere else. The std::conditional takes care of that distinction.
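The distinction is easy to check: wchar_t is 2 bytes on Windows and 4 bytes on most Unix-likes, which is exactly what the std::conditional_t selects on. A compile-time sketch of the same selection (the constant names are just illustrative):

```cpp
// A 4-byte wchar_t holds UTF-32 code points directly; a 2-byte wchar_t
// needs UTF-16 surrogate pairs for anything outside the BMP.
constexpr bool wchar_is_utf32 = sizeof(wchar_t) == 4;
constexpr bool wchar_is_utf16 = sizeof(wchar_t) == 2;

static_assert(wchar_is_utf32 || wchar_is_utf16,
              "unexpected wchar_t width on this platform");
```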

MSVC Deprecation Warning

On MSVC this might generate some deprecation warnings. You can disable those by wrapping the functions in:

#pragma warning(push)
#pragma warning(disable : 4996)
<the two functions>
#pragma warning(pop)

Johann Gerell's answer explains why it's ok to disable that warning.

Getting UTF-8 on MSVC

Note that when you write a normal string literal in your source (like std::string s = "おはよう";), it won't be UTF-8 encoded by default on MSVC. I would strongly recommend setting your MSVC character set to UTF-8 to address this: https://learn.microsoft.com/en-us/cpp/build/reference/utf-8-set-source-and-executable-character-sets-to-utf-8?view=msvc-170

Chronial
-1

Use this code to convert your string to a wstring:

std::wstring string2wString(const std::string& s){
    int len;
    int slength = (int)s.length() + 1;
    len = MultiByteToWideChar(CP_ACP, 0, s.c_str(), slength, 0, 0); 
    wchar_t* buf = new wchar_t[len];
    MultiByteToWideChar(CP_ACP, 0, s.c_str(), slength, buf, len);
    std::wstring r(buf);
    delete[] buf;
    return r;
}

int main(){
    std::string str = "your string";
    std::wstring wStr = string2wString(str);
    return 0;
}
jaguar
  • Note that the question makes no mention of Windows, and this answer is Windows-only. – Johann Gerell Aug 27 '15 at 14:11
  • `CP_ACP` is most certainly the wrong argument. All of a sudden, the executing thread's environment state has an effect on the behavior of the code. Not advisable. Specify a fixed character encoding in your conversion. (And consider handling errors.) – IInspectable Jan 12 '16 at 00:56
-3

string s = "おはよう"; is an error.

You should use wstring directly:

wstring ws = L"おはよう";
AStopher
Andreas Bonini
  • That's not going to work either. You'll have to convert those non-BMP characters to C escape sequences. – Dave Van den Eynde Apr 04 '10 at 07:49
  • @Dave: it does work if your compiler supports Unicode in source files, and all the ones in the last decade do (Visual Studio, GCC, ...) – Andreas Bonini Apr 04 '10 at 07:52
  • Hi, regardless of the default system encoding (I may have Arabic as my default system encoding, for example), what should the encoding of the source code file be for L"おはよう" to work? Should it be UTF-16, or can I have UTF-8 without BOM for the .cpp file encoding? – Afriza N. Arief Aug 12 '10 at 04:26
  • @afriza: it doesn't really matter as long as your compiler supports it – Andreas Bonini Aug 12 '10 at 14:00
  • It is not an error; extended characters in a "narrow" string are defined to map to multibyte sequences. The compiler should support it as long as the OS does, which is the least you can ask. – Potatoswatter Oct 13 '13 at 00:35
  • @DaveVandenEynde Those Japanese hiragana are in the BMP. – oldherl Jul 11 '19 at 07:42