48

I am trying to convert a program from multibyte characters to Unicode.

I have gone through the program and preceded the string literals with L so they look like L"string".

This has worked, but I am now left with a C-style string variable that won't convert. I have tried the L prefix and wrapping it in TEXT(), but with TEXT() the L gets pasted onto the variable name, not the string.

I have tried making it a TCHAR, but then the compiler complains that it cannot convert a TCHAR to a char *.

What options am I left with?

I know C and C++ are different. It is an old in-house C library that has been used in C++ projects for several years now.

E_net4
Skeith
  • The main reason someone would downvote would be, imho, the lack of source code in your question. An image is worth a thousand words, and so is a piece of code. Even a trivial one. – ereOn Jul 28 '11 at 11:58
  • You can definitely write code that works with `TCHAR` no matter what the compiler setting, you just have to create the right infrastructure. In C++, overloading does all the heavy lifting for you. – Kerrek SB Jul 28 '11 at 12:17
  • Possible duplicate of [How to convert char\* to LPCWSTR?](http://stackoverflow.com/questions/19715144/how-to-convert-char-to-lpcwstr) – Rusty Nail May 04 '16 at 10:28

5 Answers

58

The std::mbstowcs function is what you are looking for:

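 #include <cstdlib>   // mbstowcs
 #include <string>    // std::string
 #include <windows.h> // LPWSTR

 char text[] = "something";
 wchar_t wtext[20];
 // The third argument is the capacity of wtext, keeping one slot for the terminator.
 mbstowcs(wtext, text, 19);
 wtext[19] = L'\0'; // mbstowcs does not null-terminate a truncated result
 LPWSTR ptr = wtext;

For std::string:

 std::string text = "something";
 wchar_t wtext[20];
 mbstowcs(wtext, text.c_str(), 19); // not text.length(): that excludes the null, leaving wtext unterminated
 wtext[19] = L'\0';
 LPWSTR ptr = wtext;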

Edit: The "L" prefix only works on string literals, not on variables.
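If the input length isn't known in advance (the fixed `wtext[20]` buffer above assumes it is), a minimal portable sketch is to call `mbstowcs` twice: once with a null destination to measure the result, then again to convert. This assumes your C runtime accepts a null destination for `mbstowcs` (POSIX and the Microsoft CRT both document this) and that returning a `std::wstring` instead of a raw `LPWSTR` is acceptable. The name `to_wide` is hypothetical.

```cpp
#include <cassert>    // assert
#include <cstdlib>    // std::mbstowcs, std::size_t
#include <stdexcept>  // std::runtime_error
#include <string>     // std::wstring
#include <vector>     // std::vector

// Hypothetical helper: converts a null-terminated multibyte string, interpreted
// in the current LC_CTYPE locale, into a std::wstring of whatever length it needs.
std::wstring to_wide(const char* text)
{
    // Pass 1: with a null destination, mbstowcs only counts the wide
    // characters the conversion would produce (terminator excluded).
    std::size_t needed = std::mbstowcs(nullptr, text, 0);
    if (needed == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid multibyte sequence for the current locale");

    // Pass 2: convert into a buffer sized from pass 1, plus one slot for L'\0'.
    std::vector<wchar_t> buf(needed + 1);
    std::size_t written = std::mbstowcs(buf.data(), text, buf.size());
    assert(written == needed);
    (void)written;  // avoid an unused-variable warning when NDEBUG is defined
    return std::wstring(buf.data(), needed);
}
```

On Windows, `to_wide(text).c_str()` can then be passed wherever an `LPCWSTR` is expected, as long as the `std::wstring` outlives the pointer. Call `setlocale(LC_CTYPE, "")` first if the input contains non-ASCII bytes.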

CraftedGaming
Raphael R.
  • that's deprecated, you should use `mbstowcs_s()` – Olipro Jul 28 '11 at 12:05
  • @Olipro: This is "deprecated" only in the Windows world. The OP did not state which platform he was targeting. – ereOn Jul 28 '11 at 12:18
  • it's fairly implicit that the platform is Windows, but if you think otherwise, go ahead and prove me wrong. – Olipro Jul 28 '11 at 12:25
  • @Olipro: what's the advantage of the `_s` versions? As far as I can tell, you pass another count parameter that indicates at most how many characters you want written out, but how does that help? You already specify the size of the output buffer in another argument, is this just for the sake of the terminating zero? – Kerrek SB Jul 28 '11 at 12:26
  • http://msdn.microsoft.com/en-us/library/8ef0s5kh%28v=vs.80%29.aspx – Olipro Jul 28 '11 at 12:27
  • Do I understand correctly that it is only possible to convert `char*` to `LPWSTR` if the length of the `char*` is known? If not, why was the assumption, that 20 characters will suffice, made? – masiton Aug 31 '20 at 08:17
12

The clean way to use mbsrtowcs (the restartable variant of mbstowcs) is to call it twice, first to find the length of the result:

  #include <cassert>
  #include <cwchar>
  #include <string>

  const char * cs = input;  // input: your null-terminated char* string
  size_t wn = mbsrtowcs(NULL, &cs, 0, NULL);

  // error if wn == size_t(-1)

  wchar_t * buf = new wchar_t[wn + 1]();  // value-initialize to 0 (see below)

  wn = mbsrtowcs(buf, &cs, wn + 1, NULL);

  // error if wn == size_t(-1)

  assert(cs == NULL); // successful conversion

  std::wstring result(buf, wn);  // result now in buf; copy it out, e.g. as std::wstring

  delete[] buf;

Don't forget to call setlocale(LC_CTYPE, ""); at the beginning of your program!

The advantage over the Windows MultiByteToWideChar is that this is entirely standard C, although on Windows you might prefer the Windows API function anyway.

I usually wrap this method, along with the opposite one, in two conversion functions string->wstring and wstring->string. If you also add trivial overloads string->string and wstring->wstring, you can easily write code that compiles with the Winapi TCHAR typedef in any setting.

Edit: I added zero-initialization to buf in case you plan to use the C array directly. I would usually return the result as std::wstring(buf, wn), but do beware if you plan on using C-style null-terminated arrays.

In a multithreaded environment, you should pass a thread-local conversion state (an mbstate_t) to the function as its final parameter, which is currently NULL.
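The wrapper pair described above might look like the following minimal sketch, assuming the standard C functions `mbsrtowcs`/`wcsrtombs` and a locale already selected with `setlocale(LC_CTYPE, "")`. The names `widen` and `narrow` are hypothetical. Each call keeps its own local `mbstate_t` instead of passing NULL, which also follows the multithreading advice, and the two identity overloads let `widen(x)` or `narrow(x)` compile whether `x` holds `char` or `wchar_t` text (for example a `std::basic_string<TCHAR>`).

```cpp
#include <cassert>    // assert
#include <cwchar>     // std::mbsrtowcs, std::wcsrtombs, std::mbstate_t
#include <stdexcept>  // std::runtime_error
#include <string>     // std::string, std::wstring
#include <vector>     // std::vector

// Hypothetical helper: narrow -> wide using a per-call conversion state.
// Conversion stops at the first embedded '\0', like any C-string function.
std::wstring widen(const std::string& s)
{
    std::mbstate_t state = std::mbstate_t();
    const char* src = s.c_str();
    std::size_t n = std::mbsrtowcs(nullptr, &src, 0, &state);  // measure only
    if (n == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid multibyte sequence");
    std::vector<wchar_t> buf(n + 1);
    state = std::mbstate_t();  // restart the real pass from the initial state
    std::mbsrtowcs(buf.data(), &src, buf.size(), &state);
    assert(src == nullptr);  // the whole input, terminator included, was consumed
    return std::wstring(buf.data(), n);
}

// Hypothetical helper: wide -> narrow, measured in bytes.
std::string narrow(const std::wstring& ws)
{
    std::mbstate_t state = std::mbstate_t();
    const wchar_t* src = ws.c_str();
    std::size_t n = std::wcsrtombs(nullptr, &src, 0, &state);  // measure only
    if (n == static_cast<std::size_t>(-1))
        throw std::runtime_error("character not representable in this locale");
    std::vector<char> buf(n + 1);
    state = std::mbstate_t();
    std::wcsrtombs(buf.data(), &src, buf.size(), &state);
    return std::string(buf.data(), n);
}

// Trivial identity overloads so TCHAR-agnostic code compiles in either setting.
std::wstring widen(const std::wstring& ws) { return ws; }
std::string narrow(const std::string& s) { return s; }
```

Overloading on the argument type does the work here: neither `std::string` nor `std::wstring` converts implicitly to the other, so each call picks exactly one function.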

Here is a small rant of mine on this topic.

Community
Kerrek SB
  • +1 for showing how to call the function twice to get the length of the output buffer – David Heffernan Jul 28 '11 at 12:11
  • Cheers. In the privacy of my own thoughts, I actually use a variable-length array for `buf`, but I wanted to avoid that in the light of SO scrutiny :-) – Kerrek SB Jul 28 '11 at 12:13
  • Update: nowadays I would look for [`codecvt`](http://en.cppreference.com/w/cpp/locale/codecvt), which wraps `mbsrtowcs`/`wcsrtombs`. – Kerrek SB Jul 19 '13 at 12:23
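A minimal sketch of the `codecvt` route, assuming the `std::codecvt<wchar_t, char, std::mbstate_t>` facet of a locale you pick yourself (for example `std::locale("")` for the user's environment, or `std::locale::classic()`). The name `widen_with` is hypothetical, and the sketch converts complete input in a single call. Note that `std::wstring_convert`, the C++11 convenience wrapper around such facets, is deprecated since C++17.

```cpp
#include <cassert>    // assert
#include <cwchar>     // std::mbstate_t
#include <locale>     // std::locale, std::codecvt, std::use_facet
#include <stdexcept>  // std::runtime_error
#include <string>     // std::string, std::wstring
#include <vector>     // std::vector

// Hypothetical helper: converts narrow text to wide text using the
// codecvt<wchar_t, char, mbstate_t> facet of the given locale.
std::wstring widen_with(const std::string& s, const std::locale& loc)
{
    using Facet = std::codecvt<wchar_t, char, std::mbstate_t>;
    const Facet& facet = std::use_facet<Facet>(loc);

    if (s.empty())
        return std::wstring();

    // Every wide character consumes at least one input byte, so
    // s.size() wide characters is always enough room.
    std::vector<wchar_t> buf(s.size());
    std::mbstate_t state = std::mbstate_t();
    const char* from_next = nullptr;
    wchar_t* to_next = nullptr;

    std::codecvt_base::result r = facet.in(state,
        s.data(), s.data() + s.size(), from_next,
        buf.data(), buf.data() + buf.size(), to_next);

    // partial (e.g. a truncated multibyte sequence) and error both count as failure here.
    if (r != std::codecvt_base::ok)
        throw std::runtime_error("conversion failed for this locale");
    assert(from_next == s.data() + s.size());  // ok: all input was consumed
    return std::wstring(buf.data(), to_next);
}
```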
5

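I'm using the following ATL conversion macro in VC++, and it works like a charm for me:

CA2CT(charText)

(CA2CT, from <atlconv.h>, produces a const TCHAR string, which is wide only when UNICODE is defined; CA2W always produces a wide string.)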
9T9
5

This version, using the Windows API function MultiByteToWideChar(), handles the memory allocation for arbitrarily long input strings.

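#include <windows.h>

wchar_t* output = NULL; // caller releases it with delete[]
int lenA = lstrlenA(input);
int lenW = ::MultiByteToWideChar(CP_ACP, 0, input, lenA, NULL, 0);
if (lenW>0)
{
    // lenA excludes the terminator, so lenW does too: allocate one extra slot
    output = new wchar_t[lenW + 1];
    ::MultiByteToWideChar(CP_ACP, 0, input, lenA, output, lenW);
    output[lenW] = L'\0';
}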
David Heffernan
  • @Kerrek In the interests of brevity I omitted the code that calls `free` ;-) – David Heffernan Jul 28 '11 at 12:16
  • I'd rather you leave it as it is than call `free()`! This is definitely a case for the celebrated `delete[]` expression :-) – Kerrek SB Jul 28 '11 at 12:18
  • @kerrek Indeed! It's so hard to keep track of C and C++ going from question to question. – David Heffernan Jul 28 '11 at 12:30
  • It seems there is no need for the 'lstrlenA(input)' call. See [MSDN](https://msdn.microsoft.com/en-us/library/dd319072.aspx). _cbMultiByte: Size, in bytes, of the string indicated by the lpMultiByteStr parameter. Alternatively, this parameter can be set to -1 if the string is null-terminated._ Just use -1 instead of lenA. – Alan Kazbekov Aug 04 '16 at 15:14
  • @Alan it could be done that way but on the other hand this way means the length is calculated once rather than twice. Personal choice I guess. – David Heffernan Aug 04 '16 at 15:18
2

You may use CString, CStringA, and CStringW to do automatic conversions and to convert between these types. Further, you may also use CStrBuf, CStrBufA, and CStrBufW to get RAII-pattern modifiable strings.

Ajay
  • Note, however, that they are ATL/MFC specific. – JBES Dec 13 '17 at 18:16
  • @JBES, Yes they are. I answered this 6+ years ago, when ATL/MFC was widely used. Now, even the C++ language has library features for conversions. – Ajay Dec 14 '17 at 09:46