convert unicode to char

Question

How can I convert a Unicode string to a char* or char* const in Embarcadero C++?

From Unicode to what locale? Or if on windows which codepage? What exactly are you trying to do? — RedX, Jun 14 '12 at 20:17
http://www.cplusplus.com/reference/clibrary/cstdlib/wcstombs/ ? — Luchian Grigore, Jun 14 '12 at 20:18
Embarcadero seems [remarkably well documented](http://docwiki.embarcadero.com/CodeExamples/en/TEncoding_(C%2B%2B)) — jxh, Jun 14 '12 at 20:25
I'm sorry but you clearly didn't actually search the documentation, I found [__the exact page that addresses this issue__](http://docwiki.embarcadero.com/Libraries/XE2/en/System.UnicodeString.t_str) in under 60 seconds. As another stated, Embarcadero does an impeccable job of providing robust documentation with relevant examples. — arkon, Mar 26 '13 at 02:21

score 4 · Answer 1 · edited Oct 19 '12 at 05:59

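String text = "Hello world";
AnsiString ansi(text);       // named object, so the buffer outlives this statement
char *txt = ansi.c_str();    // valid only while ansi is alive and unmodified

The older text.t_str() has been replaced by AnsiString(text).c_str(). Do not keep the pointer returned by a temporary AnsiString: the temporary is destroyed at the end of the statement, which leaves the pointer dangling.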

edited Oct 19 '12 at 05:59

Nikhil

answered Oct 19 '12 at 05:54

Tuomas


This is the proper answer, straight from the Embarcadero documentation. – arkon Mar 26 '13 at 02:24
This will fail to handle any characters outside the locale being used, and since there is no locale on Windows that supports all characters (e.g., a UTF-8 locale would support all characters) `AnsiString` simply cannot provide lossless conversions for all inputs on Windows. – bames53 Jun 14 '13 at 20:36
Note that you can specify the codepage for an AnsiString: http://docwiki.embarcadero.com/RADStudio/Seattle/en/How_to_Handle_Delphi_AnsiString_Code_Page_Specification_in_C%2B%2B However, you are right that converting from Unicode to ANSI may (and often will) be lossy. – David Dec 10 '15 at 19:55

score 2 · Accepted Answer · answered Jun 14 '12 at 20:36

"Unicode string" really isn't specific enough to know what your source data is, but you probably mean 'UTF-16 string stored as wchar_t array' since that's what most people who don't know the correct terminology use.

"char*" also isn't enough to know what you want to target, although maybe "embarcadero" has some convention. I'll just assume you want UTF-8 data unless you mention otherwise.

Also, I'll limit my example to what works in VS2010:

// #include <codecvt>, <locale> and <string>

// your "Unicode" string
wchar_t const * utf16_string = L"Hello, World!";

std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>,wchar_t> convert;

std::string utf8_string = convert.to_bytes(utf16_string);

This assumes that wchar_t strings are UTF-16, as is the case on Windows, but otherwise is portable code.
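If the source data can be held as char16_t instead of wchar_t, the same facility works without relying on the width of wchar_t. The following is only a sketch: the helper name to_utf8 is made up for illustration, and note that <codecvt> and std::wstring_convert are deprecated since C++17, so newer compilers may warn.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper (not part of any library): converts UTF-16 code units
// to a UTF-8 byte string using the standard conversion facet.
std::string to_utf8(const std::u16string& utf16)
{
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> convert;
    // to_bytes throws std::range_error if the input is not valid UTF-16
    return convert.to_bytes(utf16);
}
```

The resulting std::string owns the bytes, so its c_str() can be passed wherever a char const* is expected for as long as the string is alive.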

bames53

Embarcadero does indeed have "some convention" as you put it. Quote from docs: *'Delphi also supports UnicodeString, but implements it as a primitive type rather than a class. By default, variables declared as type String are UnicodeString.'* Also important to note: *'Despite its name, UnicodeString can represent both Unicode strings and ANSI strings, ANSI strings being converted first.'* – arkon Mar 26 '13 at 02:28
'ANSI string' is still insufficient even if we ignore the fact that the American National Standards Institute has never defined such a thing and recognize that they are referring to what Microsoft refers to by that name; Microsoft defines many 'code pages' which can be used with so-called "ANSI" strings. – bames53 Mar 26 '13 at 06:46

score 1 · Answer 3 · answered Jun 14 '12 at 20:17

You can legally reinterpret any array as an array of char. So if your Unicode data comes in 4-byte code units like

char32_t data[100];

then you can access it as a char array:

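// #include <cstddef> and <cstdio>
char const * p = reinterpret_cast<char const*>(data);

for (std::size_t i = 0; i != sizeof data; ++i)
{
    // cast to unsigned char so bytes >= 0x80 are not sign-extended
    std::printf("Byte %03zu is 0x%02X.\n", i, static_cast<unsigned char>(p[i]));
}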

That way, you can examine the individual bytes of your Unicode data one by one.

(That has of course nothing to do with converting the encoding of your text. For that, use a library like iconv or ICU.)

Since the OP specifically asked to convert, this doesn't seem a very useful answer. — David, Dec 10 '15 at 19:54

AWalkmen · Answer 4 · 2015-11-02T21:05:55.443

0

If you work with Windows:

//#include <windows.h>, <iostream> and <string>
u16string utext = u"объява";
char text[0x100];
// flags are 0, and the output size is the buffer size in bytes (not -1)
WideCharToMultiByte(CP_UTF8,0,(const wchar_t*)(utext.c_str()),-1,text,sizeof text,NULL,NULL);
cout << text;

We can't use std::wstring_convert because it is not available in MinGW 4.9.2.
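Where neither std::wstring_convert nor the Windows API is an option, the UTF-16 to UTF-8 step can also be written by hand. This is a minimal sketch, not a library routine: the names append_utf8 and utf16_to_utf8 are invented for illustration, and replacing unpaired surrogates with U+FFFD is an assumption about the desired error handling.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: append one Unicode code point to `out` as UTF-8.
static void append_utf8(std::string& out, char32_t cp)
{
    if (cp < 0x80) {                       // 1 byte: plain ASCII
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: convert UTF-16 to UTF-8.
// Assumption: unpaired surrogates are replaced with U+FFFD.
std::string utf16_to_utf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {            // high surrogate
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                // combine the surrogate pair into one code point
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;                           // unpaired high surrogate
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;                               // stray low surrogate
        }
        append_utf8(out, cp);
    }
    return out;
}
```

For example, utf16_to_utf8(utext).c_str() would give a char const* to the UTF-8 bytes, valid until the end of the full expression unless the returned std::string is stored in a named variable first.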

edited Nov 02 '15 at 21:05

answered Nov 02 '15 at 20:54

AWalkmen

