How can I convert a Unicode string to a char* or char* const in Embarcadero C++?
From Unicode to what locale? Or, if on Windows, which code page? What exactly are you trying to do? – RedX Jun 14 '12 at 20:17
http://www.cplusplus.com/reference/clibrary/cstdlib/wcstombs/ ? – Luchian Grigore Jun 14 '12 at 20:18
Embarcadero seems [remarkably well documented](http://docwiki.embarcadero.com/CodeExamples/en/TEncoding_(C%2B%2B)) – jxh Jun 14 '12 at 20:25
I'm sorry but you clearly didn't actually search the documentation, I found [__the exact page that addresses this issue__](http://docwiki.embarcadero.com/Libraries/XE2/en/System.UnicodeString.t_str) in under 60 seconds. As another stated, Embarcadero does an impeccable job of providing robust documentation with relevant examples. – arkon Mar 26 '13 at 02:21
4 Answers
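String text = "Hello world";
AnsiString ansi(text);      // must stay alive as long as txt is used
char *txt = ansi.c_str();
Older code used text.t_str(); the replacement is AnsiString(text).c_str(). Keep the AnsiString in a named variable as above: a temporary AnsiString is destroyed at the end of the statement, which would leave txt pointing at freed memory.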
This will fail to handle any characters outside the locale being used, and since there is no locale on Windows that supports all characters (e.g., a UTF-8 locale would support all characters) `AnsiString` simply cannot provide lossless conversions for all inputs on Windows. – bames53 Jun 14 '13 at 20:36
Note that you can specify the codepage for an AnsiString: http://docwiki.embarcadero.com/RADStudio/Seattle/en/How_to_Handle_Delphi_AnsiString_Code_Page_Specification_in_C%2B%2B However, you are right that converting from Unicode to ANSI may (and often will) be lossy. – David Dec 10 '15 at 19:55
"Unicode string" really isn't specific enough to know what your source data is, but you probably mean 'UTF-16 string stored as wchar_t array' since that's what most people who don't know the correct terminology use.
"char*" also isn't enough to know what you want to target, although maybe "embarcadero" has some convention. I'll just assume you want UTF-8 data unless you mention otherwise.
Also, I'll limit my example to what works in VS2010:
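#include <codecvt>
#include <locale>
#include <string>

// your "Unicode" string
wchar_t const * utf16_string = L"Hello, World!";
// note: std::wstring_convert and std::codecvt_utf8_utf16 are deprecated since C++17
std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> convert;
std::string utf8_string = convert.to_bytes(utf16_string);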
This assumes that wchar_t strings are UTF-16, as is the case on Windows, but otherwise is portable code.
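The question asks for a char* (or char* const), while to_bytes returns a std::string. For read-only use, utf8_string.c_str() already gives a const char*. A minimal sketch of handing the buffer to code that insists on a non-const pointer; legacy_byte_count and call_legacy are hypothetical names used only for illustration:

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical legacy function that only accepts a non-const char*.
std::size_t legacy_byte_count(char* s)
{
    return std::strlen(s);
}

// utf8 stands in for the std::string returned by convert.to_bytes(...).
std::size_t call_legacy(std::string& utf8)
{
    // A constant pointer to writable chars ("char* const").
    // &utf8[0] works since C++11; C++17 also offers the non-const utf8.data().
    // The buffer is still owned by utf8: don't free it, and don't use the
    // pointer after utf8 is destroyed or resized.
    char* const p = &utf8[0];
    return legacy_byte_count(p);
}
```

Note that strlen counts bytes, not characters: a UTF-8 string such as "\xC3\xA9" (é) reports 2.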

Embarcadero does indeed have "some convention" as you put it. Quote from docs: *'Delphi also supports UnicodeString, but implements it as a primitive type rather than a class. By default, variables declared as type String are UnicodeString.'* Also important to note: *'Despite its name, UnicodeString can represent both Unicode strings and ANSI strings, ANSI strings being converted first.'* – arkon Mar 26 '13 at 02:28
'ANSI string' is still insufficient even if we ignore the fact that the American National Standards Institute has never defined such a thing and recognize that they are referring to what Microsoft refers to by that name; Microsoft defines many 'code pages' which can be used with so-called "ANSI" strings. – bames53 Mar 26 '13 at 06:46
You can legally reinterpret any array as an array of chars, i.e. access its bytes through a char pointer. So if your Unicode data comes in 4-byte code units like
char32_t data[100];
then you can access it as a char array:
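// #include <cstddef> and <cstdio>
// unsigned char avoids sign extension when printing bytes >= 0x80 with %02X
unsigned char const * p = reinterpret_cast<unsigned char const*>(data);
for (std::size_t i = 0; i != sizeof data; ++i)
{
    std::printf("Byte %03zu is 0x%02X.\n", i, static_cast<unsigned>(p[i]));
}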
That way, you can examine the individual bytes of your Unicode data one by one.
(That of course has nothing to do with converting the encoding of your text. For that, use a library like iconv or ICU.)
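A minimal sketch of that conversion step with iconv, assuming a POSIX system (glibc provides iconv in libc; other platforms may need GNU libiconv and -liconv). The function name utf32_to_utf8 is illustrative. It starts from the same raw bytes as the loop above, but tells iconv how to interpret them:

```cpp
#include <iconv.h>
#include <cstddef>
#include <stdexcept>
#include <string>

// Converts UTF-32 text (char32_t code units in host byte order) to UTF-8
// using POSIX iconv. Throws std::runtime_error if iconv cannot open the
// converter or rejects the input.
std::string utf32_to_utf8(const std::u32string& in)
{
    // Pick the UTF-32 variant matching this machine's byte order.
    const char32_t probe = 1;
    const bool little_endian =
        *reinterpret_cast<const unsigned char*>(&probe) == 1;

    iconv_t cd = iconv_open("UTF-8", little_endian ? "UTF-32LE" : "UTF-32BE");
    if (cd == (iconv_t)-1) // documented error value
        throw std::runtime_error("iconv_open failed");

    // iconv takes a non-const char** for its input, so copy the raw bytes.
    std::string src(reinterpret_cast<const char*>(in.data()),
                    in.size() * sizeof(char32_t));
    std::string dst(in.size() * 4, '\0'); // UTF-8 needs at most 4 bytes per code point

    char* inptr = &src[0];
    std::size_t inleft = src.size();
    char* outptr = &dst[0];
    std::size_t outleft = dst.size();

    const std::size_t rc = iconv(cd, &inptr, &inleft, &outptr, &outleft);
    iconv_close(cd);
    if (rc == static_cast<std::size_t>(-1))
        throw std::runtime_error("iconv conversion failed");

    dst.resize(dst.size() - outleft); // drop the unused tail of the buffer
    return dst;
}
```

The resulting std::string's c_str() then gives the char pointer the question asks for.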

Since the OP specifically asked to convert, this doesn't seem like a very useful answer. – David Dec 10 '15 at 19:54
If you work with Windows:
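// #include <windows.h>, <string> and <iostream>
std::u16string utext = u"объява";
char text[0x100];
// wchar_t is 16-bit UTF-16 on Windows, so the char16_t* -> wchar_t* cast works there in practice.
// dwFlags is a flags value (0 here), and cbMultiByte is the output buffer size in bytes.
WideCharToMultiByte(CP_UTF8, 0, reinterpret_cast<const wchar_t*>(utext.c_str()), -1, text, sizeof text, NULL, NULL);
std::cout << text;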
I couldn't use std::wstring_convert, because it is not available in MinGW 4.9.2.
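If neither <codecvt> nor the Win32 API is available, the UTF-16 to UTF-8 step can also be written by hand. A minimal portable sketch; the function name utf16_to_utf8 and the choice to replace unpaired surrogates with U+FFFD are assumptions for illustration, and a library such as ICU remains the more robust option:

```cpp
#include <cstddef>
#include <string>

// Converts UTF-16 to UTF-8 without <codecvt> or Windows APIs.
// Unpaired surrogates are replaced by U+FFFD (one possible policy).
std::string utf16_to_utf8(const std::u16string& in)
{
    std::string out;
    out.reserve(in.size() * 3);
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()
            && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF)
        {
            // High + low surrogate pair -> one supplementary code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        }
        else if (cp >= 0xD800 && cp <= 0xDFFF)
        {
            cp = 0xFFFD; // lone surrogate
        }

        // Encode the code point as 1 to 4 UTF-8 bytes.
        if (cp < 0x80)
        {
            out += static_cast<char>(cp);
        }
        else if (cp < 0x800)
        {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)
        {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else
        {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

With this, utf16_to_utf8(utext).c_str() yields the same UTF-8 bytes as the WideCharToMultiByte call, without a fixed-size buffer.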
