C++, as far as the standard goes, doesn't know about encodings. Java does. So, to interface the two, make Java emit some well-defined encoding, such as UTF-8:
byte[] utf8str = str.getBytes("UTF8");
In C++, use a library such as iconv() to transform the UTF-8 string either into another string of a well-defined encoding (e.g. std::u32string with UTF-32 if you have C++11, or std::basic_string<uint32_t> or std::vector<uint32_t> otherwise), or, alternatively, into the WCHAR_T encoding, to be stored in a std::wstring. If you wish to interface with your environment, you can then convert that wide string to a multibyte string via the standard function wcstombs().
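As a rough sketch of the first option, the UTF-8 bytes could be decoded into a std::u32string with iconv roughly as follows. The helper names utf8_to_utf32 and native_utf32_name are made up for illustration, and the code assumes a glibc-style iconv() that takes char** for the input buffer and accepts the encoding names "UTF-8", "UTF-32LE" and "UTF-32BE"; other iconv implementations may differ.

```cpp
#include <iconv.h>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: pick the UTF-32 variant that matches the host byte
// order, so the output carries no byte order mark.
static const char* native_utf32_name() {
    const std::uint16_t probe = 1;
    return *reinterpret_cast<const unsigned char*>(&probe) == 1 ? "UTF-32LE"
                                                                : "UTF-32BE";
}

// Hypothetical helper: decode UTF-8 bytes into UTF-32 code points.
std::u32string utf8_to_utf32(const std::string& utf8) {
    iconv_t cd = iconv_open(native_utf32_name(), "UTF-8");  // (to, from)
    if (cd == reinterpret_cast<iconv_t>(-1))
        throw std::runtime_error("iconv_open failed");

    // Every code point takes at least one UTF-8 byte, so size() code
    // points is always enough room for the output.
    std::u32string out(utf8.size(), U'\0');
    char* in_ptr = const_cast<char*>(utf8.data());  // iconv does not modify the input
    std::size_t in_left = utf8.size();
    char* out_ptr = reinterpret_cast<char*>(&out[0]);
    std::size_t out_left = out.size() * sizeof(char32_t);

    std::size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);
    if (rc == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid or truncated UTF-8 input");

    // Drop the unused tail of the output buffer.
    out.resize(out.size() - out_left / sizeof(char32_t));
    return out;
}
```

On the C++ side, the bytes received from Java would first be copied into a std::string and then passed to this function. Passing "WCHAR_T" as the target encoding and writing into a std::wstring would be the analogous route for the second option, where the implementation supports that name.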
The choice depends on what you need to do with the string. For serialization or text processing, go with the definite encoding (e.g. UTF-32). For writing to the standard output using the system's locale, use the multibyte conversion. (Here is a slightly longer discussion of encodings in C++.)
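For the multibyte route, a minimal sketch of the wcstombs() step might look like this. The function name wide_to_multibyte is invented for illustration, and the result depends entirely on the locale that is active when it is called.

```cpp
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert a wide string to the multibyte encoding of
// the currently active C locale.
std::string wide_to_multibyte(const std::wstring& ws) {
    // MB_CUR_MAX is the largest number of bytes one character can need in
    // the current locale, so this buffer covers the worst case.
    std::string out(ws.size() * MB_CUR_MAX, '\0');
    std::size_t n = std::wcstombs(&out[0], ws.c_str(), out.size());
    if (n == static_cast<std::size_t>(-1))
        throw std::runtime_error("character not representable in current locale");
    out.resize(n);  // n is the number of bytes written, excluding any terminator
    return out;
}
```

A program would typically call std::setlocale(LC_ALL, "") from <clocale> once at startup so that the conversion follows the user's environment; in the default "C" locale, characters outside the basic set are likely to be rejected.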