
I get a JSON string from the server. Its type is std::wstring and its content is:

[{
    "nodeRef": "workspace://SpacesStore/12f1623f-196a-4289-a9af-07f3d1ee7c4e",
    "name": "/Oracle® Fusion Developer's Guide for Oracle Application Development Framework b31974.txt",
    "type": "cm:content",
    "sys:node-dbid": "228,137",
    "cm:modified": "2013-09-23 13:51:33.682 +0800",
    "size": "260",
    "checksum": "4D59ABBC6A45BE32750CAF541EED29C4"
}]

I tried to convert it to std::string so that I can process it with rapidjson, but the conversion fails on "®": the "®" changes to "?". I tried the methods below, but none of them works:

//method 1
return (char *)(_bstr_t)wstr.c_str();

//method 2
const wchar_t* wp = wstr.c_str();
int len= WideCharToMultiByte(CP_ACP,0,wp,-1,NULL,0,NULL,NULL);  
char * m_char=new char[len];  
WideCharToMultiByte(CP_ACP,0,wp,-1,m_char,len,NULL,NULL);  
m_char[len-1]='\0'; 

std::string strTemp(m_char);
delete [] m_char;
return strTemp; 


//method 3
std::string curLocale = setlocale(LC_ALL, NULL);       
setlocale(LC_ALL, "en-us");
const wchar_t* _Source = wstr.c_str();
size_t _Dsize = 2 * wstr.size() + 1;
char *_Dest = new char[_Dsize];
memset(_Dest,0,_Dsize);
wcstombs(_Dest,_Source,_Dsize);
std::string result = _Dest;
delete []_Dest;
setlocale(LC_ALL, curLocale.c_str());
return result;

I don't know why. Can anybody help? Thanks!

My locale was originally "Chinese (Simplified)_People's Republic of China.936". When I change it to "en-us" and use method 2, the result is OK: the "®" remains the same. But most of the computers here use "chs" by default, so is there another solution that doesn't require changing the system language?

stefanie wang
  • The obvious thing missing here is what do you expect "®" to change to? Wide strings and narrow strings aren't things you can just convert without thinking about what you want to happen with unusual characters. – john Sep 23 '13 at 08:16
  • A reasonable assumption would be that you want to convert your wide string (which is UTF-16 presumably) to a UTF-8 string. If so then replace CP_ACP with CP_UTF8 in your second example. – john Sep 23 '13 at 08:26
  • And use a `std::vector` if you're going that route. – WhozCraig Sep 23 '13 at 08:29
  • Hi, john! I just expect "®" to remain the same after the JSON is converted from wstring to string – stefanie wang Sep 23 '13 at 08:42
  • I tried replacing CP_ACP with CP_UTF8, and the "®" changes to "庐", which is not what I expect – stefanie wang Sep 23 '13 at 08:45
  • @stefaniewang The point is what do you expect? It's impossible to help unless you say that. When you convert to std::string you **must** have some encoding in mind, UTF-8 or something else. Now I guess you don't know what encoding to use, so really that's the first question you should be asking yourself. Only once you have decided on the encoding can we then talk about how to do the conversion to that encoding. – john Sep 23 '13 at 11:29
  • @stefaniewang 'Stay the same' isn't specific enough, it's all just bytes, only when you know how those bytes are being interpreted (i.e. what the encoding is) can you say whether it has stayed the same or not. So again, some program or API is going to be reading your converted string, what encoding is that expecting? – john Sep 23 '13 at 11:34
  • @stefaniewang I guess you need to read up on character sets and character encoding. You can't do this unless you have at least a basic understanding of those. – john Sep 23 '13 at 11:37
  • @john Actually, I want to convert my wide string (which is presumably UTF-16) to a UTF-8 string. I replaced CP_ACP with CP_UTF8 in my second example, and the "®" in my wide string is converted to "?", which suggests this character can't be converted to UTF-8. What can I do to fix the problem? Many thanks for your patience – stefanie wang Sep 24 '13 at 02:51
  • It's not true that "®" cannot be converted to UTF-8. "®" is Unicode character 00AE, so it is representable in UTF-8. If I had to guess, I would say that whatever you are using to **view** the converted string doesn't understand UTF-8. In UTF-8, "®" is the two-byte sequence C2 AE; the C2 byte might be showing as "?". – john Sep 24 '13 at 08:37
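
To make the last comment concrete, here is a minimal, portable sketch of a wide-to-UTF-8 conversion. It is not code from the thread: `wide_to_utf8` is a hypothetical helper name, and the sketch assumes that `wchar_t` holds UTF-16 code units (as on Windows) or UTF-32 code points (as on most Unix-like systems). It is hand-rolled only so that it does not depend on the system locale or code page; on Windows, `WideCharToMultiByte` with `CP_UTF8` should produce the same bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not from the original post): encode a wide string as UTF-8.
// Assumption: wchar_t holds UTF-16 code units or UTF-32 code points.
std::string wide_to_utf8(const std::wstring& in)
{
    std::string out;
    out.reserve(in.size() * 3);
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(in[i]);

        // Combine a UTF-16 surrogate pair into a single code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            const std::uint32_t low = static_cast<std::uint32_t>(in[i + 1]);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;
            }
        }
        // Unpaired surrogates and out-of-range values become U+FFFD.
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
            cp = 0xFFFD;
        }

        if (cp < 0x80) {                // 1 byte: plain ASCII
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {        // 2 bytes: "®" (U+00AE) lands here
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {      // 3 bytes: most CJK characters
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                        // 4 bytes: supplementary planes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Calling `wide_to_utf8(wstr)` on the JSON above should yield the bytes C2 AE for "®". Those bytes only look wrong when something later interprets them as code page 936: read that way, C2 AE most likely displays as "庐", which matches what was reported in the comments. RapidJSON's default `Document` treats `char` input as UTF-8, as far as I know, so the result can probably be parsed directly; inspecting the raw bytes in a debugger rather than a console is a more reliable way to check the conversion.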
