
I need to convert an LPCWSTR to a wchar_t*. I tried several approaches, and some of them work, but when I print the character values to get their code page value, they show different results.

Code overview:

std::string ChineseCharacter(LPCWSTR Data) //Data value: "丂\n"
{
    CString sHexValue = "";
    std::wstring sData(Data);

    wchar_t* str1 = (wchar_t*)Data;
    //wchar_t* str2 = (wchar_t*)_wcsdup(sData.c_str());

    wchar_t* str3 = (wchar_t*)(L"丂\n"); //u4E02 -- CP 8140 ** CP is needed

    for (int i = 0; i < 4; i++)
    {
        sHexValue.Format("%02x", str1[i]);//-- 4E02 -- FAIL
        //sHexValue.Format("%02x", str2[i]);//-- 4E02 -- FAIL
        sHexValue.Format("%02x", str3[i]);//-- First loop: 81, second one: 40 -- OK
    }
}

According to the debugger's watch window, the values are:

str1= L"丂\n"
str3= L"@\n"

My question is: how can I get the value of Data into a wchar_t* so that it matches the result I get when I hard-code the value?

Reference:
https://uic.io/en/charset/show/gb18030/

Remy Lebeau
Ferrus
  • Probable dupe: [**What does LPCWSTR stand for and how should it be handled?**](https://stackoverflow.com/questions/2230758/what-does-lpcwstr-stand-for-and-how-should-it-be-handled), although none of the answers there provides links to supporting documentation. – Andrew Henle Jan 25 '23 at 00:08
  • i thought they are the same except LPCWSTR is a const and wchar_t is not – pm100 Jan 25 '23 at 00:26
  • LPCWSTR is const wchar_t *. You do not have any issue with converting between them (except for dealing with const correctness). Your issue is with encoding. "sHexValue2.Format("%02x", str1[i]);//-- 4E02 -- FAIL" is _correct for utf-16 encoding_. However, you want to use a _GB 18030 UTF_ encoding, where it is not correct. This is being missed by readers of your post - especially with the misleading title that gives no indication of the actual issue. You should refocus your question, provide information on the platform you are using, and write a more appropriate title. – Avi Berger Jan 25 '23 at 01:16

1 Answer


LPCWSTR is just an alias for const wchar_t*. To convert that to wchar_t*, you can use const_cast, e.g.:

wchar_t* str = const_cast<wchar_t*>(Data);

(just make sure you don't write anything to the memory that is pointed at).
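If a writable buffer really is needed, copying the text sidesteps the const_cast entirely. A minimal sketch, assuming the input is null-terminated; ReplaceNewlines is a made-up name used only for illustration:

```cpp
#include <string>

// Hypothetical helper: returns a modified copy and leaves the caller's text untouched.
std::wstring ReplaceNewlines(const wchar_t* data)
{
    std::wstring copy(data);   // the copy owns its own writable buffer
    wchar_t* p = &copy[0];     // non-const pointer into the copy, safe to write through

    for (; *p != L'\0'; ++p)
    {
        if (*p == L'\n')
            *p = L' ';         // modifies the copy only
    }

    return copy;
}
```

The pointer p is only valid while copy is alive and not resized, which is why the function returns the std::wstring rather than the raw pointer.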

However, nothing in the code you have shown requires a non-const wchar_t* (or a std::wstring, for that matter). You can simply loop through Data directly, without converting LPCWSTR to wchar_t* at all, e.g.:

std::string ChineseCharacter(LPCWSTR Data)
{
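    CStringA sHexValue; // narrow string, so the char-based format below compiles in Unicode builds too

    for (int i = 0; (i < 4) && (Data[i] != L'\0'); ++i)
    {
        // AppendFormat accumulates the output; Format would overwrite it on every pass
        sHexValue.AppendFormat("%02hx", static_cast<unsigned short>(Data[i]));
    }

    return std::string(sHexValue.GetString());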
}

Alternatively, using just standard C++:

#include <iomanip>
#include <sstream>
#include <string>

std::string ChineseCharacter(const wchar_t *Data)
{
    std::ostringstream sHexValue;

    for (int i = 0; (i < 4) && (Data[i] != L'\0'); ++i)
    {
        sHexValue << std::setw(2) << std::setfill('0') << std::hex << static_cast<unsigned short>(Data[i]);
    }

    return sHexValue.str();
}
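
Note that both versions print the UTF-16 code units as they are (4e02 for 丂); neither one changes the encoding. As the comment under the question points out, getting 81 40 requires converting the text to GB18030 first. A rough sketch of one way to do that, assuming Windows Vista or later and that code page 54936 (GB18030) is the one wanted; ToGb18030 and BytesToHex are names made up for illustration, not part of any library:

```cpp
#include <cstdio>
#include <string>

#ifdef _WIN32
#include <windows.h>

// Hypothetical helper: convert null-terminated UTF-16 text to GB18030 bytes.
// Assumes code page 54936 is available (Windows Vista and later).
std::string ToGb18030(const wchar_t* data)
{
    // First call: query the required size in bytes. Passing -1 as the length
    // makes the result include the terminating null.
    int size = WideCharToMultiByte(54936, 0, data, -1, nullptr, 0, nullptr, nullptr);
    if (size <= 0)
        return std::string();

    std::string bytes(static_cast<size_t>(size), '\0');

    // Second call: perform the conversion into the buffer.
    int written = WideCharToMultiByte(54936, 0, data, -1, &bytes[0], size, nullptr, nullptr);
    if (written <= 0)
        return std::string();

    bytes.resize(static_cast<size_t>(written) - 1); // drop the terminating null
    return bytes;
}
#endif

// Hypothetical helper: format each byte as two lowercase hex digits.
std::string BytesToHex(const std::string& bytes)
{
    std::string hex;
    char buf[3];

    for (unsigned char b : bytes)
    {
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(b));
        hex += buf;
    }

    return hex;
}
```

On Windows, BytesToHex(ToGb18030(Data)) would then be expected to start with 8140 for 丂, going by the value quoted in the question. The important difference from the loops above is that the hex dump walks bytes of the converted string rather than wchar_t code units.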
Remy Lebeau