Polish characters translation errors C/C++

Question

I'm working on a non-Unicode international application (C/C++ on Windows) that can import files whose names contain Polish characters. The application runs with a Polish locale:

Locale=pl-PL  (set in app via "setlocale(LC_ALL, localeSetting)")
LocaleID=1045
CharSet=238

This feature worked correctly in previous releases, but within the last year we updated our file open/file save dialogs from the old Vista style to the more current Common Item Dialog. The error occurs directly after capturing the selected path from the file open dialog, so the problem is not elsewhere in our application but in the Common Item Dialog code itself. The dialog returns the file path as a PWSTR (wide string), and we convert it to a multibyte string to pass it back to the rest of the application.

Code within the Common Item Dialog handling where the error occurs:

PWSTR FilePath;
hr = pItem->GetDisplayName(SIGDN_FILESYSPATH, &FilePath);

if (SUCCEEDED(hr))
{
    char temp_mbarray[MAX_SIZE_PATH];
    if (wcstombs(temp_mbarray, FilePath, MAX_SIZE_PATH) > 0)
    {
        std::string pass_back_string(temp_mbarray);
        *results = pass_back_string;
    }
}

The correct Polish file name: "kołozębate"

The result of the original C-style conversion (using wcstombs()) is "koÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌ..."

It seemingly gets the first two characters, k and o, right since they aren't specifically Polish, but as soon as it encounters a Polish character it just fills the rest of the array with the Ì character. The result ends up much longer than the original file name.
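
One plausible reading of this output: wcstombs() returns (size_t)-1 when it meets a character it cannot represent in the current C locale, and because size_t is unsigned, that value still passes the `> 0` test. The buffer is then left without a terminator, and its uninitialized bytes (0xCC in MSVC debug builds, which Windows-1252 shows as Ì) end up in the string. Below is a minimal sketch of an explicit failure check. The helper name NarrowWithCurrentLocale is hypothetical, and passing a null destination to query the required size is an extension documented by MSVC and POSIX rather than something the C standard guarantees.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper (not from the original code): converts a wide string
// with wcstombs() and reports failure instead of handing back a buffer that
// was only partly written.
bool NarrowWithCurrentLocale(const wchar_t* wide, std::string& out)
{
    // With a null destination, wcstombs() only reports how many bytes are
    // needed (terminator excluded), or (size_t)-1 if some character has no
    // representation in the current LC_CTYPE locale.
    const std::size_t needed = std::wcstombs(nullptr, wide, 0);
    if (needed == static_cast<std::size_t>(-1))
        return false;

    std::string narrow(needed, '\0');
    if (needed > 0)
        std::wcstombs(&narrow[0], wide, needed);  // fills exactly `needed` bytes

    out = narrow;
    return true;
}
```

If this returns false for "kołozębate", the active LC_CTYPE locale cannot represent ł or ę, which would point at the setlocale() call rather than at the dialog.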

I then switched to the Windows conversion from wide char to multibyte string (WideCharToMultiByte()):

PWSTR FilePath;
hr = pItem->GetDisplayName(SIGDN_FILESYSPATH, &FilePath);
if (SUCCEEDED(hr))
{
    char temp_mbarray[MAX_SIZE_PATH];
    int wcs_length = int(wcslen(FilePath)) + 1;

    if(WideCharToMultiByte(CP_UTF8, 0, FilePath, wcs_length, temp_mbarray, MAX_PATH, nullptr, nullptr) > 0)
    {
        std::string pass_back_string(temp_mbarray);
        *results = pass_back_string;
    }
}

"koÅ‚ozÄ™bate" is the result after switching to WideCharToMultiByte(). That looks like a vast improvement, but every character that is uniquely Polish still comes out wrong, as if it were unknown or converted incorrectly. I'm not sure which.

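The garbled pairs are consistent with correct UTF-8 bytes being displayed as Windows-1252: ł (U+0142) encodes to C5 82 and ę (U+0119) to C4 99, and in Windows-1252 those bytes render as Å‚ and Ä™. The sketch below reproduces that byte arithmetic. Utf8Encode is an illustrative helper, not part of the original code, and it only handles code points below U+0800.

```cpp
#include <string>

// Illustrative helper (an assumption for this example): encodes a single
// code point below U+0800 as UTF-8. That range covers the Polish letters.
std::string Utf8Encode(char32_t cp)
{
    std::string out;
    if (cp < 0x80) {
        // ASCII is stored unchanged as one byte.
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // Two-byte form: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;  // empty for code points this sketch does not handle
}
```

So a two-character garble per Polish letter suggests the conversion worked and the bytes are simply being interpreted with the wrong code page afterwards.
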
In your example with WideCharToMultiByte you convert the wide string in FilePath to a UTF-8 string, and the result looks like correct UTF-8: "koÅ‚ozÄ™bate". If you want a multibyte string encoded in code page 1250, then use `WideCharToMultiByte (1250,....` – marcinj, Jan 18 '21 at 20:03
Why are you using CP_UTF8? You are (should be) on CP_1250. Being non-Unicode is unsustainable in the long run. Just wait for your customers complaining that they cannot open files with emojis in their names. – n. m. could be an AI, Jan 18 '21 at 20:04
n.'pronouns'm. Because this is a massive legacy suite of apps and we have nowhere near the resources to make it Unicode – WallofKron, Jan 18 '21 at 20:26
@marcinj I've changed both the WideCharToMultiByte call and setlocale to use the 1250 code page, yet I'm still getting "ko³ozêbate". It's even closer; it's just fudging up the 2 special characters when it should be kołozębate. Any suggestions? – WallofKron, Jan 18 '21 at 23:51
That is correct. But whatever you use to display the string (it is not obvious) is using 1252. https://en.wikipedia.org/wiki/Windows-1250. – Hans Passant, Jan 19 '21 at 00:32
@HansPassant Regardless of what we do with WideCharToMultiByte() and setlocale? What do you mean? – WallofKron, Jan 19 '21 at 00:57
@WallofKron If you changed the language of your Windows to Polish, you would get the correct characters in your output. I assume you display this text using some WinAPI function like MessageBoxA? If you draw it with TextOutA, make sure the selected font has LOGFONT::lfCharSet set to EASTEUROPE_CHARSET (238, the charset that corresponds to code page 1250); this should work even on non-Polish Windows. If you display it in WinAPI controls and use SetWindowText to update the text, then I suppose you are out of luck on non-Polish Windows. – marcinj, Jan 19 '21 at 09:20
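
The conversion suggested in the comments can be sketched as follows, assuming the rest of the application expects Windows-1250 bytes. WideToCodePage is a hypothetical helper name; it calls WideCharToMultiByte twice, first to size the buffer and then to fill it. The `#ifdef _WIN32` guard is only there so the snippet does not break a non-Windows build.

```cpp
#ifdef _WIN32
#include <windows.h>
#include <string>

// Hypothetical helper: converts a null-terminated wide string to the given
// code page (for example 1250). Returns an empty string on failure.
std::string WideToCodePage(const wchar_t* wide, UINT codePage)
{
    // Passing -1 as the length means "input is null-terminated"; the size
    // returned then includes the terminating null byte.
    const int size = WideCharToMultiByte(codePage, 0, wide, -1,
                                         nullptr, 0, nullptr, nullptr);
    if (size <= 0)
        return std::string();

    std::string narrow(static_cast<std::size_t>(size), '\0');
    const int written = WideCharToMultiByte(codePage, 0, wide, -1,
                                            &narrow[0], size, nullptr, nullptr);
    if (written <= 0)
        return std::string();

    narrow.resize(static_cast<std::size_t>(written) - 1);  // drop the terminator
    return narrow;
}
#endif
```

In the dialog code this would be used as `*results = WideToCodePage(FilePath, 1250);`, followed by `CoTaskMemFree(FilePath)`, since the string returned by GetDisplayName is owned by the caller. In Windows-1250 the letters ł and ę are the single bytes 0xB3 and 0xEA; the same bytes read as Windows-1252 are ³ and ê, which matches the "ko³ozêbate" output and suggests the remaining problem is on the display side.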
