
Some Chinese users of my software reported a strange C++ exception being thrown when my Windows C++ code tried to list all running processes:

在多字节的目标代码页中,没有此 Unicode 字符可以映射到的字符。

Translated to English, this roughly means:

There are no characters to which this Unicode character can be mapped in the multi-byte target code page.

The code that logs this message is:

try
{
    list_running_processes();
}
catch (const std::runtime_error &exception)
{
    LOG_S(ERROR) << exception.what();
    return EXIT_FAILURE;
}
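One way to narrow down where such a message originates is to log more than `what()`. The following is a minimal sketch, not part of the original program: `describe_exception` is a hypothetical helper, and whether the exception seen here actually is a `std::system_error` is an assumption. Since `std::system_error` derives from `std::runtime_error`, the handler above would catch it, and its numeric error code is not localized the way the message text is.

```cpp
#include <exception>
#include <stdexcept>
#include <string>
#include <system_error>

// Hypothetical helper: builds a log line from an exception.
// If the exception is a std::system_error, the error category and the
// numeric code are appended, since those do not depend on the UI language.
std::string describe_exception(const std::exception& exception)
{
    std::string description = exception.what();
    if (const auto* system_error = dynamic_cast<const std::system_error*>(&exception))
    {
        const std::error_code& code = system_error->code();
        description += " [category: ";
        description += code.category().name();
        description += ", code: " + std::to_string(code.value()) + "]";
    }
    return description;
}
```

In the handler, this could be used as `LOG_S(ERROR) << describe_exception(exception);`.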

The most likely culprit is this function:

std::vector<running_process_t> list_running_processes()
{
    std::vector<running_process_t> running_processes;

    const auto snapshot_handle = unique_handle(CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0));
    if (snapshot_handle.get() == INVALID_HANDLE_VALUE)
    {
        throw std::runtime_error("CreateToolhelp32Snapshot() failed");
    }
    
    PROCESSENTRY32 process_entry{};
    process_entry.dwSize = sizeof process_entry;

    if (Process32First(snapshot_handle.get(), &process_entry))
    {
        do
        {
            const auto process_id = process_entry.th32ProcessID;
            const auto executable_file_path = get_file_path(process_id);
            // *** HERE ***
            const auto process_name = wide_string_to_string(process_entry.szExeFile);
            running_processes.emplace_back(executable_file_path, process_name, process_id);
        } while (Process32Next(snapshot_handle.get(), &process_entry));
    }

    return running_processes;
}

Or alternatively:

std::string get_file_path(const DWORD process_id)
{
    std::string file_path;
    const auto snapshot_handle = unique_handle(CreateToolhelp32Snapshot(TH32CS_SNAPMODULE, process_id));
    MODULEENTRY32W module_entry32{};
    module_entry32.dwSize = sizeof(MODULEENTRY32W);
    if (Module32FirstW(snapshot_handle.get(), &module_entry32))
    {
        do
        {
            if (module_entry32.th32ProcessID == process_id) 
            {
                return wide_string_to_string(module_entry32.szExePath); // *** HERE ***
            }
        } while (Module32NextW(snapshot_handle.get(), &module_entry32));
    }

    return file_path;
}

This is the code that converts a std::wstring to a UTF-8 encoded std::string:

std::string wide_string_to_string(const std::wstring& wide_string)
{
    if (wide_string.empty())
    {
        return std::string();
    }

    const auto size_needed = WideCharToMultiByte(CP_UTF8, 0, &wide_string.at(0),
        static_cast<int>(wide_string.size()), nullptr, 0, nullptr, nullptr);
    std::string str_to(size_needed, 0);
    WideCharToMultiByte(CP_UTF8, 0, &wide_string.at(0), static_cast<int>(wide_string.size()), &str_to.at(0),
        size_needed, nullptr, nullptr);
    return str_to;
}
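For reference, with a flags value of 0 and `CP_UTF8`, `WideCharToMultiByte` is documented to substitute U+FFFD for invalid input rather than fail. Only with `WC_ERR_INVALID_CHARS` does it fail with `ERROR_NO_UNICODE_TRANSLATION`. The sketch below is a portable illustration of what such a strict conversion has to reject (unpaired surrogates). It is not the Windows API, and `utf16_to_utf8_strict` is a hypothetical name used only for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: strict UTF-16 to UTF-8 conversion.
// Throws std::runtime_error on an unpaired surrogate instead of
// silently substituting a replacement character.
std::string utf16_to_utf8_strict(const std::u16string& input)
{
    std::string output;
    for (std::size_t i = 0; i < input.size(); ++i)
    {
        std::uint32_t code_point = input[i];
        if (code_point >= 0xD800 && code_point <= 0xDBFF)
        {
            // A high surrogate must be followed by a low surrogate.
            if (i + 1 >= input.size() || input[i + 1] < 0xDC00 || input[i + 1] > 0xDFFF)
            {
                throw std::runtime_error("unpaired high surrogate");
            }
            code_point = 0x10000 + ((code_point - 0xD800) << 10) + (input[i + 1] - 0xDC00);
            ++i;
        }
        else if (code_point >= 0xDC00 && code_point <= 0xDFFF)
        {
            throw std::runtime_error("unpaired low surrogate");
        }

        if (code_point < 0x80)
        {
            output += static_cast<char>(code_point);
        }
        else if (code_point < 0x800)
        {
            output += static_cast<char>(0xC0 | (code_point >> 6));
            output += static_cast<char>(0x80 | (code_point & 0x3F));
        }
        else if (code_point < 0x10000)
        {
            output += static_cast<char>(0xE0 | (code_point >> 12));
            output += static_cast<char>(0x80 | ((code_point >> 6) & 0x3F));
            output += static_cast<char>(0x80 | (code_point & 0x3F));
        }
        else
        {
            output += static_cast<char>(0xF0 | (code_point >> 18));
            output += static_cast<char>(0x80 | ((code_point >> 12) & 0x3F));
            output += static_cast<char>(0x80 | ((code_point >> 6) & 0x3F));
            output += static_cast<char>(0x80 | (code_point & 0x3F));
        }
    }
    return output;
}
```

Well-formed Chinese text converts without error in such a routine, so valid file names alone would not be expected to trip a strict UTF-8 conversion.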

Is there any reason this can fail on Chinese-language file paths or on a Chinese-language Windows installation? The code works fine on regular Western Windows machines. Let me know if I'm missing any crucial information, since I currently cannot debug or test this myself without access to one of the affected machines.

BullyWiiPlaza
  • *"crash/error being thrown"* - which one is it? As for errors in the code, `wide_string_to_string(module_entry32.szExePath)` can fail if `szExePath` is a null pointer. – IInspectable Jan 03 '22 at 08:52
  • Suggestion: Inside and at the end of the `do` loop, add this statement on each iteration: `process_entry = {};` so that it gets reset on each subsequent iteration. – selbie Jan 03 '22 at 09:04
  • Can you show the code that is actually printing the garbage chars? – selbie Jan 03 '22 at 09:08
  • @selbie: I added the code to the question – BullyWiiPlaza Jan 03 '22 at 10:33
  • @IInspectable `szExePath` (and `szExeFile`) cannot be a null pointer. It is an embedded array (`WCHAR szExePath[MAX_PATH];`) – RbMm Jan 03 '22 at 11:11
  • Note that, since you're using size() and not -1 in WideCharToMultiByte, "the resulting character string is not null-terminated, and the returned length does not include this character" https://learn.microsoft.com/en-us/windows/win32/api/stringapiset/nf-stringapiset-widechartomultibyte#parameters so, depending on the rest of your code, it could cause issues (but still I don't see why it could be different on Chinese Windows...) – Simon Mourier Jan 03 '22 at 13:06
  • @SimonMourier This shouldn't be a problem: `std::string str_to(size_needed, 0);` actually allocates a buffer `size_needed+1` bytes large, adding a terminating NUL. – Igor Tandetnik Jan 03 '22 at 14:23
  • @IgorTandetnik - are you sure it's guaranteed? https://stackoverflow.com/questions/11752705/does-stdstring-have-a-null-terminator – Simon Mourier Jan 03 '22 at 14:51
  • The only C++ exception the code shown can throw is a [`bad_alloc`](https://en.cppreference.com/w/cpp/memory/new/bad_alloc). Since this is not a `runtime_error` your code wouldn't observe it. The code that triggers the described error state is not the code we see. – IInspectable Jan 03 '22 at 14:52
  • @sim Yes, that's [guaranteed behavior](https://stackoverflow.com/a/6077274/1889329). Starting with C++11 (I believe) `c_str()` and `&s[0]` are required to return the same pointer. It is legal to write `size()` characters to that pointer, plus it is legal to write to `&s[s.size()]` as long as you write a NUL character. – IInspectable Jan 03 '22 at 15:01
  • You are converting to UTF-8. I don't know what LOG_S(ERROR) is. Does it support UTF-8? – Raymond Chen Jan 05 '22 at 05:24
  • @RaymondChen: `LOG_S(ERROR)` is from loguru: https://github.com/emilk/loguru – BullyWiiPlaza Jan 06 '22 at 15:26
  • So does it support UTF-8? Are you viewing the results as UTF-8? – Raymond Chen Jan 06 '22 at 15:28
  • @RaymondChen: Yes, `LOG_S(ERROR) << "漢字";` prints out `ERR| 漢字` correctly to the console. – BullyWiiPlaza Jan 07 '22 at 15:30
  • Great, you found an encoding that works! So what encoding is it? Whatever it is, you should use that encoding when you call WideCharToMultiByte. – Raymond Chen Jan 07 '22 at 15:35

1 Answer

I managed to test on a Chinese machine. It turns out that converting a file path from a wide string to a regular string with wide_string_to_string() produces a bad file path if the path contains non-ASCII characters, e.g. Chinese ones.

I fixed this bug by replacing the calls to wide_string_to_string() with std::filesystem::path(wide_string_file_path).string(), since the std::filesystem API handles the conversion correctly for file paths, unlike wide_string_to_string().
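To make the difference between the two conversions concrete, here is a minimal sketch with two hypothetical helpers, `path_to_native_narrow` and `path_to_utf8`, neither of which is from my code base. As far as I understand, `path::string()` yields the native narrow encoding (on Windows the active code page, which is an assumption about the implementation), while `path::u8string()` yields UTF-8 on every platform.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper: the path in the native narrow encoding.
// This is the form to hand to narrow-character file APIs.
std::string path_to_native_narrow(const std::filesystem::path& path)
{
    return path.string();
}

// Hypothetical helper: the path as UTF-8 bytes, e.g. for UTF-8 log output.
std::string path_to_utf8(const std::filesystem::path& path)
{
    // u8string() returns std::string in C++17 and std::u8string in C++20;
    // copying through iterators compiles under both standards.
    const auto utf8 = path.u8string();
    return std::string(utf8.begin(), utf8.end());
}
```

Which helper is appropriate depends on the consumer: narrow-character file APIs expect the native encoding, whereas a UTF-8 aware logger expects the UTF-8 form.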

BullyWiiPlaza
  • Makes sense. `std::filesystem` will forward the narrow string to the OS, and Windows will then convert it back using its current code page - which cannot be UTF-8. – MSalters May 10 '22 at 16:25