I can't manage to write a correctly encoded UTF-8 string to a PowerShell sub-process. ASCII characters work, but non-ASCII characters encoded as UTF-8, e.g. 'ü', are interpreted incorrectly. The same problem occurs when reading from the same PowerShell sub-process.

In short: I want to drive PowerShell from my program using UTF-8 encoding.

Update: For a GUI application without a console, allocating a console with AllocConsole(); and then calling SetConsoleCP(CP_UTF8); and SetConsoleOutputCP(CP_UTF8);, as @mklement0 mentioned in his answer, worked for me. A console application doesn't have to allocate the console manually.

Update 2: If you have a GUI application and called AllocConsole(), you can call ShowWindow(GetConsoleWindow(), SW_HIDE); afterwards to hide the console, as mentioned here.
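
The two updates above can be combined into one helper. This is a minimal sketch, assuming a GUI process that calls it once before CreateProcess; the names prepare_hidden_utf8_console and kUtf8CodePage are hypothetical and not part of the original code. On non-Windows builds it compiles to a stub that returns false.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Numeric value of the UTF-8 code page (CP_UTF8 in the Windows headers).
constexpr unsigned int kUtf8CodePage = 65001;

// Hypothetical helper: makes sure the process has a console, hides its window,
// and switches both console code pages to UTF-8.
// Returns false if a step fails (and always on non-Windows builds).
bool prepare_hidden_utf8_console() {
#ifdef _WIN32
    // A GUI process has no console yet; a console application already has one
    // and skips the allocation.
    if (GetConsoleWindow() == nullptr && !AllocConsole()) {
        return false;
    }
    // Hide the console window after the fact.
    if (HWND console = GetConsoleWindow()) {
        ShowWindow(console, SW_HIDE);
    }
    // Both the input and the output code page have to be set.
    return SetConsoleCP(kUtf8CodePage) && SetConsoleOutputCP(kUtf8CodePage);
#else
    return false;
#endif
}
```

Because the window is hidden only after it has been created, it may flash briefly on screen.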

What I have tried so far:

  • Setting Input and Output encoding to utf-8 $OutputEncoding = [System.Console]::OutputEncoding = [System.Console]::InputEncoding = [System.Text.Encoding]::UTF8 within the process
  • Doing the same with UTF-16 in case there is a bug, e.g. ...ext.Encoding]::Unicode
  • Doing the same with ISO-Latin 1 (cp1252)
  • Using wchar_t as buffer and input for all tested encodings
  • Testing byte order for the given string
  • Testing Unicode (4 bytes per character instead of 2)
  • Building the string bit by bit by myself
  • Setting compiler flag to /D UNICODE

Code Example for writing:

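std::string test("ls ä\n");
DWORD number_of_bytes_written = 0;
// Writes the raw bytes of the string; no encoding conversion happens here.
DWORD ret = WriteFile(std_in_write, test.c_str(), static_cast<DWORD>(test.size()), &number_of_bytes_written, nullptr);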
if (ret == 0) {
    throw PowershellHelper::Exception(PowershellHelper::Exception::Error::COULD_NOT_WRITE_TO_FILE, GetLastError());
}

Output: ls ä
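
To see why only the non-ASCII character is affected, it can help to look at the bytes that WriteFile actually sends. This is a minimal sketch, assuming the string is meant to be UTF-8; the escape sequences make that explicit, so the result does not depend on how the compiler reads the source file. hex_dump and utf8_command are hypothetical names, not part of the original code.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: renders each byte of s as two uppercase hex digits,
// separated by single spaces.
std::string hex_dump(const std::string& s) {
    std::string out;
    char tmp[4];
    for (unsigned char c : s) {
        std::snprintf(tmp, sizeof tmp, "%02X ", c);
        out += tmp;
    }
    if (!out.empty()) {
        out.pop_back();  // drop the trailing space
    }
    return out;
}

// "ls ä\n" with the ä written as its two UTF-8 bytes, 0xC3 0xA4.
const std::string utf8_command = "ls \xC3\xA4\n";

// hex_dump(utf8_command) yields "6C 73 20 C3 A4 0A": six bytes for five characters.
```

The ä takes two bytes. A receiver that decodes stdin with a single-byte OEM code page, for example 437, reads those two bytes as two separate characters (├ and ñ) instead of one ä, while the ASCII bytes come through unchanged.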

Example Code:

HANDLE std_in_read = nullptr;
HANDLE std_in_write = nullptr;
HANDLE std_out_read = nullptr;
HANDLE std_out_write = nullptr;
SECURITY_ATTRIBUTES security_attr;
STARTUPINFO startup_info;
PROCESS_INFORMATION process_information;
DWORD buffer_size = 1000000;

security_attr = {sizeof(SECURITY_ATTRIBUTES), nullptr, true};

if (!CreatePipe(&std_in_read, &std_in_write, &security_attr, buffer_size)) {
    throw PowershellHelper::Exception(PowershellHelper::Exception::Error::COULD_NOT_CREATE_IN_PIPE, GetLastError());
}

if (!CreatePipe(&std_out_read, &std_out_write, &security_attr, buffer_size)) {
    throw PowershellHelper::Exception(PowershellHelper::Exception::Error::COULD_NOT_CREATE_OUT_PIPE, GetLastError());
}

GetStartupInfo(&startup_info);
startup_info.dwFlags = STARTF_USESTDHANDLES | STARTF_USESHOWWINDOW;
startup_info.wShowWindow = SW_HIDE;
startup_info.hStdOutput = std_out_write;
startup_info.hStdError = std_out_write;
startup_info.hStdInput = std_in_read;

if (!CreateProcess(TEXT(default_powershell_path), nullptr, nullptr, nullptr, TRUE, 0, nullptr, TEXT(default_windows_path), &startup_info, &process_information)) {
    throw PowershellHelper::Exception(PowershellHelper::Exception::Error::COULD_NOT_CREATE_PROCESS, GetLastError());
}

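std::string test("ls ä\n");
DWORD number_of_bytes_written = 0;
// Writes the raw bytes of the string; no encoding conversion happens here.
DWORD ret = WriteFile(std_in_write, test.c_str(), static_cast<DWORD>(test.size()), &number_of_bytes_written, nullptr);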
if (ret == 0) {
    throw PowershellHelper::Exception(PowershellHelper::Exception::Error::COULD_NOT_WRITE_TO_FILE, GetLastError());
}

DWORD dword_read;
while (true) {
    DWORD total_bytes_available;
    if (PeekNamedPipe(std_out_read, nullptr, 0, nullptr, &total_bytes_available, nullptr) == 0) {
        throw PowershellHelper::Exception(PowershellHelper::Exception::Error::COULD_NOT_COPY_FROM_PIPE, GetLastError());
    }

    if (total_bytes_available != 0) {
        DWORD minimum = min(buffer_size, total_bytes_available);
        // Read into a std::string sized to the request; ReadFile does not null-terminate.
        std::string buf(minimum, '\0');
        if (ReadFile(std_out_read, &buf[0], minimum, &dword_read, nullptr) == 0) {
            throw PowershellHelper::Exception(PowershellHelper::Exception::Error::COULD_NOT_READ_FILE, GetLastError());
        }

        buf.resize(dword_read);  // keep only the bytes that were actually read
        std::cout << buf << std::endl;
    }

    if (total_bytes_available == 0) {
        break;
    }

    std::this_thread::sleep_for(std::chrono::milliseconds(1000));
}

Note: This is not a duplicate of redirect-input-and-output-of-powershell-exe-to-pipes-in-c, since the code there only works for ASCII characters and doesn't handle UTF-8 characters at all.

It is also not a duplicate of c-getting-utf-8-output-from-createprocess, because the solutions suggested there don't work, as mentioned above, and I want to write UTF-8 input as well as read UTF-8 output.

Simon Pio.

1 Answer

You need to set the console in- and output code pages to 65001 (UTF-8) before creating your PowerShell process, via the SetConsoleCP and SetConsoleOutputCP WinAPI functions, because the PowerShell CLI uses them to decode its stdin input and to encode its stdout output.

(By contrast, $OutputEncoding = [System.Console]::OutputEncoding = [System.Console]::InputEncoding = [System.Text.Encoding]::UTF8 only applies intra-PowerShell-session when making external-program calls from PowerShell.)

Note: If the calling process isn't itself a console application, you may have to allocate a console before calling SetConsoleCP and SetConsoleOutputCP, using the AllocConsole WinAPI function, but I'm frankly unclear on (a) whether that makes this console instantly visible (which may be undesired) and (b) whether the CreateProcess call then automatically uses this console.

If that doesn't work, you can call via cmd.exe and call chcp before calling powershell.exe, along the lines of cmd /c "chcp 65001 >NUL & powershell -c ..."; chcp 65001 sets the console code pages to 65001, i.e. UTF-8.

(This introduces extra overhead, but a cmd.exe process is relatively light-weight compared to a powershell.exe process, and so is chcp.com).
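
From C++, the cmd.exe detour amounts to building a different command line for CreateProcess. This is a minimal sketch of that string building only; wrap_with_chcp is a hypothetical helper, and it assumes the caller has already quoted or escaped ps_command as needed for cmd.exe and PowerShell.

```cpp
#include <string>

// Hypothetical helper: wraps a PowerShell command so that cmd.exe first runs
// chcp 65001 (switching the console code pages to UTF-8) and then starts
// powershell.exe. No quoting or escaping is applied to ps_command.
std::string wrap_with_chcp(const std::string& ps_command) {
    return "cmd /c \"chcp 65001 >NUL & powershell -c " + ps_command + "\"";
}
```

The result would be passed as the command-line argument of CreateProcess instead of starting powershell.exe directly. The sketch does not address how non-ASCII characters inside the command line itself are encoded, so such text is probably still better sent through the stdin pipe, as in the question.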

Here's a sample command you can run from PowerShell to demonstrate:

& {

  # Save the current code pages.
  $prevInCp, $prevOutCp = [Console]::InputEncoding, [Console]::OutputEncoding

  # Write the UTF-8 encoded form of string 'kö' to a temp. file.
  # Note: In PowerShell (Core) 7+, use -AsByteStream instead of -Encoding Byte
  Set-Content temp1.txt -Encoding Byte ([Text.UTF8Encoding]::new().GetBytes('kö'))

  # Switch to UTF-8, pipe the UTF-8 file's content to PowerShell's stdin,
  # verify that it was decoded correctly, and output it, again encoded as UTF-8.
  cmd /c 'chcp 65001 >NUL & type temp1.txt | powershell -nop -c "$stdinLine = @($input)[0]; $stdinLine -eq ''kö''; Write-Output $stdinLine" > temp2.txt'

  # Read the temporary file as UTF-8 and echo its content.
  Get-Content -Encoding Utf8 temp2.txt

  # Clean up.
  Remove-Item temp[12].txt
  # Restore the original code pages.
  [Console]::InputEncoding = $prevInCp; [Console]::OutputEncoding = $prevOutCp

}

This outputs the following, indicating that the powershell call both correctly read the UTF-8-encoded input and also output it as UTF-8:

True
kö

Note:

You can bypass character encoding problems by using the in-process PowerShell SDK as an alternative to creating a powershell.exe child process, though I don't know how painful that is from C++. For a C# example, see this answer.

mklement0
  • I tested it (putting the `SetConsoleOutputCP(CP_UTF8)` in the first line of the main) but still, the wrong characters are shown. Is there a specific place I have to put this line? – Simon Pio. Aug 30 '21 at 19:27
  • I tried the method with `AllocConsole()` but this doesn't change anything. Showing a console in the background is not what I want but better than nothing. I am not sure what you meant by calling via cmd? My GUI (made with GTK+ [gtkmm]) may call multiple powershell sub-processes. So, I can't just call one powershell session. Is there no known method to send utf-8 via pipe to powershell? – Simon Pio. Aug 30 '21 at 20:57
  • @SimonPio. You can bypass character encoding problems by using the in-process PowerShell SDK, though I don't know how painful that is from C++ - for a C# example, see [this answer](https://stackoverflow.com/a/68942672/45375). As for `AllocConsole()` - you may have to tweak the start-up info properties in the subsequent `CreateProcess` call, but I'm only guessing. As for `cmd.exe`: I meant that every time you need to call `powershell.exe`, don't call it directly, call it via `cmd.exe`, which allows you to execute `chcp 65001` first, which sets the console code page(s) to UTF-8. – mklement0 Aug 30 '21 at 21:23
  • I tried it with a console-only application and `SetConsoleOutputCP(CP_UTF8)`. This did not work either. I also tried to create a cmd process with `CreateProcess(...)` in this console application and passed a utf-8 string `std::string test("chcp 65001 & powershell -c mkdir C:\..\ägiüdjöfj\n")` with `WriteFile(...)`. Output was wrong again. Maybe I also have to fix the issue for passing utf-8 to cmd? Gonna be an encoding-ception... – Simon Pio. Aug 31 '21 at 16:12
  • @SimonPio., the output encoding may also be off if your PowerShell CLI call calls an external program that doesn't play by the rules. I can't tell from your code what PowerShell command you're trying to execute. – mklement0 Aug 31 '21 at 16:20
  • @SimonPio, additionally, if your PowerShell command relays stdin input via `$input` to a well-behaved external program, you should set `$OutputEncoding = [Text.UTF8Encoding]::new()` in your PowerShell command before calling the external program. – mklement0 Aug 31 '21 at 16:21
  • I will update my question later to be clear that even simple commands with utf8 characters won't work, not calling another program via powershell. E.g. I can open powershell as usual and the command `mkdir C:\....\köaj` works without problem. When I `CreateProcess()` in c++ to open a powershell sub-process and then pass the same command via `WriteFile()`, the created folder will have different characters instead of `ö` – Simon Pio. Aug 31 '21 at 16:30
  • My apologies, @SimonPio.: I had wrongly convinced myself that calling `SetConsoleOutputCP` would be sufficient, but it turns out `SetConsoleCP` must also be called (`chcp` calls both). Not sure if that solves your problem, but it's definitely a piece of the puzzle - please see my update, which includes a snippet of PowerShell code you can experiment with interactively. – mklement0 Sep 01 '21 at 18:26
  • This now works. I just have to find out how to hide the console. Thank you for your effort! – Simon Pio. Sep 02 '21 at 09:10
  • Glad to hear it, @SimonPio.; my pleasure. https://stackoverflow.com/questions/30965808/running-c-program-invisible is asking that question (having come up with an unsatisfactory solution of hiding the window once created, which results in flashing), but there's no answer as of this writing. https://github.com/rprichard/win32-console-docs provides background information that may provide pointers. – mklement0 Sep 02 '21 at 13:25