
I can't get the UTF-8 output of a process started with CreateProcess() into a wstring.

Currently I use the following function, which works but does not give me UTF-8 output:

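#include <windows.h>
#include <cstdlib>
#include <string>

#define BUFSIZE 4096   // assumed value; the original snippet does not define it

// Pipe ends: the parent reads from *_Rd, the child writes to *_Wr.
HANDLE g_hChildStd_OUT_Rd = NULL;
HANDLE g_hChildStd_OUT_Wr = NULL;
HANDLE g_hChildStd_ERR_Rd = NULL;
HANDLE g_hChildStd_ERR_Wr = NULL;

std::string run(char *command){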
    SECURITY_ATTRIBUTES sa;
    sa.nLength = sizeof(SECURITY_ATTRIBUTES);
    sa.bInheritHandle = TRUE;
    sa.lpSecurityDescriptor = NULL;
    if ( ! CreatePipe(&g_hChildStd_ERR_Rd, &g_hChildStd_ERR_Wr, &sa, 0) ) {
        exit(1);
    }
    if ( ! SetHandleInformation(g_hChildStd_ERR_Rd, HANDLE_FLAG_INHERIT, 0) ){
        exit(1);
    }
    if ( ! CreatePipe(&g_hChildStd_OUT_Rd, &g_hChildStd_OUT_Wr, &sa, 0) ) {
        exit(1);
    }
    if ( ! SetHandleInformation(g_hChildStd_OUT_Rd, HANDLE_FLAG_INHERIT, 0) ){
        exit(1);
    }
    char *szCmdline=command;
    PROCESS_INFORMATION piProcInfo;
    STARTUPINFOA siStartInfo;   // ANSI variant, because the command line is char*
    BOOL bSuccess = FALSE;
    ZeroMemory( &piProcInfo, sizeof(PROCESS_INFORMATION) );
    ZeroMemory( &siStartInfo, sizeof(STARTUPINFOA) );
    siStartInfo.cb = sizeof(STARTUPINFOA);
    siStartInfo.hStdError = g_hChildStd_ERR_Wr;
    siStartInfo.hStdOutput = g_hChildStd_OUT_Wr;
    siStartInfo.dwFlags |= STARTF_USESTDHANDLES;
    bSuccess = CreateProcessA(NULL,
        szCmdline,     // command line
        NULL,          // process security attributes
        NULL,          // primary thread security attributes
        TRUE,          // handles are inherited
        CREATE_NO_WINDOW,             // creation flags
        NULL,          // use parent's environment
        NULL,          // use parent's current directory
        &siStartInfo,  // STARTUPINFO pointer
        &piProcInfo);  // receives PROCESS_INFORMATION
    // Close the parent's copies of the write ends so that ReadFile
    // reports end-of-stream once the child exits.
    CloseHandle(g_hChildStd_ERR_Wr);
    CloseHandle(g_hChildStd_OUT_Wr);
    if ( ! bSuccess ) {
        exit(1);
    }
    // The process and thread handles are not used afterwards.
    CloseHandle(piProcInfo.hProcess);
    CloseHandle(piProcInfo.hThread);
    DWORD dwRead;
    CHAR chBuf[BUFSIZE];
    BOOL bSuccess2 = FALSE;
    std::string out = "", err = "";
    // Pipes carry raw bytes; collect them undecoded.
    for (;;) {
        bSuccess2=ReadFile( g_hChildStd_OUT_Rd, chBuf, BUFSIZE, &dwRead, NULL);
        if( ! bSuccess2 || dwRead == 0 ) break;
        out.append(chBuf, dwRead);
    }
    // Note: draining stdout completely before stderr can block
    // if the child fills the stderr pipe first.
    dwRead = 0;
    for (;;) {
        bSuccess2=ReadFile( g_hChildStd_ERR_Rd, chBuf, BUFSIZE, &dwRead, NULL);
        if( ! bSuccess2 || dwRead == 0 ) break;
        err.append(chBuf, dwRead);
    }
    CloseHandle(g_hChildStd_OUT_Rd);
    CloseHandle(g_hChildStd_ERR_Rd);

    return out;
}

I tried several things but did not succeed in making it work.

Any help is appreciated!

Martin Prikryl
MrWhite
  • Why do you expect the child process to be outputting UTF8 ? fyi std::wstring on Windows is usually used for UTF16. – Richard Critten Aug 17 '16 at 22:44
  • There are some characters like č,ć,ž that are printed when the command is executed using CreateProcess() so that's why I need it with wstring. – MrWhite Aug 17 '16 at 22:50
  • They are most likely MBCS on a code page you would need to determine. – Richard Critten Aug 17 '16 at 23:09
  • Pipes deal in raw bytes, not characters. What do the raw bytes actually look like in the output you are having trouble with? If you post the bytes here, and the string output you are expecting, someone can likely help identify the encoding being used. – Remy Lebeau Aug 18 '16 at 06:06
  • I got output like that: http://prntscr.com/c7982a , but it should be like that: http://prntscr.com/c7989y – MrWhite Aug 18 '16 at 11:20

1 Answer

The output of a command is a byte stream, so you read it as a byte stream. It is up to the two programs to agree on the encoding to use.

For example:

  • If you execute a .NET (C#/VB.NET) console application, the application can set the Console.OutputEncoding property to choose the encoding that the Console.Write[Line] methods will use.

    Console.OutputEncoding = System.Text.Encoding.UTF8;
    
  • Similarly, a PowerShell script can set [Console]::OutputEncoding to choose the encoding that the Write-Output and Write-Host cmdlets will use.

    [Console]::OutputEncoding = [Text.Encoding]::UTF8
    
  • cmd.exe or a batch file can use the chcp command.

    chcp 65001
    
  • A Win32 application can use the SetConsoleOutputCP function to set its output encoding, if it writes with WriteConsole. If the application writes with WriteFile, it just needs to write bytes that are already encoded as desired (e.g. using WideCharToMultiByte).

    SetConsoleOutputCP(CP_UTF8);
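
For the cmd.exe case in the list above, the parent can apply chcp itself by wrapping the command line before handing it to CreateProcess. This is a minimal sketch, not a definitive implementation: the helper name with_utf8_codepage is hypothetical, and whether the child actually emits UTF-8 afterwards depends on the program being run (cmd.exe built-ins such as dir generally follow the console code page, other programs may not).

```cpp
#include <string>

// Hypothetical helper (not part of the original code): builds a command line
// that makes cmd.exe switch the console code page to UTF-8 (65001) and then
// run `command`.
std::string with_utf8_codepage(const std::string& command) {
    // /s  : cmd.exe strips only the outermost pair of quotes, so quotes
    //       inside `command` are passed through unchanged.
    // >nul: hides the "Active code page: 65001" message printed by chcp.
    return "cmd.exe /s /c \"chcp 65001 >nul & " + command + "\"";
}
```

Because CreateProcess needs a writable buffer, the result would be kept in a std::string and passed as a pointer to its first character, e.g. std::string cmdline = with_utf8_codepage("dir"); followed by run(&cmdline[0]);.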
    

When you then read the application output, you decode the byte stream using the agreed encoding, e.g. with the MultiByteToWideChar function.
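
The decoding step can be sketched as follows, assuming the child really writes UTF-8 and that all bytes have been collected first (a multi-byte character may be split across two ReadFile calls). The helper name utf8_to_wide is hypothetical. On Windows it uses the usual two-call pattern of MultiByteToWideChar; the #else branch is only a simplified stand-in so that the sketch also compiles on other platforms, and it assumes a 32-bit wchar_t there.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: decodes UTF-8 bytes (for example the `out` string
// collected from the pipe) into a wide string.
std::wstring utf8_to_wide(const std::string& bytes) {
    if (bytes.empty()) return std::wstring();
#ifdef _WIN32
    const int size = static_cast<int>(bytes.size());
    // First call: ask how many UTF-16 code units the result needs.
    const int len = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                        bytes.data(), size, NULL, 0);
    if (len <= 0) throw std::runtime_error("input is not valid UTF-8");
    std::wstring wide(static_cast<std::size_t>(len), L'\0');
    // Second call: convert into the correctly sized buffer.
    MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                        bytes.data(), size, &wide[0], len);
    return wide;
#else
    // Simplified stand-in for non-Windows builds: one wchar_t per code point.
    // It checks sequence structure but does not reject overlong encodings.
    std::wstring wide;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;   // number of continuation bytes expected
        unsigned long cp = 0;    // code point being assembled
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("input is not valid UTF-8");
        if (bytes.size() - i <= extra)
            throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("input is not valid UTF-8");
            cp = (cp << 6) | (c & 0x3F);
        }
        wide.push_back(static_cast<wchar_t>(cp));
        i += extra + 1;
    }
    return wide;
#endif
}
```

With the function from the question, this would be called as std::wstring text = utf8_to_wide(run(command));. If the child turns out to write in the OEM code page rather than UTF-8, CP_OEMCP (or the specific code page number) would take the place of CP_UTF8 in the Windows branch.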

Martin Prikryl
  • I think the last bullet should refer to `WriteConsoleA`. There seems little point in calling `SetConsoleCP` when you're calling `WriteConsoleW`. The `WriteFile` part is of course correct; there's no such thing as `WriteFileA/W` because it writes binary data instead of text. – MSalters Oct 10 '16 at 15:35
  • @MSalters I do not think you are right. When using `WriteConsoleA`, you are using the OEM encoding to specify the output; and when using `WriteConsoleW`, you are using the UTF-16 LE encoding. The system converts either to the default encoding. But using `SetConsoleCP` you can override either, to use e.g. UTF-8. How would you make your application output UTF-8 with `WriteConsoleW` alone? – Martin Prikryl Oct 10 '16 at 16:21