
I have a GUI program that acts as a wrapper around cmd.exe. From this GUI I can send commands to cmd.exe and receive its output.

I am using pipe redirection, and I have already read various references:

Creating a Child Process with Redirected Input and Output

Redirect Input and Output of Powershell.exe to Pipes in C++

windows cmd pipe not unicode even with /U switch

First, I launch an instance of "cmd.exe /U" to tell cmd.exe to generate Unicode output; then I use ReadFile/WriteFile to read from and write to the pipes.

Everything works fine if I use ANSI, but I have two problems related to Unicode:

1) If I want to pass data to the pipe using WriteFile, I have to convert it from Unicode to ANSI first. Passing data in Unicode does not work: when I read the output after my WriteFile, cmd outputs a "More?" string. If I write the input in ANSI, it works fine and cmd correctly outputs the result of the command. cmd was launched with the /U switch.

2) The console output is correctly Unicode for internal commands such as cd and dir, but when I launch an external program from cmd, such as ping, netstat, or ipconfig, the output I receive is in ANSI, so I get corrupted data. I suppose the /U switch has no effect on external programs run through cmd? Is there a way to make everything that runs in cmd produce Unicode output?

Here is a sample of my code; I placed comments on the lines where the problems occur:

bool StartCmdPipe()
{
    static SECURITY_ATTRIBUTES SA;
    static STARTUPINFOW SI;
    static PROCESS_INFORMATION PI;
    static HANDLE StdInPipeRead;
    static HANDLE StdInPipeWrite;
    static HANDLE StdOutPipeRead;
    static HANDLE StdOutPipeWrite;
    static wstring sWorkDir;
    WCHAR* pBuffer;
    unsigned long BytesRead;

    sWorkDir = _wgetenv(L"SystemDrive");
    sWorkDir += L"\\";
    bJustOpened = true;

    SA.nLength = sizeof(SA);
    SA.bInheritHandle = true;
    SA.lpSecurityDescriptor = NULL;

    if (!CreatePipe(&StdInPipeRead, &StdInPipeWrite, &SA, 0) || !CreatePipe(&StdOutPipeRead, &StdOutPipeWrite, &SA, 0))
    {
        return false;
    }

    // Set Pipe for process to create
    ZeroMemory(&SI, sizeof(SI));
    SI.cb = sizeof(SI);
    SI.dwFlags = STARTF_USESHOWWINDOW | STARTF_USESTDHANDLES;
    SI.wShowWindow = SW_HIDE;
    SI.hStdInput = StdInPipeRead; //GetStdHandle(STD_INPUT_HANDLE);
    SI.hStdOutput = StdOutPipeWrite;
    SI.hStdError = StdOutPipeWrite;

    // Launch cmd process
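    // CreateProcessW may modify the command line in place, so pass a writable buffer
    WCHAR szCmdLine[] = L"cmd.exe /U";
    g_bCreateProcessSuccess = CreateProcessW(
        NULL,
        szCmdLine,
        NULL,
        NULL,
        true,
        0,
        NULL,
        sWorkDir.c_str(),
        &SI,
        &PI);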

    g_sCommandLineInput = L"";
    g_bkeepCmdRunning = true;



    if (g_bCreateProcessSuccess)
    {
        do
        {
            DWORD TotalBytesAvail = 0;
            PeekNamedPipe(StdOutPipeRead, 0, 0, 0, &TotalBytesAvail, 0);

            // Pipe data is not null-terminated: allocate zeroed memory with room for a terminator
            pBuffer = (WCHAR*)calloc(1, TotalBytesAvail + sizeof(WCHAR));

            if (pBuffer != NULL && TotalBytesAvail > 0)
            {
                // First problem is: while internal commands work fine (dir, cd, etc.) and return output in unicode format, 
                // when I launch another process (ipconfig, netstat, ping, etc.), output is returned in ANSI format.
                // Even if I launched cmd.exe with /U switch
                ReadFile(StdOutPipeRead, pBuffer, TotalBytesAvail, &BytesRead, NULL);
            }
            else BytesRead = 0;

            if (BytesRead > 0)
            {
                // Print the data as an argument, never as the format string itself
                wprintf(L"%s", pBuffer);
            }

            free(pBuffer);

            // g_sCommandLineInput is a global wstring variable which gets filled each time user wants to send new command.
            if (g_sCommandLineInput.length() > 0)
            {
                DWORD numberofbyteswritten = 0;
                g_sCommandLineInput += L"\n"; // Append \n to make the cmd process interpret the data as a command to launch

                // Second problem is, why do I have to send command in ANSI format, even if I launched cmd with the /U switch?
                string sCommndLineAnsi = WStringToString(g_sCommandLineInput);
                WriteFile(StdInPipeWrite, sCommndLineAnsi.c_str(), sCommndLineAnsi.length() * sizeof(CHAR), &numberofbyteswritten, NULL);

                //Reset command
                g_sCommandLineInput = L"";
            }

            Sleep(100);
        }
        while (g_bkeepCmdRunning);

        TerminateProcess(PI.hProcess, 0);

        CloseHandle(PI.hThread);
        CloseHandle(PI.hProcess);
    }

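    // Close all four pipe ends, not only the stdout pair
    CloseHandle(StdInPipeRead);
    CloseHandle(StdInPipeWrite);
    CloseHandle(StdOutPipeRead);
    CloseHandle(StdOutPipeWrite);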

    return true;
}
Flavio
  • You need to convert to/from `CP_OEMCP`, not ANSI (`CP_ACP`). cmd uses exactly `CP_OEMCP` for conversion, so you need to use it on your side too. – RbMm Mar 23 '18 at 17:18
  • @RbMm, CMD doesn't use OEM. It uses the console's current output codepage (i.e.`GetConsoleOutputCP()`), line by line, i.e. you can change the codepage between parsed commands when reading from a pipe or in a batch file. – Eryk Sun Mar 23 '18 at 17:19
  • @eryksun - but by default `GetConsoleOutputCP()` is `CP_OEMCP`. We can change it in our own process, but in cmd that is another story. – RbMm Mar 23 '18 at 17:20
  • @RbMm, yes, but the OP can change it to `CP_UTF8` to at least allow CMD to read Unicode input -- just not UTF-16 input. But only do this temporarily. Leaving the console permanently set to UTF-8 input is very buggy. – Eryk Sun Mar 23 '18 at 17:21
  • @eryksun - how do we change it in cmd if we don't share a console, for example? The parent process may not have a console at all. – RbMm Mar 23 '18 at 17:23
  • @RbMm, if the parent has no console, and doesn't want one to show up, it can run CMD with the `CREATE_NO_WINDOW` flag and then call `AttachConsole` and `SetConsoleOutputCP`. – Eryk Sun Mar 23 '18 at 17:25
  • For external programs, the big mistake here is thinking that console applications that inherit a console handle and standard handles from CMD as the parent process are running "in" CMD. No. Each application uses the console and standard I/O however it was designed to, and unless they have environment variables and command-line switches to change this, you're stuck with whatever they do by default. It's up to you to reverse engineer what encoding they use. Typically the default is either ANSI, OEM, or the console's current output or input codepage (even for a pipe). – Eryk Sun Mar 23 '18 at 17:27
  • So if I understand correctly, I shouldn't need to set the code page, since the output for internal commands is already Unicode thanks to the /U switch I used when creating the cmd process. External commands (such as ping, ipconfig, etc.) I will have to handle separately: for example, I know that the ping command produces ANSI output, so I can act accordingly. – Flavio Mar 23 '18 at 17:40
  • The problem is still in the user input for internal commands though, since I have to convert user input to ANSI. I suppose that 'SetConsoleOutputCP' does not affect how the cmd process interprets the data that I pass through pipe via 'WriteFile' – Flavio Mar 23 '18 at 17:41
  • For just CMD, attaching to the console and setting the output codepage to UTF-8 should suffice. Use UTF-8 for input and output. External programs are impossible to generalize. For example wmic.exe *always* writes OEM to both the console and pipes, but it writes UTF-16 to a disk file. Some programs also only work correctly if stdin or stdout is a console, or they may bypass standard I/O to open "CONIN$" and "CONOUT$" directly if attached to a console. – Eryk Sun Mar 23 '18 at 17:46
  • You're basically trying to replace the console itself, and doing this with pipes is going to be hit or miss. For example, many programs detect that they're writing to a pipe and use full buffering. You'll only see output when the buffer fills up and flushes (typically 4 KB). You may have better luck adapting something like [winpty](https://github.com/rprichard/winpty) in general. This lets them have a console and provides an interface for you to write to its input and read the screen buffer as if you were a pty master. – Eryk Sun Mar 23 '18 at 17:49
  • Using different encodings for internal cmd commands and for external programs run by cmd is very uncomfortable in my opinion, especially when you use the pipe as a stream of bytes. I would, however, use the default `CP_OEMCP` for everything. And most importantly, I would use only a single pipe pair and an asynchronous parent pipe. – RbMm Mar 23 '18 at 17:53
  • @RbMm, using OEM always will lead to data corruption from mojibake (e.g. Python uses ANSI), plus you have the problem of programs switching to full buffering for a pipe (CRT default behavior), or requiring a console (e.g. timeout.exe fails without a console, and runas.exe will only read a password from the console), or implicitly using "CONIN$" or "CONOUT$" instead of standard I/O. The pipe approach is an endless headache. Microsoft should have added real pty support ages ago (basically let an application play the role of conhost.exe), but all they cared about was GUI apps for a long time. – Eryk Sun Mar 23 '18 at 18:07
  • @eryksun Following your suggestions, right after CreateProcess now I call AttachConsole(PI.dwProcessId), and then SetConsoleCP(CP_UTF8) and SetConsoleOutputCP(CP_UTF8). After that I write data in pipe using WriteFile, but still same problem. cmd.exe outputs "More?" when I send the command in Unicode. Maybe because the data I am passing is a C++ wstring, which is UTF16? – Flavio Mar 23 '18 at 18:13
  • @eryksun - in any case we can have problems. I had not thought about mojibake. Anyway, I think using different encodings in a common stream is not the best solution: we would need to somehow separate what comes from cmd from what comes from an external program. So we need a common encoding for everything. – RbMm Mar 23 '18 at 18:14
  • @Flavio, if stdin and stdout aren't console handles (i.e. they're pipes or disk files), CMD reads from and writes to them using the current console output codepage, or ANSI (not OEM) if not attached to a console. You can override the output to UTF-16 with `/U`, but not the input. The console's codepage defaults to OEM, unless you configure a different default codepage in the registry. You're explicitly setting the codepage to UTF-8. So read and write UTF-8 bytes, not UTF-16 wide-character strings. Don't use the `/U` option. – Eryk Sun Mar 23 '18 at 18:20
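
The last comment's advice (write UTF-8 bytes to the pipe rather than UTF-16 wide characters) can be illustrated with a minimal sketch. `Utf16ToUtf8` is a hypothetical helper, not part of the original code, and it is written in portable C++ only to make the byte-level conversion visible. On Windows, where `wchar_t` holds UTF-16, `WideCharToMultiByte` with `CP_UTF8` performs the same conversion. The sketch assumes the console codepage has already been set to UTF-8 as described in the comments.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (an illustration, not the original poster's code):
// encodes a UTF-16 string as UTF-8 bytes.
// Unpaired surrogates are replaced with U+FFFD (the replacement character).
std::string Utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        std::uint32_t cp = in[i];

        if (cp >= 0xD800 && cp <= 0xDBFF)
        {
            // High surrogate: combine with the following low surrogate, if any
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF)
            {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            }
            else
            {
                cp = 0xFFFD;
            }
        }
        else if (cp >= 0xDC00 && cp <= 0xDFFF)
        {
            cp = 0xFFFD; // Low surrogate without a preceding high surrogate
        }

        if (cp < 0x80)
        {
            out += static_cast<char>(cp);
        }
        else if (cp < 0x800)
        {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)
        {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else
        {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

In the question's loop, the resulting `std::string` would take the place of the ANSI string: its `data()` and `size()` are what would be passed to `WriteFile`. Note that an ASCII-only command such as `dir` has identical bytes in ANSI and UTF-8, so the difference only shows up for non-ASCII input.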
