
Even though this question is similar to some other questions, those describe both the problem and the solution in Python. This question can still be valuable to other users because it yields a solution that works for C#. The root cause is the same: by default, a Windows console uses its OEM code page (code page 437 on US-English systems) rather than UTF-8 for stdout, so redirected output is decoded with that code page. However, beginners in C# and/or Python in particular may not be able to derive a C# solution from Python examples, and rewriting an entire C# application in Python just to fix this problem is less than desirable, to say the least.

I can successfully convert an old C file 'file' in code page 1252 to UTF-8:

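// needs: using System.IO; using System.Text;
// On .NET Core / .NET 5+, code page 1252 first requires:
// Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
var file = @"...";           // input file
var tmp  = @"...\tmp.c";     // output file
var lines = File.ReadAllLines(file, Encoding.GetEncoding(1252));
File.WriteAllLines(tmp, lines, Encoding.UTF8);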
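The conversion can also be checked at the byte level. This is a minimal sketch, assuming .NET 5 or later (C# 9 top-level statements), where code page 1252 is only available after registering CodePagesEncodingProvider; the helper names Cp1252ToUtf8 and Hex are hypothetical. It shows that '£' (0xA3 in code page 1252) and '•' (0x95) turn into the UTF-8 byte sequences c2 a3 and e2 80 a2 that appear in the hex dumps of the question.

```csharp
using System;
using System.Text;

// On .NET Core / .NET 5+, legacy code pages such as 1252 are only available
// after registering this provider (not needed on .NET Framework).
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Hypothetical helper: decode code page 1252 bytes into a .NET string,
// then encode that string as UTF-8 (GetBytes never emits a BOM).
static byte[] Cp1252ToUtf8(byte[] cp1252Bytes)
{
    string text = Encoding.GetEncoding(1252).GetString(cp1252Bytes);
    return Encoding.UTF8.GetBytes(text);
}

// Hypothetical helper: lowercase hex separated by spaces, as in the question.
static string Hex(byte[] bytes) =>
    BitConverter.ToString(bytes).Replace("-", " ").ToLowerInvariant();

// In code page 1252, '£' is 0xA3 and '•' is 0x95.
byte[] legacy = { 0xA3, 0x95 };
Console.WriteLine(Hex(Cp1252ToUtf8(legacy))); // prints "c2 a3 e2 80 a2"
```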

When I invoke the C preprocessor on tmp.c from the command line (Visual Studio | Tools | Command Line), I get a perfectly valid UTF-8 file called 'tmp.c.i2':

cl /utf-8 /C /EP tmp.c > tmp.c.i2

However, when I do the same thing from C# code (below), characters such as '£' (pound sign) and '•' (bullet point) come out wrong in the output file 'tmp.c.i'.

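// call preprocessor
// (needs: using System.Diagnostics; using System.IO; using System.Text;
//  'work' and 'formatted' are directories defined elsewhere)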

var proc = new Process
{
    StartInfo =
    {
        //EnableRaisingEvents = true,
        FileName = @"C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.29.30133\bin\Hostx86\x86\cl.exe",
        WorkingDirectory = work,
        Arguments = "/utf-8 /C /EP tmp.c",
        // /FI<file> would force-include a file
        CreateNoWindow = true,
        UseShellExecute = false,
        RedirectStandardOutput = true,
        RedirectStandardError = false
    }
};
proc.Start();
string output = proc.StandardOutput.ReadToEnd();
proc.WaitForExit();
File.WriteAllText(Path.Combine(formatted, "tmp.c.i"), output, Encoding.UTF8);

According to Notepad++ with the Hex Editor plugin:

  • The pound sign '£' (c2 a3) becomes '┬ú' (e2 94 ac c3 ba)
  • The bullet '•' (e2 80 a2, followed by a tab, 09) becomes 'ÔÇó' (c3 94 c3 87 c3 b3)
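The corruption can be reproduced without running cl.exe. In this minimal sketch (assuming .NET 5 or later; Misdecode and Hex are hypothetical helper names), the UTF-8 bytes of '£•' are decoded with an OEM code page, which is what happens to the redirected stdout, and the wrong string is then re-encoded as UTF-8, which is what File.WriteAllText does. Code page 850 reproduces the reported bytes exactly; code page 437 gives 'Γ' instead of 'Ô' for the first byte of the bullet.

```csharp
using System;
using System.Text;

// Needed on .NET Core / .NET 5+ for OEM code pages such as 437 and 850.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Hypothetical helper simulating the bug: bytes that were written as UTF-8
// are decoded with an OEM code page instead of UTF-8.
static string Misdecode(string original, int oemCodePage)
{
    byte[] utf8Bytes = Encoding.UTF8.GetBytes(original);
    return Encoding.GetEncoding(oemCodePage).GetString(utf8Bytes);
}

// Hypothetical helper: lowercase hex separated by spaces, as in the question.
static string Hex(byte[] bytes) =>
    BitConverter.ToString(bytes).Replace("-", " ").ToLowerInvariant();

// "\u00A3\u2022" is "£•", escaped so the source file encoding does not matter.
foreach (int cp in new[] { 437, 850 })
{
    string garbled = Misdecode("\u00A3\u2022", cp);
    // Hex of the garbled string after saving it as UTF-8:
    // 437 -> e2 94 ac c3 ba ce 93 c3 87 c3 b3  ("┬úΓÇó")
    // 850 -> e2 94 ac c3 ba c3 94 c3 87 c3 b3  ("┬úÔÇó")
    Console.WriteLine($"{cp}: {Hex(Encoding.UTF8.GetBytes(garbled))}");
}
```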

How can I fix this?

user2943111

1 Answer


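Thanks to a private message from an anonymous community member, I was able to solve it: I added StandardOutputEncoding = Encoding.UTF8 to StartInfo. Without it, the redirected stdout is decoded with the console's default OEM code page (code page 437 on US-English Windows), while cl /utf-8 emits UTF-8. Judging by the 'ÔÇó' output, the code page on this machine was actually 850, the OEM code page used on UK and other Western European systems.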
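Applied to the question's code, the fix is a single extra property. Below is a minimal sketch, assuming .NET 5 or later (C# 9 top-level statements); CreatePreprocessorStartInfo, Preprocess and the working directory are hypothetical, and the cl.exe path is the one from the question, so adjust it to your installation. Quoting the source file name is an addition so that paths with spaces survive.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Text;

// Hypothetical helper: start info for "cl /utf-8 /C /EP <source>".
// StandardOutputEncoding is the fix: cl.exe writes UTF-8 because of /utf-8,
// so the redirected stdout must be decoded as UTF-8, not the OEM code page.
static ProcessStartInfo CreatePreprocessorStartInfo(string clPath, string workDir, string sourceFile) =>
    new ProcessStartInfo
    {
        FileName = clPath,
        WorkingDirectory = workDir,
        Arguments = $"/utf-8 /C /EP \"{sourceFile}\"",
        CreateNoWindow = true,
        UseShellExecute = false,
        RedirectStandardOutput = true,
        StandardOutputEncoding = Encoding.UTF8
    };

// Hypothetical helper: runs the process and returns its decoded stdout.
static string Preprocess(ProcessStartInfo startInfo)
{
    using var proc = Process.Start(startInfo)
        ?? throw new InvalidOperationException("cl.exe did not start");
    // Read before waiting, so a full stdout pipe cannot deadlock the child.
    string output = proc.StandardOutput.ReadToEnd();
    proc.WaitForExit();
    return output;
}

// Usage: only runs where cl.exe actually exists at this path.
string clPath = @"C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.29.30133\bin\Hostx86\x86\cl.exe";
string workDir = Directory.GetCurrentDirectory(); // hypothetical; the question uses 'work'
if (File.Exists(clPath))
{
    string output = Preprocess(CreatePreprocessorStartInfo(clPath, workDir, "tmp.c"));
    File.WriteAllText(Path.Combine(workDir, "tmp.c.i"), output, Encoding.UTF8);
}
```

Setting the encoding on the start info, rather than converting the string afterwards, matters because once the bytes have been decoded with the wrong code page, the original text can no longer be recovered reliably.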
