Even though this question is similar to some other questions, those describe both problem and solution using Python. This question can still be valuable to other users, since you get a solution that works for C#. The root cause is the same: Windows consoles use code page 437 by default for stdout. However, especially beginners in C# and/or Python might not be able to figure out a solution for C# based on Python examples. Rewriting their entire C# application in Python just to get the problem fixed is less desirable, to say the least.
I can successfully convert an old C file 'file' in code page 1252 to UTF-8:
var file = @"..." // input file
var tmp = @"...\tmp.c" // output file
var lines = File.ReadAllLines(file, Encoding.GetEncoding(1252));
File.WriteAllLines(tmp, lines, Encoding.UTF8);
When I invoke the C pre-processor on tmp.c from the command line (Visual Studio|Tools|Command Line): cl /utf-8 /C /EP tmp.c > tmp.c.i2
I get a perfectly valid utf-8 file called 'tmp.c.i2'.
However, when I try to do this in C# code (below), it goes wrong for characters like '£' (pound sign) and '•' (bullet point). Output in 'tmp.c.i'
// call preprocessor
var proc = new Process
{
StartInfo =
{
//EnableRaisingEvents = true,
FileName = @"C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.29.30133\bin\Hostx86\x86\cl.exe",
WorkingDirectory = work,
Arguments = "/utf-8 /C /EP tmp.c",
//FI<file> force include
CreateNoWindow = true,
UseShellExecute = false,
RedirectStandardOutput = true,
RedirectStandardError = false
}
};
proc.Start();
string output = proc.StandardOutput.ReadToEnd();
proc.WaitForExit();
File.WriteAllText(Path.Combine(formatted, "tmp.c.i"), output, Encoding.UTF8);
According to Notepad++ with Hex editor plugin
- The pound sign '£' (c2 a3) becomes '┬ú' (e2 94 ac c3 ba)
- The bullet '•' (e2 80 a2 09) becomes 'ÔÇó' (c3 94 c3 87 c3 b3)
How can I fix this?