How can I convert a CP1252 string to a UTF-8 string in C#? I tried this code, but it doesn't work:

Encoding wind1252 = Encoding.GetEncoding(1252);
Encoding utf8 = Encoding.GetEncoding(1251);
byte[] wind1252Bytes = ReadFile(myString1252);
byte[] utf8Bytes = Encoding.Convert(wind1252, utf8, wind1252Bytes);
string myStringUtf8 = Encoding.UTF8.GetString(utf8Bytes);

1 Answer

var myGoodString = System.IO.File.ReadAllText(
    @"C:\path\to\file.txt",
    Encoding.GetEncoding("Windows-1252")
    );

A .NET/CLR string in memory cannot be UTF-8. It is just Unicode, or UTF-16 if you like.

The above code will properly read a text file in CP1252 into a .NET string.

If you insist on going through a byte[] wind1252Bytes, it is simply:

var myGoodString = Encoding.GetEncoding("Windows-1252").GetString(wind1252Bytes);
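Because a .NET string is always UTF-16 in memory, UTF-8 only comes into play when the text is turned back into bytes, for example before writing it to a file or a network stream. The following is a minimal sketch of that step, not part of the original answer. It assumes C# 9 or later (top-level statements), and the sample text is made up for illustration:

```csharp
using System;
using System.Text;

// A .NET string holds UTF-16 code units, whatever encoding the text came from.
// Hypothetical sample text: "café €", written with escapes to avoid
// depending on the encoding of the source file.
string text = "caf\u00E9 \u20AC";

// UTF-8 exists only as bytes: encode when the text leaves the program.
byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);

// Decoding those bytes as UTF-8 gives back the same string.
string roundTripped = Encoding.UTF8.GetString(utf8Bytes);

Console.WriteLine(text.Length);          // 6 UTF-16 code units
Console.WriteLine(utf8Bytes.Length);     // 9 bytes (é takes 2, € takes 3)
Console.WriteLine(roundTripped == text); // True
```

If the goal is a UTF-8 file on disk, `System.IO.File.WriteAllText(path, text, Encoding.UTF8)` performs the same encoding step while writing.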

Since this answer was written, newer versions of .NET have appeared which do not by default recognize all the old (legacy) Windows-specific code pages. If Encoding.GetEncoding("Windows-1252") throws an exception with your runtime version, try registering an additional provider with

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

(this may need an additional assembly reference to System.Text.Encoding.CodePages.dll) before you use Encoding.GetEncoding("Windows-1252").

See CodePagesEncodingProvider class documentation.
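Put together, the registration and the conversion might look like the sketch below. It is an illustration under stated assumptions rather than a definitive implementation: it assumes C# 9 or later (top-level statements) and a runtime where `CodePagesEncodingProvider` is available (on .NET Framework that means referencing the System.Text.Encoding.CodePages package). The byte array is hypothetical sample input standing in for the contents of a CP1252 file.

```csharp
using System;
using System.Text;

// Make legacy code pages such as 1252 available.
// Registering the provider more than once is harmless.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

Encoding windows1252 = Encoding.GetEncoding(1252);

// Hypothetical sample input: "café €" encoded as CP1252
// (0xE9 is é, 0x80 is the euro sign in this code page).
byte[] cp1252Bytes = { 0x63, 0x61, 0x66, 0xE9, 0x20, 0x80 };

// Bytes -> string: decode with the encoding the bytes were written in.
string text = windows1252.GetString(cp1252Bytes);

// Bytes -> bytes: re-encode CP1252 data as UTF-8 without an explicit string.
byte[] utf8Bytes = Encoding.Convert(windows1252, Encoding.UTF8, cp1252Bytes);

Console.WriteLine(text == "caf\u00E9 \u20AC");                  // True
Console.WriteLine(utf8Bytes.Length);                            // 9
Console.WriteLine(Encoding.UTF8.GetString(utf8Bytes) == text);  // True
```

The key point is that the source encoding passed to `GetString` or `Encoding.Convert` must match how the bytes were actually written. Decoding CP1252 bytes with `Encoding.UTF8`, or with a different code page such as 1251, produces wrong characters.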

Jeppe Stig Nielsen
  • On my machine, Encoding.GetEncoding("Windows-1252") throws: System.ArgumentException: 'Windows-1252' is not a supported encoding name. For information on defining a custom encoding, see the documentation for the Encoding.RegisterProvider method. (Parameter 'name') – Froggy Nov 05 '20 at 14:07
  • Ah, I have found that .NET Core does not support Windows-1252: https://learn.microsoft.com/en-us/dotnet/api/system.text.encoding?view=netcore-3.1 – Froggy Nov 05 '20 at 14:11
  • @Froggy On a system where the Windows-specific "code page 1252" is not available, a related (but not quite identical) encoding is the ISO "Latin 1" that you can get with `Encoding.GetEncoding("iso-8859-1")`. – Jeppe Stig Nielsen Nov 09 '20 at 16:38
  • @Froggy That's pretty lame from .NET Core. Glad I still have Win7 on one machine (Framework); our shops use Sharp cashiers that produce reports in an old encoding. Guess I'm not upgrading Windows anytime soon, or else we would have to buy new cashiers using UTF-8 or the like, which would be way more costly. – Karl Stephen Oct 08 '21 at 10:16
  • You can still get the right `Encoding` by installing the `System.Text.Encoding.CodePages` NuGet package as outlined here: https://stackoverflow.com/a/37870346/331281 – Dejan Oct 25 '21 at 11:51
  • @Dejan Good point. I have added to the answer. – Jeppe Stig Nielsen Oct 25 '21 at 19:38
  • @Froggy See above, it is possible to register a new encoding provider. – Jeppe Stig Nielsen Oct 25 '21 at 19:40
  • @KarlStephen See new info above. – Jeppe Stig Nielsen Oct 25 '21 at 19:40