How to read text files with ANSI encoding and non-English letters?

Question

I have a file that contains non-English chars and was saved in ANSI encoding using a non-English codepage. How can I read this file in C# and see the file content correctly?

Not working

StreamReader sr=new StreamReader(@"C:\APPLICATIONS.xml",Encoding.ASCII);
var ags = sr.ReadToEnd();
sr=new StreamReader(@"C:\APPLICATIONS.xml",Encoding.UTF8);
ags = sr.ReadToEnd();
sr=new StreamReader(@"C:\APPLICATIONS.xml",Encoding.Unicode);
ags = sr.ReadToEnd();

This works, but I need to know the code page in advance, which is not possible.

sr=new StreamReader(@"C:\APPLICATIONS.xml",Encoding.GetEncoding(1252));
ags = sr.ReadToEnd();

score 74 · Accepted Answer · edited Nov 08 '21 at 01:37

 var text = File.ReadAllText(file, Encoding.GetEncoding(codePage));

List of code pages: https://learn.microsoft.com/en-us/windows/win32/intl/code-page-identifiers?redirectedfrom=MSDN

edited Nov 08 '21 at 01:37

spottedmahn

answered Aug 26 '12 at 13:03

L.B

I will need to know the code page. I don't know it in advance. – MichaelT Aug 26 '12 at 13:07
@MichaelT there are some open source libraries to *guess* the encoding, but it is not an easy process. – L.B Aug 26 '12 at 13:08
I saw that the old MS Notepad handles this file with no problems, so I think I am missing something. – MichaelT Aug 26 '12 at 13:11
@MichaelT [How can I detect the encoding/codepage of a text file](http://stackoverflow.com/questions/90838/how-can-i-detect-the-encoding-codepage-of-a-text-file) – L.B Aug 26 '12 at 13:23
Remember http://www.joelonsoftware.com/articles/Unicode.html - The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky – gimel Aug 27 '12 at 06:42
Notepad guesses the current locale's code page, which you can get from `Encoding.Default`. However, an XML file in the locale code page without an `<?xml ... encoding="..."?>` declaration saying so is an outright error. – bobince Aug 27 '12 at 20:35
Please note that .NET Core only supports ASCII, ISO-8859-1 and Unicode encodings. So you will get an error when trying to use encoding 1252 (ANSI Latin 1; Western European Windows). What works for me is encoding 65000 (utf-7 Unicode). – Martijn Sep 17 '20 at 08:45
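
On the encoding-guessing idea raised in the comments above: one common heuristic (not a full detector) is to try strict UTF-8 first and fall back to a code page you pick only if the bytes are not valid UTF-8. The sketch below assumes that approach; `EncodingGuess` and `ReadTextWithFallback` are hypothetical names, and the fallback encoding is still something the caller has to choose.

```csharp
using System;
using System.IO;
using System.Text;

static class EncodingGuess
{
    // Hypothetical helper: decode as strict UTF-8, otherwise use the
    // caller-supplied fallback (for example, a Windows code page).
    public static string ReadTextWithFallback(string path, Encoding fallback)
    {
        byte[] bytes = File.ReadAllBytes(path);

        // throwOnInvalidBytes: true makes the decoder throw instead of
        // silently substituting U+FFFD for bytes that are not valid UTF-8.
        var strictUtf8 = new UTF8Encoding(
            encoderShouldEmitUTF8Identifier: false,
            throwOnInvalidBytes: true);
        try
        {
            string text = strictUtf8.GetString(bytes);
            // GetString keeps a leading byte order mark, so drop it if present.
            return text.Length > 0 && text[0] == '\uFEFF' ? text.Substring(1) : text;
        }
        catch (DecoderFallbackException)
        {
            // Not valid UTF-8: assume the fallback code page.
            return fallback.GetString(bytes);
        }
    }
}
```

A call might look like `EncodingGuess.ReadTextWithFallback(@"C:\APPLICATIONS.xml", Encoding.GetEncoding(1252))`. This only separates "valid UTF-8" from "something else"; it cannot tell one single-byte code page from another (1251 vs. 1252, say), and it does not handle UTF-16.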

score 14 · Answer 2 · answered May 10 '16 at 17:17

You get the question-mark-diamond characters when your text file uses a high-ANSI encoding, meaning it contains byte values between 128 and 255. Those bytes have the eighth (i.e. the most significant) bit set. When ASP.NET reads the text file it assumes UTF-8 encoding, where that most significant bit has a special meaning.

You must force ASP.NET to interpret the textfile as high-ANSI encoding, by telling it the codepage is 1252:

String textFilePhysicalPath = System.Web.HttpContext.Current.Server.MapPath("~/textfiles/MyInputFile.txt");
String contents = File.ReadAllText(textFilePhysicalPath, System.Text.Encoding.GetEncoding(1252));
lblContents.Text = contents.Replace("\n", "<br />");  // change linebreaks to HTML
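
To illustrate the point about the eighth bit, here is a small sketch that decodes the same high byte two ways. `HighBitDemo` and its method names are made up for the example; ISO-8859-1 is used instead of 1252 only because it is available without extra setup.

```csharp
using System;
using System.Text;

static class HighBitDemo
{
    // Decode a single byte as UTF-8. A lone byte with the high bit set is
    // not a complete UTF-8 sequence, so the decoder substitutes U+FFFD.
    public static string AsUtf8(byte b)
    {
        return Encoding.UTF8.GetString(new[] { b });
    }

    // Decode the same byte as ISO-8859-1, where every byte value maps
    // directly to the character with the same code point.
    public static string AsLatin1(byte b)
    {
        return Encoding.GetEncoding("iso-8859-1").GetString(new[] { b });
    }
}
```

For the byte `0xE9`, `AsLatin1` gives `é`, while `AsUtf8` gives the replacement character U+FFFD, which browsers typically draw as the question mark in a diamond.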

answered May 10 '16 at 17:17

Snizzle

Should be the accepted answer IMHO. Furthermore, with .NET Core 2.x or .NET Standard you will get a new problem: the code page needs to be registered first. See https://stackoverflow.com/questions/37870084/net-core-doesnt-know-about-windows-1252-how-to-fix – Philm Jul 27 '19 at 13:24
Please note that .NET Core only supports ASCII, ISO-8859-1 and Unicode encodings. So you will get an error when trying to use encoding 1252 (ANSI Latin 1; Western European Windows). What works for me is encoding 65000 (utf-7 Unicode). – Martijn Sep 17 '20 at 08:36
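
Following up on the two comments above: on .NET Core and later, the Windows code pages are not registered by default, but they can be made available by registering `CodePagesEncodingProvider` before calling `Encoding.GetEncoding`. A minimal sketch, assuming the `System.Text.Encoding.CodePages` package is referenced on targets that do not ship it; `CodePageSetup` and `GetWindows1252` are hypothetical names.

```csharp
using System;
using System.Text;

static class CodePageSetup
{
    // Hypothetical helper: make code page encodings available, then
    // return Windows-1252.
    public static Encoding GetWindows1252()
    {
        // Without this call, Encoding.GetEncoding(1252) throws on .NET Core.
        // Registering the same provider more than once is harmless.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1252);
    }
}
```

Usage would then be `File.ReadAllText(path, CodePageSetup.GetWindows1252())`. Code page 65000 (UTF-7) is a different encoding from Windows-1252, so it is unlikely to decode such a file faithfully even if it does not throw.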

score 2 · Answer 3 · edited May 23 '17 at 10:29

If I remember correctly the XmlDocument.Load(string) method always assumes UTF-8, regardless of the XML encoding. You would have to create a StreamReader with the correct encoding and use that as the parameter.

// File.Open needs a FileMode argument; File.OpenRead opens the file read-only.
xmlDoc.Load(new StreamReader(
                     File.OpenRead("file.xml"),
                     Encoding.GetEncoding("iso-8859-15")));

I just stumbled across KB308061 from Microsoft. There's an interesting passage: Specify the encoding declaration in the XML declaration section of the XML document. For example, the following declaration indicates that the document is in UTF-16 Unicode encoding format:

<?xml version="1.0" encoding="UTF-16"?>

Note that this declaration only specifies the encoding format of an XML document and does not modify or control the actual encoding format of the data.

Link Source:

XmlDocument.Load() method fails to decode € (euro)
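
As a counterpart to the KB passage quoted above: when the bytes really are in the encoding that the declaration names, loading from a `Stream` should, as far as I know, let the XML reader pick the encoding up from the declaration itself. The sketch below assumes that behaviour; `XmlEncodingDemo` and `LoadRootText` are hypothetical names.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class XmlEncodingDemo
{
    // Hypothetical helper: load XML from raw bytes and return the text of
    // the root element. No Encoding is passed in; the reader is expected
    // to honour the encoding named in the XML declaration.
    public static string LoadRootText(byte[] xmlBytes)
    {
        var doc = new XmlDocument();
        using (var stream = new MemoryStream(xmlBytes))
        {
            doc.Load(stream);
        }
        return doc.DocumentElement.InnerText;
    }
}
```

For example, bytes produced with `Encoding.GetEncoding("iso-8859-1").GetBytes("<?xml version=\"1.0\" encoding=\"iso-8859-1\"?><a>café</a>")` should come back as `café`. If the declaration and the actual bytes disagree, the explicit `StreamReader` approach above is the workaround.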

edited May 23 '17 at 10:29

Community

answered Aug 26 '12 at 13:01

KF2

why not [*`File.ReadAllText`*](http://msdn.microsoft.com/en-us/library/ms143369.aspx)? – Adam Aug 26 '12 at 13:04
@MichaelT can you give a screenshot of your result? – KF2 Aug 26 '12 at 13:06
@MichaelT: try my new answer – KF2 Aug 26 '12 at 13:29
If the `<?xml ... ?>` prolog in your XML file says UTF-8, and it's not a proper UTF-8 stream, then what you have is not well-formed and therefore not XML. Really you need to fix whatever is producing the bogus XML files. – bobince Aug 27 '12 at 20:32

score 0 · Answer 4 · answered Apr 28 '21 at 18:59

In my case, with C++/CLI (WinForms), this approach worked:

String^ str2 = File::ReadAllText("MyText_cyrillic.txt",System::Text::Encoding::GetEncoding(1251)); 
textBox1->Text = str2;

answered Apr 28 '21 at 18:59

Олександр Добржанський

score 0 · Answer 5 · answered Oct 11 '22 at 21:17

using (StreamReader file = new StreamReader(filePath, Encoding.GetEncoding("ISO-8859-1")))
{
    JsonSerializer serializer = new JsonSerializer();
    IList<Type> result = (IList<Type>)serializer.Deserialize(file, typeof(IList<Type>));
}

ANSI code page used: ISO-8859-1

score -1 · Answer 6 · answered Mar 16 '17 at 10:32

// Append text to the file, encoded with code page 1250 (or use File.Create(path) for a new file).
using (StreamWriter writer = new StreamWriter(File.Open(@"E:\Sample.txt", FileMode.Append), Encoding.GetEncoding(1250)))
{
    writer.Write("Sample Text");
}

answered Mar 16 '17 at 10:32

sebastin jiffin a j

A little explanation with the code helps more. Please explain what this code does. – Olcay Ertaş Mar 16 '17 at 10:55
I have to second what @OlcayErtaş said, especially given that there are several other high-quality answers to this. – EJoshuaS - Stand with Ukraine Mar 16 '17 at 14:45
