
I'm using the code below to read a text file that contains foreign characters. The file is encoded as ANSI and looks fine in Notepad. The code doesn't work: when the values are read and shown in the DataGrid, the characters appear as squares. Could the problem be somewhere else?

StreamReader reader = new StreamReader(inputFilePath, System.Text.Encoding.ANSI);
using (reader = File.OpenText(inputFilePath))

Thanks

Update 1: I have tried every encoding under System.Text.Encoding, and all of them fail to show the file correctly.

Update 2: I changed the file encoding to Unicode (re-saved the file), used System.Text.Encoding.Unicode, and it worked just fine. So why did Notepad read the original file correctly? And why didn't System.Text.Encoding.Unicode read the ANSI file?

Hakan Fıstık
  • Are you sure it's encoded in ANSI? Sometimes Notepad will attempt a "best guess" and use a different encoding than you expect. – Rex M Feb 26 '09 at 22:58
  • If Notepad determines that a file isn't Unicode or UTF-8, it falls back on the system locale (set in Control Panel -> Region and Language). – Amir Abiri Mar 07 '14 at 18:07

11 Answers


You may also try the Default encoding, which uses the current system's ANSI codepage.

StreamReader reader = new StreamReader(inputFilePath, Encoding.Default, true);

When you use Notepad's "Save As" menu with the original file, look at the encoding combo box. It shows which encoding Notepad guessed for the file.

Also, if it is an ANSI file, the detectEncodingFromByteOrderMarks parameter will probably not help much.
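A minimal sketch of the byte-order-mark (BOM) behavior, assuming .NET 6 or later (top-level statements); `ReadWithFallback` is a hypothetical helper name. One caveat: on .NET Core and .NET 5+, `Encoding.Default` always returns UTF-8, so the "current ANSI code page" behavior applies to .NET Framework. On newer runtimes, pass the code page explicitly (e.g. `Encoding.GetEncoding(1252)` after registering `CodePagesEncodingProvider`).

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: honors a BOM if the file has one,
// otherwise decodes with the supplied fallback encoding.
string ReadWithFallback(string path, Encoding fallback)
{
    using var reader = new StreamReader(path, fallback, detectEncodingFromByteOrderMarks: true);
    return reader.ReadToEnd();
}

// A file saved as UTF-16 LE with a BOM (Notepad's "Unicode" option).
string bomPath = Path.GetTempFileName();
File.WriteAllText(bomPath, "Åäö", Encoding.Unicode); // Encoding.Unicode writes the FF FE BOM

// The BOM wins: fromBom holds "Åäö" even though the fallback is Encoding.Default.
string fromBom = ReadWithFallback(bomPath, Encoding.Default);

// "utf-8" on .NET Core/.NET 5+; the ANSI code page's name on .NET Framework.
Console.WriteLine(Encoding.Default.WebName);

File.Delete(bomPath);
```

For a BOM-less ANSI file the flag does nothing, which is why the choice of fallback encoding matters.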

Jérôme Laban
  • Using Encoding.Default worked for me. I had a character (Â) that StreamReader was skipping, and switching to the default encoding allowed it to be read correctly. Thanks! – buzzzzjay Nov 18 '11 at 17:46
  • Encoding.Default worked for me. Spanish characters in ANSI format were read and written as %^ and ? before I used Encoding.Default. – BoilerBrad Nov 29 '11 at 12:19
  • Encoding.Default works for me too. Portuguese chars around here. – John Prado Jan 14 '14 at 18:07
  • I am using several different languages and the only solution was the `Encoding.Default, true` in the initialization of `StreamReader`. – CaptainBli Aug 20 '14 at 22:36
  • This answer helped me over six years later: http://stackoverflow.com/questions/30850387/why-does-this-code-to-replace-accented-chars-with-html-codes-fail-to-work/30851057?noredirect=1#comment49746386_30851057 – B. Clay Shannon-B. Crow Raven Jun 15 '15 at 17:52
  • BTW: If using Notepad++ (rather than Notepad), the current encoding can be found by clicking on "Encoding" menu. – ToolmakerSteve Dec 26 '18 at 21:02
  • I had problems reading CSV with accented chars (Portuguese, ISO-8859-1). `Encoding.Default` didn't work for me, but setting `true` for `detectEncodingFromByteOrderMarks` (the parameter after `Encoding.Default`) did the trick. – Marcelo Scofano Diniz Sep 11 '20 at 20:23
  • This fixed the issue for me. Characters like "ä" or "®" would be shown as "�". Now it's correct. `Encoding.UTF8` did not work. – baltermia Dec 16 '20 at 13:10

I had the same problem and my solution was simple: instead of

Encoding.ASCII

use

Encoding.GetEncoding("iso-8859-1")

The answer was found here.

Edit: more solutions. This one may be more accurate:

Encoding.GetEncoding(1252);

Also, in some cases this will work too, if your OS default encoding matches the file's encoding:

Encoding.Default;
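To show why `iso-8859-1` and code page 1252 are often interchangeable but not identical, here is a sketch assuming .NET 6+. The two agree on most bytes (including å) but differ in the 0x80-0x9F range, where windows-1252 has printable characters such as €. On .NET Core/.NET 5+, code page 1252 is only available after registering `CodePagesEncodingProvider`.

```csharp
using System;
using System.Text;

// Makes windows-1252 and other legacy code pages available on .NET Core / .NET 5+.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
Encoding win1252 = Encoding.GetEncoding(1252);

// "Så €5" as it would be stored in a Western European "ANSI" file.
byte[] ansiBytes = { 0x53, 0xE5, 0x20, 0x80, 0x35 };

string asLatin1 = latin1.GetString(ansiBytes); // 0x80 becomes the invisible control char U+0080
string as1252 = win1252.GetString(ansiBytes);  // 0x80 becomes the euro sign U+20AC

Console.WriteLine(asLatin1 == as1252); // False: they differ only in bytes 0x80-0x9F
```

So for files that may contain €, curly quotes or similar characters, 1252 is usually the safer choice.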
Community
serop
  • My issue was using `StringBuilder` and outputting to `HttpResponseMessage`, and the accents were being replaced. This worked (`result` is the `HttpResponseMessage`): `result.Content = new StringContent(csv.ToString(), Encoding.GetEncoding("iso-8859-1"));` – Rob Scott Sep 21 '15 at 15:17
  • This was me too. For some reason `new StreamReader(memoryStream, Encoding.UTF8)` wasn't working but `new StreamReader(memoryStream,Encoding.GetEncoding("iso-8859-1"))` did! – Christopher Sep 24 '15 at 14:50
  • Fixed my problem with a ¾ (extended ascii, not unicode) that someone decided to use. – Loren Pechtel Nov 19 '15 at 21:49

Yes, it could be the actual encoding of the file, probably Unicode. Try UTF-8, as that is the most common Unicode encoding. Otherwise, if the file is ASCII, the standard ASCII encoding should work.
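Trying UTF-8 only helps if the file really is UTF-8. One plausible source of the squares in the question: when a file is actually ANSI, decoding it as UTF-8 turns each byte sequence that isn't valid UTF-8 into U+FFFD (the replacement character), which many controls draw as a box. A sketch assuming .NET 6+, with windows-1252 standing in for the file's real ANSI code page:

```csharp
using System;
using System.Text;

// windows-1252 needs the code-pages provider on .NET Core / .NET 5+.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding ansi = Encoding.GetEncoding(1252); // stand-in for the file's real ANSI code page

// In windows-1252, "Ä" is the single byte 0xC4; UTF-8 would store it as 0xC3 0x84.
byte[] ansiBytes = ansi.GetBytes("Äpfel");

// 0xC4 followed by 'p' is not a valid UTF-8 sequence, so the UTF-8 decoder
// substitutes U+FFFD, which many UI controls render as a box or "?".
string misread = Encoding.UTF8.GetString(ansiBytes);
string correct = ansi.GetString(ansiBytes);

Console.WriteLine(misread.Contains('\uFFFD')); // True
Console.WriteLine(correct.Contains('\uFFFD')); // False
```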

Quintin Robinson

Using Encoding.Unicode won't accurately decode an ANSI file in the same way that a JPEG decoder won't understand a GIF file.

I'm surprised that Encoding.Default didn't work for the ANSI file if it really was ANSI. If you ever find out exactly which code page Notepad was using, you could use Encoding.GetEncoding(int).
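A sketch of that mismatch, assuming .NET 6+ and windows-1252 as the ANSI code page: `Encoding.Unicode` is UTF-16 little-endian, so it reads two bytes per character and pairs up unrelated ANSI bytes, while `Encoding.GetEncoding(int)` with the right code page decodes the same bytes correctly.

```csharp
using System;
using System.Text;

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance); // needed for code page 1252 on .NET Core
Encoding ansi = Encoding.GetEncoding(1252); // Encoding.GetEncoding(int) with the file's code page

// "Hej då" as stored in a windows-1252 ("ANSI") file: one byte per character.
byte[] ansiBytes = ansi.GetBytes("Hej då");

// Encoding.Unicode is UTF-16 LE: it consumes the bytes in pairs, so the six
// one-byte characters collapse into three unrelated UTF-16 code units.
string asUtf16 = Encoding.Unicode.GetString(ansiBytes);
string asAnsi = ansi.GetString(ansiBytes);

Console.WriteLine(asUtf16.Length); // 3
Console.WriteLine(asAnsi.Length);  // 6
```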

In general, where possible I'd recommend using UTF-8.
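Following the UTF-8 recommendation, a one-time conversion could look like this sketch. It assumes .NET 6+ and that the source file's code page is known (1252 here). `ConvertToUtf8` is a hypothetical helper, not a framework API.

```csharp
using System;
using System.IO;
using System.Text;

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance); // legacy code pages on .NET Core

// Hypothetical helper: re-saves a legacy code-page file as UTF-8 (with BOM),
// so later reads no longer depend on the machine's ANSI code page.
void ConvertToUtf8(string sourcePath, string targetPath, int sourceCodePage)
{
    string text = File.ReadAllText(sourcePath, Encoding.GetEncoding(sourceCodePage));
    File.WriteAllText(targetPath, text, new UTF8Encoding(encoderShouldEmitUTF8Identifier: true));
}

string ansiPath = Path.GetTempFileName();
string utf8Path = Path.GetTempFileName();

// Simulate a file saved by Notepad as "ANSI" on a Western European system.
File.WriteAllBytes(ansiPath, Encoding.GetEncoding(1252).GetBytes("Crème brûlée"));

ConvertToUtf8(ansiPath, utf8Path, 1252);

// A plain File.ReadAllText (UTF-8 by default) now reads it correctly.
string roundTripped = File.ReadAllText(utf8Path);
Console.WriteLine(roundTripped == "Crème brûlée"); // True

File.Delete(ansiPath);
File.Delete(utf8Path);
```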

Jon Skeet

Try a different encoding such as Encoding.UTF8. You can also try letting StreamReader find the encoding itself:

    StreamReader reader = new StreamReader(inputFilePath, System.Text.Encoding.UTF8, true);

Edit: Just saw your update. Try letting StreamReader do the guessing.
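A small sketch, assuming .NET 6+, of what "letting StreamReader guess" actually does. Detection only recognizes byte order marks; a BOM-less ANSI file is simply decoded with whatever encoding you pass in, so guessing cannot rescue a plain ANSI file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Saves the same text with three different byte order marks and records
// which encoding StreamReader settles on when asked to detect it.
var results = new List<(string Saved, string Detected, bool TextOk)>();

foreach (Encoding saved in new[] { Encoding.UTF8, Encoding.Unicode, Encoding.BigEndianUnicode })
{
    string path = Path.GetTempFileName();
    File.WriteAllText(path, "Ärger", saved); // each of these encodings writes a BOM

    using (var reader = new StreamReader(path, Encoding.UTF8, detectEncodingFromByteOrderMarks: true))
    {
        string text = reader.ReadToEnd();
        // CurrentEncoding reflects the detected encoding only after the first read.
        results.Add((saved.WebName, reader.CurrentEncoding.WebName, text == "Ärger"));
    }

    File.Delete(path);
}

foreach (var r in results)
{
    Console.WriteLine($"saved as {r.Saved}, detected {r.Detected}, text ok: {r.TextOk}");
}
```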

Jakob Christensen

For Swedish Å, Ä and Ö, the only solution from the ones above that worked was:

Encoding.GetEncoding("iso-8859-1")

Hopefully this will save someone time.

jagge123

File.OpenText() always creates a UTF-8 StreamReader implicitly. Instead, create your own StreamReader instance and specify the desired encoding, like this:

using (StreamReader reader = new StreamReader(@"C:\test.txt", Encoding.Default))
{
    // ...
}
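A sketch comparing the two approaches, assuming .NET 6+ and a BOM-less windows-1252 file (the code page is an assumption; use the one your file was actually saved with). `File.OpenText` still honors a BOM if one is present; it just has no way of knowing about ANSI.

```csharp
using System;
using System.IO;
using System.Text;

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding ansi = Encoding.GetEncoding(1252); // assumption: the file's real code page

string path = Path.GetTempFileName();
File.WriteAllBytes(path, ansi.GetBytes("Größe")); // BOM-less "ANSI" file

string viaOpenText;
using (StreamReader reader = File.OpenText(path)) // UTF-8 unless a BOM says otherwise
{
    viaOpenText = reader.ReadToEnd();
}

string viaExplicit;
using (StreamReader reader = new StreamReader(path, ansi)) // encoding chosen by the caller
{
    viaExplicit = reader.ReadToEnd();
}

Console.WriteLine(viaOpenText.Contains('\uFFFD')); // True: ö and ß are not valid UTF-8 here
Console.WriteLine(viaExplicit == "Größe");         // True

File.Delete(path);
```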
Anonymous

I solved my problem reading Portuguese characters by changing the source file's encoding in Notepad++.


C#

    var url = System.Web.HttpContext.Current.Server.MapPath(@"~/Content/data.json");
    string s = string.Empty;
    using (System.IO.StreamReader sr = new System.IO.StreamReader(url, System.Text.Encoding.UTF8, true))
    {
        s = sr.ReadToEnd();
    }
Luís Ponciano

I'm also reading an exported file that contains French and German text. I used `Encoding.GetEncoding("iso-8859-1"), true`, which worked without any problems.

A. Lartey

For Arabic, I used Encoding.GetEncoding(1256). It works well.
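A sketch of the same idea for Arabic, assuming .NET 6+ (where code page 1256 requires `CodePagesEncodingProvider`). Each Arabic letter fits in one byte under 1256, and the same bytes read with a Western code page turn into unrelated Latin characters.

```csharp
using System;
using System.Text;

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

Encoding arabic = Encoding.GetEncoding(1256);  // Windows Arabic code page
Encoding western = Encoding.GetEncoding(1252); // Western European code page

// "مرحبا" ("hello") as an Arabic-locale "ANSI" editor would save it: one byte per letter.
byte[] bytes = arabic.GetBytes("مرحبا");

Console.WriteLine(bytes.Length);                        // 5
Console.WriteLine(arabic.GetString(bytes) == "مرحبا");  // True
// The same bytes read with the wrong code page do not come back as Arabic.
Console.WriteLine(western.GetString(bytes) == "مرحبا"); // False
```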

FelixSFD

I had a similar problem with ProcessStartInfo and its StandardOutputEncoding property. For German-language console output I set it to code page 850. That way I could read output such as ausführen instead of ausf�hren.
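A sketch of that setting, assuming .NET 6+. `CreateStartInfo` is a hypothetical helper, and code page 850 assumes the console uses the Western European OEM code page; the right value depends on the system. The byte decoding at the end shows the `ü` case without starting a process.

```csharp
using System;
using System.Diagnostics;
using System.Text;

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance); // code page 850 is not built into .NET Core

// Hypothetical helper: redirected stdout is decoded as OEM code page 850,
// the code page a German Windows console typically uses.
ProcessStartInfo CreateStartInfo(string fileName, string arguments) => new ProcessStartInfo(fileName, arguments)
{
    RedirectStandardOutput = true, // StandardOutputEncoding only applies to redirected output
    UseShellExecute = false,
    StandardOutputEncoding = Encoding.GetEncoding(850),
};

// In code page 850, byte 0x81 is 'ü', so "ausf" + 0x81 + "hren" decodes to "ausführen".
byte[] consoleBytes = { 0x61, 0x75, 0x73, 0x66, 0x81, 0x68, 0x72, 0x65, 0x6E };
string decoded = Encoding.GetEncoding(850).GetString(consoleBytes);
Console.WriteLine(decoded == "ausführen"); // True

ProcessStartInfo psi = CreateStartInfo("cmd.exe", "/c dir"); // not started here
Console.WriteLine(psi.StandardOutputEncoding?.CodePage);    // 850
```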

Rainer