How do I read non-ASCII characters such as ß and ä?

The following code reads all chars, including chars such as 0x0D.

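using (StreamReader srFile = new StreamReader(gstPathFileName))
{
    char[] acBuf = new char[100];
    int iCharsRead;
    // Read returns the number of chars actually read; 0 means end of file.
    while ((iCharsRead = srFile.Read(acBuf, 0, acBuf.Length)) > 0)
    {
        // Use only the chars that were read, so the last chunk is not padded with '\0'.
        string s = new string(acBuf, 0, iCharsRead);
    }
}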

But it does not correctly interpret characters such as ß and ä.


I don't know what encoding the file uses. It is a .txt file exported from a C-Tree database by code that was written more than 20 years ago.

The ß ä display fine with Notepad.

ttom
  • What encoding does the file you are reading use? – Darin Dimitrov Mar 01 '15 at 16:52
  • Replace all this with `s = File.ReadAllText(fileName)` or `s = File.ReadAllText(fileName, knownEncoding)` – H H Mar 01 '15 at 16:52
  • This uses UTF-8. If your file uses a legacy encoding it won't handle those characters correctly. The best solution is to not use legacy encodings anywhere, but if you can't do that you need to pass the correct encoding (e.g. `Encoding.Default`) as second parameter to `StreamReader` or `File.ReadAllText`. – CodesInChaos Mar 01 '15 at 16:55
  • http://stackoverflow.com/questions/3825390/effective-way-to-find-any-files-encoding – Daniel A. White Mar 01 '15 at 17:01
  • @CodesInChaos: I would advise against `Encoding.Default` since the program would not give consistent results if run under different environments. – Douglas Mar 01 '15 at 17:10
  • @Douglas Of course `Default` encoding sucks. Sometimes you can hardcode `windows-1252`, sometimes the system dependent legacy encoding has the best chance of success and sometimes you might even have to resort to asking the user. – CodesInChaos Mar 01 '15 at 17:30

1 Answer

By default, the StreamReader constructor assumes the UTF-8 encoding (which is the de facto universal standard today). Since that is not decoding your file correctly, and the affected characters (ß, ä) are Western European, the file is probably encoded using Windows-1252 (Western European):

var encoding = Encoding.GetEncoding("Windows-1252");
using (StreamReader srFile = new StreamReader(gstPathFileName, encoding))
{
    // ... 
}
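
To see why the choice of encoding matters, here is a minimal sketch that decodes the same three bytes both ways. The class name EncodingDemo, the helper DecodeAs, and the sample bytes are made up for illustration. The RegisterProvider call is an assumption about the runtime: it is needed on .NET Core / .NET 5+, where legacy code pages are not registered by default, and can be dropped on the classic .NET Framework.

```csharp
using System;
using System.Text;

static class EncodingDemo
{
    // Hypothetical helper: decodes raw bytes with the named encoding.
    public static string DecodeAs(string encodingName, byte[] bytes)
    {
        // Assumption: running on .NET Core / .NET 5+, where Windows-1252
        // is only available after registering the code pages provider.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(encodingName).GetString(bytes);
    }

    static void Main()
    {
        // The bytes a Windows-1252 file would contain for "ß ä".
        byte[] raw = { 0xDF, 0x20, 0xE4 };

        string asUtf8 = DecodeAs("utf-8", raw);
        string asCp1252 = DecodeAs("Windows-1252", raw);

        // 0xDF and 0xE4 are incomplete UTF-8 sequences here, so the UTF-8
        // decoder substitutes the replacement character U+FFFD.
        Console.WriteLine(asUtf8.Contains("\uFFFD"));
        // Windows-1252 maps each byte directly to one character.
        Console.WriteLine(asCp1252 == "\u00DF \u00E4");
    }
}
```

Both lines should print True, which matches the symptom in the question: the bytes are intact, but the UTF-8 decoder cannot make sense of them.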

A closely related encoding is ISO/IEC 8859-1. If the above gives unexpected results, try Encoding.GetEncoding("ISO-8859-1") instead.
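
The two encodings agree on ß and ä and differ only for bytes in the range 0x80 to 0x9F, so a quick way to tell them apart is to look at how such a byte decodes. The sketch below is illustrative only: the class name CompareEncodings, the helper CodePoints, and the sample bytes are hypothetical, and the RegisterProvider call assumes .NET Core / .NET 5+ (it is not needed on the classic .NET Framework).

```csharp
using System;
using System.Linq;
using System.Text;

static class CompareEncodings
{
    // Hypothetical helper: decodes the bytes and returns the resulting
    // code points as hex, e.g. "00DF 00E4 20AC".
    public static string CodePoints(string encodingName, byte[] bytes)
    {
        // Assumption: .NET Core / .NET 5+, where Windows-1252 must be registered.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        string text = Encoding.GetEncoding(encodingName).GetString(bytes);
        return string.Join(" ", text.Select(c => ((int)c).ToString("X4")));
    }

    static void Main()
    {
        // 0xDF = ß and 0xE4 = ä in both encodings; 0x80 is where they differ.
        byte[] raw = { 0xDF, 0xE4, 0x80 };

        Console.WriteLine(CodePoints("Windows-1252", raw)); // 00DF 00E4 20AC
        Console.WriteLine(CodePoints("ISO-8859-1", raw));   // 00DF 00E4 0080
    }
}
```

In Windows-1252 the byte 0x80 is the euro sign (U+20AC), while in ISO-8859-1 it is a control character (U+0080). If the file contains characters such as €, curly quotes, or long dashes, Windows-1252 is the more likely candidate.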

Douglas