Auto encoding detect in C#

Question

Possible Duplicate:
Determine a string's encoding in C#

Many text editors (like Notepad++) can detect the encoding of an arbitrary file. Can I detect the encoding of a file in C#?

Have you searched the web for examples of encoding detection in c#? — Matt Ellen, Sep 19 '10 at 16:39
duplicate of http://stackoverflow.com/questions/1025332/determine-a-strings-encoding-in-c — Matt Ellen, Sep 19 '10 at 16:40

Darin Dimitrov · Accepted Answer · 2010-09-19T16:49:02.540

9

A StreamReader will try to detect a file's encoding automatically when it reads the file, provided the file starts with a byte order mark (BOM):

using System;
using System.IO;

public class Program
{
    static void Main(string[] args)
    {
        using (var reader = new StreamReader("foo.txt"))
        {
            // Make sure you read from the file first, or the reader
            // won't be able to guess the encoding
            var file = reader.ReadToEnd();
            // CurrentEncoding now reflects any BOM found at the start of the file
            Console.WriteLine(reader.CurrentEncoding);
        }
    }
}
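
To make "if there's a BOM" concrete, here is a minimal sketch of checking the well-known Unicode byte order marks by hand. It is an illustration only, not StreamReader's actual implementation, and the names `BomSniffer`, `DetectFromBom` and `DetectFileBom` are made up for this example. It returns null when no BOM is present, which is exactly the case where BOM-based detection cannot help.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomSniffer
{
    // Returns the encoding implied by a leading BOM, or null if none is found.
    public static Encoding DetectFromBom(byte[] head)
    {
        // UTF-32 is checked before UTF-16 because the UTF-32 LE BOM
        // (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
        if (head.Length >= 4 && head[0] == 0xFF && head[1] == 0xFE && head[2] == 0x00 && head[3] == 0x00)
            return new UTF32Encoding(bigEndian: false, byteOrderMark: true);
        if (head.Length >= 4 && head[0] == 0x00 && head[1] == 0x00 && head[2] == 0xFE && head[3] == 0xFF)
            return new UTF32Encoding(bigEndian: true, byteOrderMark: true);
        if (head.Length >= 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF)
            return new UTF8Encoding(encoderShouldEmitUTF8Identifier: true);
        if (head.Length >= 2 && head[0] == 0xFF && head[1] == 0xFE)
            return new UnicodeEncoding(bigEndian: false, byteOrderMark: true);
        if (head.Length >= 2 && head[0] == 0xFE && head[1] == 0xFF)
            return new UnicodeEncoding(bigEndian: true, byteOrderMark: true);
        return null; // no BOM: the encoding cannot be determined this way
    }

    // Convenience wrapper: inspects at most the first four bytes of a file.
    public static Encoding DetectFileBom(string path)
    {
        var head = new byte[4];
        using (var stream = File.OpenRead(path))
        {
            int read = stream.Read(head, 0, head.Length);
            Array.Resize(ref head, read); // shorter files yield fewer bytes
        }
        return DetectFromBom(head);
    }
}
```

Calling `BomSniffer.DetectFileBom("foo.txt")` on a file saved without a BOM returns null, so the caller has to pick a default or apply some other heuristic.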

edited Sep 19 '10 at 16:49

answered Sep 19 '10 at 16:41

Darin Dimitrov

3

+1, though it's worth adding that this is not foolproof; many encodings "look" the same to the simple detection method used. Even the best detectors (used by the likes of Google, which can afford to do a lot of crunching and have lots of data to compare streams with), which consider different possible meanings of "high" octets, aren't 100% perfect. If at all possible, it's best to convey this information precisely. – Jon Hanna Sep 20 '10 at 08:57
It works for common encodings, but not for all encodings. – Tyler Liu Sep 28 '11 at 07:56
This won't work for detecting UTF-16 without the BOM. Nor will it fall back to the user's local default codepage if it fails to detect any Unicode encoding. You can fix the latter, but then it won't detect UTF-8 without the BOM. – Dan W Oct 12 '12 at 00:57
8

`StreamReader` does *NOT* attempt to detect the encoding, it simply uses the default. See the very documentation you linked, where it says: "The default character encoding and default buffer size are used." – Mark Dec 03 '12 at 17:30
2

The [MSDN documentation](https://msdn.microsoft.com/en-us/library/hh399669.aspx) does say that the default character encoding will be used, but I've tried passing different BOMs to a StreamReader, and it correctly identified them (i.e. reader.CurrentEncoding returned the expected encoding). I tested with UTF-8, UTF-16BE and UTF-16LE. Note @Darin's comment though: it won't work if you don't read some data. – Giles Mar 03 '15 at 14:55
1

reader.Peek() is enough – tomexou Sep 09 '15 at 09:09
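
The points raised in these comments can be combined into one small helper. This is a sketch under stated assumptions rather than a definitive implementation: `EncodingProbe` and `Detect` are hypothetical names, and the caller supplies the encoding to assume when the file has no BOM. It uses the `StreamReader(string, Encoding, bool)` constructor with BOM detection switched on and calls `Peek()` to trigger the detection without reading the whole file.

```csharp
using System;
using System.IO;
using System.Text;

public static class EncodingProbe
{
    // Returns the encoding indicated by the file's BOM, or `fallback`
    // when the file has no recognizable BOM.
    public static Encoding Detect(string path, Encoding fallback)
    {
        using (var reader = new StreamReader(path, fallback, detectEncodingFromByteOrderMarks: true))
        {
            // Peek fills the reader's buffer, which is when the BOM is examined;
            // before any read, CurrentEncoding is just the fallback.
            reader.Peek();
            return reader.CurrentEncoding;
        }
    }
}
```

For example, `EncodingProbe.Detect("foo.txt", Encoding.Default)` falls back to the platform default when no BOM is found. As noted above, a BOM-less UTF-8 or UTF-16 file is then reported as the fallback, because nothing in this approach inspects the content itself.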
