18

I'm reading a file using:

var source = File.ReadAllText(path);

and the character © wasn't being loaded correctly.

Then, I changed it to:

var source = File.ReadAllText(path, Encoding.UTF8);

and nothing.

I decided to try using

var source = File.ReadAllText(path, Encoding.Default);

and it worked perfectly. Then I debugged it and tried to find which Encoding did the trick, and I found that it was UTF-7.

What I want to know is:

Is it recommended to use Encoding.Default, and can it guarantee all the characters of the file will be read without problems?

Oscar Mederos
  • I find it interesting that Encoding.Default would produce UTF7 and not one of the extended ascii encodings such as Windows-1251 or Windows-1252. Can anyone enlighten me? – Dan W Oct 10 '12 at 02:52

4 Answers

9

Encoding.Default only guarantees that characters in the UTF-7 set are read correctly (search for the full set). Conversely, if you read a file that is not encoded as UTF-8 using UTF-8, you'll get corrupted characters, as you did.

For instance, if the file is encoded as UTF-16 and you read it as UTF-16, you'll be fine even if the file does not contain a single UTF-16-specific character. It all boils down to the file's encoding.
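To make the mismatch concrete, here is a minimal sketch. It assumes the file stored © as the single byte 0xA9, as a Western single-byte code page would (the thread does not confirm how the file was actually saved). The class and method names are made up for illustration.

```csharp
using System;
using System.Text;

public static class EncodingMismatchDemo
{
    // Decodes raw bytes with the encoding the caller claims they are in.
    public static string Decode(byte[] bytes, Encoding encoding)
    {
        return encoding.GetString(bytes);
    }

    public static void Main()
    {
        // Assumption: the file holds the copyright sign as one byte, 0xA9.
        byte[] singleByte = { 0xA9 };

        // The same character saved as UTF-8 takes two bytes: 0xC2 0xA9.
        byte[] utf8Bytes = Encoding.UTF8.GetBytes("\u00A9");

        // ISO-8859-1 is used as the single-byte example because it is
        // available without registering extra code page providers.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        // Matching encoding: the byte decodes to U+00A9.
        Console.WriteLine(Decode(singleByte, latin1) == "\u00A9");

        // Mismatch: a lone 0xA9 is not valid UTF-8, so the decoder
        // substitutes the replacement character U+FFFD.
        Console.WriteLine(Decode(singleByte, Encoding.UTF8) == "\uFFFD");

        // Matching encoding again, this time UTF-8 bytes read as UTF-8.
        Console.WriteLine(Decode(utf8Bytes, Encoding.UTF8) == "\u00A9");
    }
}
```

The bytes never change between the calls; only the encoding used to interpret them does, which is why the reader has to know how the file was written.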

You need to save and reopen the file with the same encoding to avoid corruption. Otherwise, try to use UTF-7 as much as you can, since it is the most compact yet 'email safe' encoding possible, which is why it is the default in most .NET Framework setups.

Rick Sladkey
Teoman Soygul
  • But what if the file had used `UTF-16` instead? Does the same happen for all encodings? – Oscar Mederos May 15 '11 at 04:16
  • If it is UTF-16, your only chance is to open it in UTF-16 mode, but I'm sure that it can be down-converted to UTF-8 by stripping out non-UTF-8-compliant characters. – Teoman Soygul May 15 '11 at 04:26
  • @TeomanSoygul There is no such thing as "non utf-8 compliant characters"; any characters can be encoded with UTF8 or UTF16, and at the moment you're working with 'characters' the text is already decoded anyway. As for the bytes, you can't just determine it that simply; they both follow specific bit patterns. In the end, for converting them you'll just have to decode them as one and then encode as the other. – Nyerguds Mar 17 '16 at 08:20
9

It is not recommended to use Encoding.Default.

Quote from MSDN:

Different computers can use different encodings as the default, and the default encoding can even change on a single computer. Therefore, data streamed from one computer to another or even retrieved at different times on the same computer might be translated incorrectly. In addition, the encoding returned by the Default property uses best-fit fallback to map unsupported characters to characters supported by the code page. For these two reasons, using the default encoding is generally not recommended. To ensure that encoded bytes are decoded properly, your application should use a Unicode encoding, such as UTF8Encoding or UnicodeEncoding, with a preamble. Another option is to use a higher-level protocol to ensure that the same format is used for encoding and decoding.
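As a rough illustration of the "Unicode encoding with a preamble" advice, the sketch below writes text with a byte order mark and reads it back without naming an encoding. The `PreambleDemo` class and `RoundTrip` method are hypothetical names, and the temp-file handling is just one way to keep the example self-contained.

```csharp
using System;
using System.IO;
using System.Text;

public static class PreambleDemo
{
    // Writes text to a temporary file with the given encoding, then reads
    // it back without specifying one. File.ReadAllText(path) inspects the
    // byte order mark, so the preamble is what tells it how to decode.
    public static string RoundTrip(string text, Encoding encoding)
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllText(path, text, encoding);
            return File.ReadAllText(path);
        }
        finally
        {
            File.Delete(path);
        }
    }

    public static void Main()
    {
        // UTF-8 that explicitly emits its three-byte preamble (EF BB BF).
        var utf8WithBom = new UTF8Encoding(encoderShouldEmitUTF8Identifier: true);
        Console.WriteLine(RoundTrip("\u00A9 2011", utf8WithBom));

        // Encoding.Unicode is UTF-16 little-endian and also emits a preamble.
        Console.WriteLine(RoundTrip("\u00A9 2011", Encoding.Unicode));
    }
}
```

Because the file carries its own marker, the read side does not depend on the machine's default code page, which is the portability problem the quote describes.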

Alex Aza
  • Have you ever tried to generate a file with a Unicode preamble from .NET? It involves messing about concatenating byte arrays with the preamble and the data. If you want to write UTF7 files you have to generate your own preamble because UTF7Encoding does not implement GetPreamble() - it falls back to Encoding.GetPreamble() which returns an empty array! – AlwaysLearning Sep 19 '13 at 06:53
  • UTF7's preamble is a giant mess; it includes the first 2 bits of the first character, somehow. I have no clue how I'd even start on decoding that... – Nyerguds Mar 17 '16 at 08:22
4

It sounds like you are interested in auto-detecting the encoding of a file, in some sort of situation where you are not in control of the encoding used to save it. There are several questions on Stack Overflow addressing this; some cursory browsing points to "Determine a string's encoding in C#" as a pretty good one. My favorite answer is the one pointing to a C# port of Mozilla's universal charset detector.
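For a sense of what the simplest form of detection looks like, here is a sketch that only checks for a byte order mark. It is not the Mozilla detector mentioned above and does no statistical guessing; `BomSniffer` and `DetectFromBom` are hypothetical names, and the caller has to supply a fallback for files that have no mark at all.

```csharp
using System;
using System.Text;

public static class BomSniffer
{
    // Returns the encoding indicated by a leading byte order mark,
    // or the supplied fallback when no known mark is present.
    public static Encoding DetectFromBom(byte[] bytes, Encoding fallback)
    {
        // UTF-32 LE must be tested before UTF-16 LE: both begin with FF FE.
        if (bytes.Length >= 4 && bytes[0] == 0xFF && bytes[1] == 0xFE
            && bytes[2] == 0x00 && bytes[3] == 0x00)
            return Encoding.UTF32;

        if (bytes.Length >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB
            && bytes[2] == 0xBF)
            return Encoding.UTF8;

        if (bytes.Length >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE)
            return Encoding.Unicode;            // UTF-16 little-endian

        if (bytes.Length >= 2 && bytes[0] == 0xFE && bytes[1] == 0xFF)
            return Encoding.BigEndianUnicode;   // UTF-16 big-endian

        return fallback;
    }

    public static void Main()
    {
        byte[] sample = { 0xEF, 0xBB, 0xBF, 0xC2, 0xA9 };
        Encoding detected = DetectFromBom(sample, Encoding.ASCII);
        Console.WriteLine(detected.WebName);
    }
}
```

A file without a mark, such as one saved in a legacy code page, falls straight through to the fallback, which is where a heuristic detector becomes necessary.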

Domenic
-4

I think your file is in UTF-7 encoding, nothing more. Visit this page.

Saleh