I am reading a simple text file that contains a single line, using the FileStream class. But FileStream.Read seems to prepend some junk characters at the beginning.

The code is below.

using (var _fs = File.Open(_idFilePath, FileMode.Open, FileAccess.ReadWrite, FileShare.Read))
{
    byte[] b = new byte[_fs.Length];
    UTF8Encoding temp = new UTF8Encoding(true);
    while (_fs.Read(b, 0, b.Length) > 0)
    {
        Console.WriteLine(temp.GetString(b));
        Console.WriteLine(ASCIIEncoding.ASCII.GetString(b));
    }
}

For example, my data in the text file is just "sample", but the above code prints

  "?sample" and
  "???sample"

What's the reason? Is it a start-of-file indicator? Is there a way to read only my actual content?

RameshVel
  • My data in text file is just "sample" <- Are you sure that there isn't a UTF-8 BOM in front of it? Check it with a hex editor. – CodesInChaos Nov 13 '10 at 10:12
  • @CodeInChaos, thanks for the hint. In a hex editor it looks like "ef bb bf 73 61 6d 70 6c 65 00" and "...sample". Why is that? – RameshVel Nov 13 '10 at 10:15
  • If you need to strip the BOM, you're doing something wrong already. See http://stackoverflow.com/questions/1317700/strip-byte-order-mark-from-string-in-c. – Frédéric Hamidi Nov 13 '10 at 10:17

4 Answers

This could be the BOM, a.k.a. the byte order mark.

chillitom

The byte order mark (BOM) is the Unicode character U+FEFF and is used to mark a file with the encoding used for it.

So if you correctly decode the file as UTF-8, you get that character as the first char of your string. If you incorrectly decode it as ASCII or ANSI, you get 3 chars instead, since the UTF-8 encoding of U+FEFF is the byte sequence "EF BB BF", which is 3 bytes.
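To make the two outputs from the question concrete, here is a minimal sketch that decodes the same bytes both ways. The class and member names (BomDecodeDemo, FileBytes, DecodeUtf8, DecodeAscii) are made up for illustration, and the byte array is assumed to match the hex dump from the comments without the trailing 00.

```csharp
using System;
using System.Text;

public static class BomDecodeDemo
{
    // Assumed file content: the UTF-8 BOM (EF BB BF) followed by "sample".
    public static readonly byte[] FileBytes =
        { 0xEF, 0xBB, 0xBF, 0x73, 0x61, 0x6D, 0x70, 0x6C, 0x65 };

    // Encoding.GetString decodes every byte it is given, BOM included.
    public static string DecodeUtf8(byte[] bytes)
    {
        return new UTF8Encoding(true).GetString(bytes);
    }

    // ASCII cannot represent bytes above 0x7F, so each BOM byte becomes '?'.
    public static string DecodeAscii(byte[] bytes)
    {
        return Encoding.ASCII.GetString(bytes);
    }

    public static void Main()
    {
        string asUtf8 = DecodeUtf8(FileBytes);
        string asAscii = DecodeAscii(FileBytes);

        Console.WriteLine(asUtf8.Length);          // 7: U+FEFF plus the six letters
        Console.WriteLine(asUtf8[0] == '\uFEFF');  // True
        Console.WriteLine(asAscii);                // ???sample
    }
}
```

The single "?" in the UTF-8 output of the question is most likely the console failing to display U+FEFF, not a real question mark in the string.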

But your whole code can be replaced with

File.ReadAllText(fileName,Encoding.UTF8)

and that should remove the BOM too. Alternatively, leave out the encoding parameter and let the function autodetect the encoding (for which it uses the BOM).
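A quick way to check this is a round trip through a temporary file. The sketch below is only an illustration: ReadAllTextDemo and RoundTrip are hypothetical names, and the temp file stands in for the real file from the question.

```csharp
using System;
using System.IO;
using System.Text;

public static class ReadAllTextDemo
{
    // Writes "sample" with a UTF-8 BOM to a temp file, reads it back, and returns the text.
    public static string RoundTrip()
    {
        string path = Path.GetTempFileName();
        try
        {
            // new UTF8Encoding(true) makes WriteAllText emit the BOM bytes first.
            File.WriteAllText(path, "sample", new UTF8Encoding(true));

            // ReadAllText detects the BOM and leaves it out of the returned string.
            return File.ReadAllText(path, Encoding.UTF8);
        }
        finally
        {
            File.Delete(path);
        }
    }

    public static void Main()
    {
        string text = RoundTrip();
        Console.WriteLine(text);         // sample
        Console.WriteLine(text.Length);  // 6
    }
}
```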

CodesInChaos

You are reading the BOM from the stream. If you are reading text, try using a StreamReader which will handle this automatically.
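As a rough sketch of that suggestion, the example below wraps a stream in a StreamReader. It uses a MemoryStream with assumed bytes in place of the real file, and the names StreamReaderDemo and ReadAll are invented for the example.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamReaderDemo
{
    public static string ReadAll(Stream stream)
    {
        // By default StreamReader looks for a BOM at the start of the stream and skips it.
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // Assumed content: UTF-8 BOM followed by "sample".
        byte[] bytes = { 0xEF, 0xBB, 0xBF, 0x73, 0x61, 0x6D, 0x70, 0x6C, 0x65 };

        string text = ReadAll(new MemoryStream(bytes));
        Console.WriteLine(text);         // sample
        Console.WriteLine(text.Length);  // 6
    }
}
```

With a real file, the same ReadAll call should work on the FileStream returned by File.Open or File.OpenRead.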

Marc Gravell

Try this instead:

using (StreamReader sr = new StreamReader(File.OpenRead(path), Encoding.UTF8))

It will strip the BOM for you.

usr-local-ΕΨΗΕΛΩΝ