I came across this earlier today and was not sure why it happens.

I have the following code that sets the internal position of the file stream to a given location so I can read a number of lines from that position. It is similar to this other post, but when I use stream.Seek I see strange results.

StringBuilder b = new StringBuilder();
using (var stream = _streamFactory.CreateStream())
using (var streamReader = new System.IO.StreamReader(stream, _streamFactory.Encoding))
{
    stream.Seek(startPosition, System.IO.SeekOrigin.Begin);

    string value;
    for (int i = 0; i < lines; i++)
    {
        if ((value = streamReader.ReadLine()) != null)
        {
            b.AppendLine(value);
        }
    }
}

I am reading a file using the UTF-8 encoding, so I know there are extra bytes at the start of the file that denote this but are not part of the text I want to extract.

Say, for example, I have the following text in the file:

Hello my name is bob

If I set startPosition to 0, my result is Hello my name is bob. However, when I set startPosition to 1, I don't get ello my name is bob but rather @@Hello my name is bob, where @@ are 2 bytes from the encoding bytes.

So my question is: why do I get the correct line when I call Seek(0) and then ReadLine, while Seek(1) returns the 2nd and 3rd bytes of the encoding marker?

Seek(3) also yields the same result as Seek(0). If this were consistent, I would have expected Seek(0) to return @@@Hello my name is bob.

Also, how do I know how many extra bytes are at the start of the file without reading it (but knowing the encoding)?

I tried looking at the disassembled code and had to stop before my brain went on strike.

Note: The stream factory (_streamFactory) in this case just creates a FileStream. I do this so I can unit test this code using a MemoryStream.

aqwert

1 Answer

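The first bytes of the file are a byte order mark (BOM) that identifies the file's encoding. For UTF-8 the BOM is three bytes (EF BB BF); it is two bytes for UTF-16. Take a look at this article.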
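On the question of knowing the number of marker bytes without reading the file: a minimal sketch, assuming the encoding is known up front, is to ask the Encoding object for its preamble. The class and method names (PreambleSketch, PreambleLength) are made up for illustration. Note that this only reports what the encoding would write; a particular file may have been saved without a BOM.

```csharp
using System;
using System.Text;

public static class PreambleSketch
{
    // Number of BOM bytes this encoding would emit at the start of a file.
    // Hypothetical helper; it does not inspect any actual file.
    public static int PreambleLength(Encoding encoding)
    {
        return encoding.GetPreamble().Length;
    }

    public static void Main()
    {
        // Encoding.UTF8 emits the three-byte BOM EF BB BF.
        Console.WriteLine(PreambleLength(Encoding.UTF8));
        // UTF-16 little endian emits the two-byte BOM FF FE.
        Console.WriteLine(PreambleLength(Encoding.Unicode));
        // A UTF-8 encoding configured not to emit a BOM has an empty preamble.
        Console.WriteLine(PreambleLength(new UTF8Encoding(false)));
    }
}
```

If the file might lack a BOM, comparing its first bytes against GetPreamble() is the safer check before using the length as a start offset.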

KV Prajapati
  • Yes I realize that my question was why doesn't `Seek(0)` include those 2 bytes when doing a `ReadLine`? – aqwert Dec 14 '11 at 06:22
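The behaviour asked about in this comment can be reproduced with a MemoryStream. The sketch below is only an illustration under stated assumptions: the names (SeekSketch, BuildFile, ReadFirstLineFrom) are hypothetical, and the in-memory buffer stands in for the file. StreamReader skips the preamble only when the first bytes it reads are the complete BOM. After Seek(0) that is the case, after Seek(3) the BOM is already behind the read position, and after Seek(1) the reader sees only the tail of the BOM, which is not valid UTF-8 on its own and typically decodes as replacement characters in front of the text.

```csharp
using System;
using System.IO;
using System.Text;

public static class SeekSketch
{
    // Builds an in-memory "file": the UTF-8 BOM followed by the encoded text.
    public static byte[] BuildFile(string text)
    {
        Encoding utf8 = new UTF8Encoding(true);
        byte[] bom = utf8.GetPreamble();
        byte[] body = utf8.GetBytes(text);
        byte[] file = new byte[bom.Length + body.Length];
        Buffer.BlockCopy(bom, 0, file, 0, bom.Length);
        Buffer.BlockCopy(body, 0, file, bom.Length, body.Length);
        return file;
    }

    // Same pattern as the question: create the reader first, then seek the
    // underlying stream before anything has been read through the reader.
    public static string ReadFirstLineFrom(byte[] file, long startPosition)
    {
        using (var stream = new MemoryStream(file))
        using (var reader = new StreamReader(stream, new UTF8Encoding(true)))
        {
            stream.Seek(startPosition, SeekOrigin.Begin);
            return reader.ReadLine();
        }
    }

    public static void Main()
    {
        byte[] file = BuildFile("Hello my name is bob");
        foreach (long pos in new long[] { 0, 1, 3 })
        {
            // Positions 0 and 3 give the plain text; position 1 starts mid-BOM.
            Console.WriteLine(pos + ": " + ReadFirstLineFrom(file, pos));
        }
    }
}
```

Seeking the underlying stream only works cleanly here because the reader has not buffered anything yet; after a read, StreamReader.DiscardBufferedData() would be needed before repositioning.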