MyFile.txt contains (in Hex):

40 D8 40

I print all of its characters with:

using (StreamReader sr = new StreamReader("MyFile.txt"))
{
    while (!sr.EndOfStream)
    {
        int n = sr.Read();
        Console.WriteLine("{0:X}", n);
    }
}

The output is:

40
FFFD
40

But when MyFile.txt contains the same bytes with D8 as the last one:

40 40 D8

The output is only:

40
40

Where is the last char, D8 (FFFD)?

Roi Bar

1 Answer

StreamReader expects the input to be UTF-8. Unless you specify a different encoding, or read the raw bytes with a FileStream instead, bytes that do not form valid UTF-8 come out as the replacement character U+FFFD, which is the FFFD in your output.
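To illustrate the two alternatives mentioned above, here is a minimal sketch. The class and method names (`ByteDump`, `HexBytes`, `HexChars`) are made up for this example, and ISO-8859-1 is only an assumed single-byte encoding; use whatever encoding the file was actually written in.

```csharp
using System;
using System.IO;
using System.Text;

static class ByteDump
{
    // Reads the file byte by byte, with no text decoding at all.
    public static string HexBytes(string path)
    {
        var sb = new StringBuilder();
        using (FileStream fs = File.OpenRead(path))
        {
            int b;
            while ((b = fs.ReadByte()) != -1)
            {
                if (sb.Length > 0) sb.Append(' ');
                sb.Append(b.ToString("X2"));
            }
        }
        return sb.ToString();
    }

    // Decodes the file with an explicitly chosen encoding instead of the UTF-8 default.
    public static string HexChars(string path, Encoding encoding)
    {
        var sb = new StringBuilder();
        using (var sr = new StreamReader(path, encoding))
        {
            int c;
            while ((c = sr.Read()) != -1)
            {
                if (sb.Length > 0) sb.Append(' ');
                sb.Append(c.ToString("X"));
            }
        }
        return sb.ToString();
    }

    static void Main()
    {
        // Hypothetical test file holding the bytes from the second case in the question.
        string path = Path.GetTempFileName();
        File.WriteAllBytes(path, new byte[] { 0x40, 0x40, 0xD8 });

        Console.WriteLine(HexBytes(path));                                     // 40 40 D8
        Console.WriteLine(HexChars(path, Encoding.GetEncoding("ISO-8859-1"))); // 40 40 D8

        File.Delete(path);
    }
}
```

With a single-byte encoding every byte maps to exactly one char, so a trailing 0xD8 cannot be left waiting for a continuation byte.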

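In your first case, 0xD8 is a valid UTF-8 lead byte (it opens a two-byte sequence), but 0x40 is not a valid continuation byte to form a code point with; continuation bytes lie in the range 0x80 to 0xBF. So 0xD8 is replaced by U+FFFD and 0x40 is decoded on its own. Reading the two bytes as a single 16-bit value does not help either: http://www.fileformat.info/info/unicode/char/d840/index.htm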

U+D840 is a high surrogate, not a valid Unicode character on its own.

In your second situation, the last char is missing because a lead byte alone does not resolve into a character. The decoder waits for a continuation byte to complete the code point, but encounters end-of-stream instead and outputs nothing.
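The buffering described above can be reproduced with a bare UTF-8 `Decoder`, outside of StreamReader. This is only a sketch: `DecoderFlushDemo` and `DecodeToHex` are hypothetical names, and whether StreamReader itself flushes its decoder at end of stream may depend on the runtime version (the output in the question suggests it did not there).

```csharp
using System;
using System.Text;

static class DecoderFlushDemo
{
    // Decodes the bytes as UTF-8 and returns the resulting chars as hex.
    // With flush == false, an incomplete trailing sequence stays buffered
    // inside the decoder; with flush == true, it is emitted as U+FFFD.
    public static string DecodeToHex(byte[] bytes, bool flush)
    {
        Decoder decoder = Encoding.UTF8.GetDecoder();
        char[] chars = new char[Encoding.UTF8.GetMaxCharCount(bytes.Length)];
        int count = decoder.GetChars(bytes, 0, bytes.Length, chars, 0, flush);

        var sb = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            if (i > 0) sb.Append(' ');
            sb.Append(((int)chars[i]).ToString("X"));
        }
        return sb.ToString();
    }

    static void Main()
    {
        byte[] middle = { 0x40, 0xD8, 0x40 };
        byte[] trailing = { 0x40, 0x40, 0xD8 };

        Console.WriteLine(DecodeToHex(middle, false));   // 40 FFFD 40
        Console.WriteLine(DecodeToHex(trailing, false)); // 40 40
        Console.WriteLine(DecodeToHex(trailing, true));  // 40 40 FFFD
    }
}
```

If the whole file fits in memory, `Encoding.UTF8.GetString(File.ReadAllBytes(path))` decodes everything in one call, so nothing stays buffered and the dangling 0xD8 comes out as U+FFFD.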

Cee McSharpface