I'm using a finite-state machine to read an extremely large file. The code is not multi-threaded, so thread safety is not a concern.
The file contains three kinds of content:
- A binary number giving the length of the string that follows, counted in characters (each character counts as 1)
- ANSI text, where a character takes 1 to 2 bytes
- UTF-8 text, where a character takes 1 to 4 bytes
I found this question, which looked useful, but the approach did not work for me. The similar Python question does not help either, because it won't throw any error. I have to read each part with the proper encoding; otherwise the behavior is unpredictable.
Currently, I'm using StreamReader, but its CurrentEncoding property cannot be changed once the StreamReader is initialized.
So I've also tried recreating the StreamReader on the same Stream:
reader = new StreamReader(stream, encoding65001); //UTF-8
DoSomething(reader);
reader = new StreamReader(stream, encoding1252); //ANSI
DoSomething(reader);
reader = new StreamReader(stream, encoding936); //ANSI
//...
But it starts reading strange content from an unknown position. I haven't found the cause of this behavior.
Have I made a mistake in creating multiple StreamReaders, or is StreamReader not designed to be created more than once on the same stream?
If that is by design, is there any way to read such a file?
Thank you for taking the time to read this.
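One direction I could imagine, shown here only as a rough sketch under assumptions: skip StreamReader and feed the stream through a Decoder one byte at a time, so that nothing past the current string is consumed. The names MixedReader and ReadChars are hypothetical, the length is assumed to count decoded characters (a surrogate pair counts as one), and the input is assumed to be well-formed in the given encoding.

```csharp
using System;
using System.IO;
using System.Text;

static class MixedReader
{
    // Hypothetical helper: reads exactly charCount characters in the given
    // encoding, pulling one byte at a time so the stream position ends up
    // right after the last byte of the string.
    public static string ReadChars(Stream stream, Encoding encoding, int charCount)
    {
        Decoder decoder = encoding.GetDecoder(); // keeps partial-character state between calls
        var result = new StringBuilder(charCount);
        var oneByte = new byte[1];
        var chars = new char[4]; // one byte completes at most one character on well-formed input; the rest is slack
        int decoded = 0;

        while (decoded < charCount)
        {
            int b = stream.ReadByte();
            if (b < 0)
                throw new EndOfStreamException("Stream ended before the string was complete.");

            oneByte[0] = (byte)b;
            // Returns 0 while a multi-byte character is still incomplete.
            int n = decoder.GetChars(oneByte, 0, 1, chars, 0, false);
            if (n > 0)
            {
                result.Append(chars, 0, n);
                decoded++; // assumption: a surrogate pair counts as a single character
            }
        }
        return result.ToString();
    }
}
```

With something like this, each state of the state machine could call MixedReader.ReadChars(stream, encoding, length) with whichever encoding applies to the next string, where length comes from the binary prefix. Reading byte by byte is not fast, although FileStream keeps its own internal buffer, so it may still be acceptable.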
Edit: I've run the following code on .NET Core 3.1:
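// Read one raw byte directly from the stream and print the position before and after.
Stream stream = File.OpenRead(testFilePath);
Console.WriteLine(stream.Position);
Console.WriteLine(stream.ReadByte());
Console.WriteLine(stream.Position + "\r\n");
// Read one character through a UTF-8 StreamReader created on the same stream.
StreamReader reader = new StreamReader(stream, Encoding.UTF8);
Console.WriteLine(reader.Read());
Console.WriteLine(stream.Position + "\r\n");
// Create a second StreamReader (code page 1252) on the same stream and read one more character.
reader = new StreamReader(stream, CodePagesEncodingProvider.Instance.GetEncoding(1252));
Console.WriteLine(reader.Read());
Console.WriteLine(stream.Position);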
With the following example text as the file content:
abcdefg
And the output:
0
97
1

98
7

-1
7
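This is strange and interesting: after the first StreamReader reads a single character, stream.Position jumps from 1 to 7 (the end of the file), and the second StreamReader immediately returns -1.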