
I am trying to read all lines from a file, but I am getting unexpected results. Code:

var readLines = File.ReadLines(file);

foreach (var line in readLines)
{
    // line is "T\0e\0s\0t\0" instead of "Test"
}

File contents:

Test

If I call line.Replace("\0", "") it works, but I would like to understand why this is happening and how I can get the correct value from the file using ReadLines.

adjan
Vladimirs

1 Answer


Your file seems to be encoded in UTF-16. Specify the encoding as the second argument to File.ReadLines():

var readLines = File.ReadLines(file, Encoding.Unicode);
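To reproduce the problem and the fix, the sketch below writes "Test" to a temporary file as UTF-16LE bytes with no byte order mark. That is an assumption about how the file in the question was produced. It then reads the file back with and without the encoding argument. It assumes .NET 6 or later, because it uses top-level statements.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

// Assumption: the original file is UTF-16LE with no byte order mark.
// Encoding.Unicode.GetBytes does not prepend a BOM, so this writes 54 00 65 00 73 00 74 00.
string path = Path.GetTempFileName();
File.WriteAllBytes(path, Encoding.Unicode.GetBytes("Test"));

// Default overload: the bytes are decoded as UTF-8, so each 0x00 byte becomes '\0'.
string withDefault = File.ReadLines(path).First();

// Explicit UTF-16LE: each pair of bytes becomes one character.
string withUnicode = File.ReadLines(path, Encoding.Unicode).First();

File.Delete(path);

Console.WriteLine(withDefault.Length);               // 8
Console.WriteLine(withDefault.Replace("\0", "\\0")); // T\0e\0s\0t\0
Console.WriteLine(withUnicode);                      // Test
```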

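File.ReadLines() without the second parameter assumes UTF-8, unless the file starts with a byte order mark (BOM) that identifies another encoding. UTF-16 encodes each of the characters in your text as two bytes. In little-endian UTF-16 (Encoding.Unicode), an ASCII character such as T is stored as its one-byte code followed by a 0x00 byte, while UTF-8 encodes it as that single byte alone. When the UTF-16 bytes are decoded as UTF-8, each 0x00 byte becomes the character U+0000, so every other character in your text is \0.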
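The byte-level explanation can be checked in memory, without a file. This sketch also assumes .NET 6+ top-level statements. It prints the UTF-16LE and UTF-8 bytes for "Test" and decodes the UTF-16 bytes as UTF-8. It then puts a UTF-16LE byte order mark (FF FE) in front and reads the bytes with a StreamReader that has UTF-8 as its fallback and BOM detection turned on. As far as I know, that is how File.ReadLines(path) opens the file. With the BOM present, the reader switches to UTF-16 on its own, which suggests the file in the question has no BOM (an inference, not something the question states).

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

// UTF-16LE (Encoding.Unicode) stores each ASCII letter as its byte followed by 0x00.
byte[] utf16 = Encoding.Unicode.GetBytes("Test");
byte[] utf8 = Encoding.UTF8.GetBytes("Test");
Console.WriteLine(BitConverter.ToString(utf16)); // 54-00-65-00-73-00-74-00
Console.WriteLine(BitConverter.ToString(utf8));  // 54-65-73-74

// Decoding the UTF-16 bytes as UTF-8: 0x00 is a valid one-byte UTF-8 sequence (U+0000),
// so the result has 8 characters, every other one being '\0'.
string misread = Encoding.UTF8.GetString(utf16);
Console.WriteLine(misread.Length); // 8

// Prepend the UTF-16LE byte order mark (FF FE) and read with UTF-8 as the fallback encoding.
byte[] withBom = Encoding.Unicode.GetPreamble().Concat(utf16).ToArray();
string detected;
using (var reader = new StreamReader(new MemoryStream(withBom), Encoding.UTF8, detectEncodingFromByteOrderMarks: true))
{
    // The BOM is detected and skipped, and the remaining bytes are decoded as UTF-16LE.
    detected = reader.ReadToEnd();
}
Console.WriteLine(detected); // Test
```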

adjan