I'm writing some integration tests that verify the contents read from an existing text file. Initially, I tried to check the results by comparing them as strings, but that comparison fails even though the byte comparison succeeds. Here's an excerpt from my code:

var bytes = File.ReadAllBytes(Path.Combine(path, guid.ToString()));
Assert.Equal(bytes, docBytes); //true
Assert.Equal(File.ReadAllText(Path.Combine(path, guid.ToString())).Trim(), Encoding.UTF8.GetString(docBytes).Trim()); //false

Now, here's what the debugger window shows: [screenshot]

I've thought about different encodings and BOM-related problems, but looking at the debugger, both strings seem to be equal (if there were some sort of encoding problem, the characters should be different, right?). Any clues on what's going on?

Thanks.

Luis

  • `Encoding.UTF8.GetString(byteArrayWithPreamble)` interprets the BOM characters as actual characters. `File.ReadAllText()` uses the BOM to detect the encoding, then doesn't output them. Use a `StreamReader` to read the string from your byte arrays to prevent that, see duplicate. – CodeCaster Jul 22 '16 at 12:01
  • Is it really the same problem? According to the post you've mentioned, I should get different strings and that doesn't happen here (if you look at the example, you'll notice ?abc vs abc...) – Luis Abreu Jul 22 '16 at 13:17
  • Yes, they're different. The string returned by `Encoding.UTF8.GetString(docBytes)` is 8 characters long (while "Testing" is 7 characters), because it starts with the unprintable [U+FEFF](http://www.fileformat.info/info/unicode/char/feff/index.htm) character. See also http://ideone.com/Xmz5yu. – CodeCaster Jul 22 '16 at 13:21
  • Thanks! That's it... – Luis Abreu Jul 22 '16 at 13:55
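
To make the point from the comments concrete, here is a minimal sketch of the difference. It is an illustration under stated assumptions, not the original test: the `BomDemo` class and the `DecodeLikeReadAllText` helper are hypothetical names, and `docBytes` is built in memory as the UTF-8 preamble followed by "Testing", on the assumption that this is what the file contains. Note also that on .NET Framework 4 and later, `Trim()` does not remove U+FEFF, which would explain why trimming did not make the strings equal.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

public static class BomDemo
{
    // Hypothetical helper: decodes bytes through a StreamReader, which
    // consumes a leading BOM instead of returning it as a character.
    public static string DecodeLikeReadAllText(byte[] bytes)
    {
        using (var stream = new MemoryStream(bytes))
        using (var reader = new StreamReader(stream, Encoding.UTF8,
                   detectEncodingFromByteOrderMarks: true))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // Assumed file contents: UTF-8 preamble (EF BB BF) + "Testing".
        byte[] docBytes = Encoding.UTF8.GetPreamble()
            .Concat(Encoding.UTF8.GetBytes("Testing"))
            .ToArray();

        // GetString decodes the preamble into an actual U+FEFF character.
        string raw = Encoding.UTF8.GetString(docBytes);
        string viaReader = DecodeLikeReadAllText(docBytes);

        Console.WriteLine(raw.Length);          // 8
        Console.WriteLine(viaReader.Length);    // 7
        Console.WriteLine(raw[0] == '\uFEFF');  // True
        Console.WriteLine(raw.TrimStart('\uFEFF') == viaReader); // True
    }
}
```

In a test, one option is to decode both sides the same way (for example, both through a `StreamReader`), or to compare only the raw bytes, which the first assertion in the question already does.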
