File.ReadAllText vs Encoding.UTF8: same string (apparently), but not equal

Question

I'm writing some integration tests that check the reading of an existing text file. Initially, I tried to verify the results by comparing strings, but that doesn't seem to work. Here's an excerpt from my code:

var bytes = File.ReadAllBytes(Path.Combine(path, guid.ToString()));
Assert.Equal(bytes, docBytes); //true
Assert.Equal(File.ReadAllText(Path.Combine(path, guid.ToString())).Trim(), Encoding.UTF8.GetString(docBytes).Trim()); //false

Now, here's what the debugger window shows:

I've thought about encoding differences and BOM-related problems, but in the debugger both strings look equal (if there were some sort of encoding problem, the characters should be different, right?). Any clues on what's going on?

Thanks.

Luis

`Encoding.UTF8.GetString(byteArrayWithPreamble)` interprets the BOM characters as actual characters. `File.ReadAllText()` uses the BOM to detect the encoding, then doesn't output them. Use a `StreamReader` to read the string from your byte arrays to prevent that, see duplicate. — CodeCaster, Jul 22 '16 at 12:01
Is it really the same problem? According to the post you've mentioned, I should get different strings and that doesn't happen here (if you look at the example, you'll notice ?abc vs abc...) — Luis Abreu, Jul 22 '16 at 13:17
Yes, they're different. The string returned by `Encoding.UTF8.GetString(docBytes)` is 8 characters long (while "Testing" is 7 characters), because it starts with the unprintable [U+FEFF](http://www.fileformat.info/info/unicode/char/feff/index.htm) character. See also http://ideone.com/Xmz5yu. — CodeCaster, Jul 22 '16 at 13:21
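
To make the point in the comments above concrete, here is a minimal sketch of the difference. The `BomDemo` class, the `DecodeSkippingBom` helper, and the hand-built `docBytes` array (UTF-8 BOM followed by "Testing") are assumptions made up for illustration; they are not taken from the asker's code. It only uses `Encoding.UTF8.GetPreamble`, `Encoding.UTF8.GetString`, and the `StreamReader(Stream, Encoding, bool)` constructor.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomDemo
{
    // Hypothetical helper: decodes bytes through a StreamReader, which
    // detects a leading BOM and leaves it out of the returned string.
    public static string DecodeSkippingBom(byte[] bytes)
    {
        using (var stream = new MemoryStream(bytes))
        using (var reader = new StreamReader(stream, Encoding.UTF8, detectEncodingFromByteOrderMarks: true))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // Assumed stand-in for docBytes: UTF-8 BOM (EF BB BF) + "Testing".
        byte[] preamble = Encoding.UTF8.GetPreamble();
        byte[] text = Encoding.UTF8.GetBytes("Testing");
        byte[] docBytes = new byte[preamble.Length + text.Length];
        preamble.CopyTo(docBytes, 0);
        text.CopyTo(docBytes, preamble.Length);

        string raw = Encoding.UTF8.GetString(docBytes);   // keeps U+FEFF as a character
        string viaReader = DecodeSkippingBom(docBytes);   // BOM consumed by the reader

        Console.WriteLine(raw.Length);          // 8
        Console.WriteLine(viaReader.Length);    // 7
        Console.WriteLine(raw[0] == '\uFEFF');  // True
        Console.WriteLine(raw.TrimStart('\uFEFF') == viaReader); // True
    }
}
```

Note that `Trim()` without arguments does not help here on .NET Framework 4 and later, because U+FEFF is not classified as white space there. Decoding through a `StreamReader`, or stripping the character explicitly as in the last line, are two possible ways to make the comparison in the test pass.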

