Why is Encoding.UTF8 used instead of Encoding.Unicode?
Because UTF-8 is the encoding that most other application frameworks use when hashing strings, at least among those that have made a deliberate choice. Outside the .NET world, UTF-16LE (which is what the misnamed “Unicode” encoding actually is) is not necessarily a natural choice for string storage. If you use anything other than UTF-8, your hashes won't match the ones generated by other systems.
Crucially, UTF-8 is ASCII-compatible: for ASCII-only input it produces the same hashes as all the software out there that works with encoding-ignorant byte strings. That includes a lot of PHP webapps, Java apps that naïvely call String.getBytes, and so on.
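To make the ASCII-compatibility point concrete, here is a minimal sketch in Python, standing in for "some other system". It assumes SHA-256 as the hash (picked arbitrarily; the same holds for any hash over bytes) and uses a hypothetical helper, `hash_hex`. The `utf-16-le` codec yields the same bytes that .NET's Encoding.Unicode does (little-endian, no BOM).

```python
import hashlib


def hash_hex(data: bytes) -> str:
    """Return the SHA-256 digest of raw bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


text = "hello"  # ASCII-only input

# What an encoding-ignorant system would hash: the raw ASCII bytes.
raw_hash = hash_hex(b"hello")

# What you get after encoding the string explicitly.
utf8_hash = hash_hex(text.encode("utf-8"))
utf16_hash = hash_hex(text.encode("utf-16-le"))  # same bytes as Encoding.Unicode

print("UTF-8 matches raw bytes: ", utf8_hash == raw_hash)
print("UTF-16 matches raw bytes:", utf16_hash == raw_hash)
```

For ASCII-only input the UTF-8 hash is identical to the raw-bytes hash, while the UTF-16LE hash differs because every character carries an extra zero byte.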
So using UTF-8 gives you full interop with everything modern that uses UTF-8, and partial interop (for ASCII-only input) with pretty much everything else. Using UTF-16 would give you hashes that don't match anyone else's.
You can still use Encoding.Unicode if you are sure you will only ever use the hashes internally, but it doesn't really win you anything. Any savings you made from not encoding to UTF-8 would likely be negated by having to hash a longer input sequence, because for the most-likely-to-occur ASCII characters, UTF-8 needs one byte per character where UTF-16 needs two.
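The size difference is easy to check. The sketch below uses a hypothetical helper, `encoded_sizes`, and made-up sample strings; it only counts the bytes a hash function would have to process under each encoding.

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16LE byte count) for a string."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


# ASCII-only input: UTF-8 is half the size of UTF-16.
ascii_sizes = encoded_sizes("password123")

# One non-ASCII character (U+00EF) costs two bytes in UTF-8,
# but the string as a whole is still shorter than in UTF-16.
mixed_sizes = encoded_sizes("na\u00efve")

print(ascii_sizes)
print(mixed_sizes)
```

For text that is mostly outside the Latin range (CJK, for example) UTF-8 can come out longer than UTF-16, so the efficiency argument applies to the typical ASCII-heavy case rather than to all input.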