2

I have a strange situation figured out, when the Hashbyte function of SQL Server is not outputting the correct result when converting it to SHA2_256 with a string containing umlauts (ä,ö,ü,ß).

I run the example code in SQL Server:

 declare @cryptString varchar(50) 
 set @cryptString = 'test'

 select convert(Varchar(64), Hashbytes('SHA2_256', @cryptstring), 2)

The result is:

9F86D081884C7D659A2FEAA0C55AD015A3BF4F1B2B0B822CD15D6C15B0F00A08

When I'm checking the SHA256 conversion on https://hashgenerator.de/ the result is the same.

My problem: when I'm trying to crypt for example 'müller', the result in SQL Server is:

26A45113433596C5DD53643C7652381202E8009E532A280E513D887174A9ED14

When I'm checking the SHA256 conversion on https://hashgenerator.de/ the result is different.

2dbd218072117713f2d5996a726a9b216ed791ffd0783b6ba4ab6d61b8333192

I think it could be a encoding problem, but I searched for hours and can't find any clue to solve this problem.

I appreciate any kind of help to solve this problem.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Rockstaa
  • 23
  • 3

1 Answers1

1

You have this:

declare @cryptString varchar(50) 

And you try to use it to hold this value:

müller

That's bad. You need an nvarchar for anything that could go beyond basic ascii characters.

But that's just for starters. nvarchar uses UTF-16 (see the section headed "Supplementary Characters" about halfway down the page). The web site may be using UTF-32 or (likely) UTF-8 to encode those characters. Either one will use slightly different byte representations, which will produce completely different hash values.

I believe you're seeing UTF-8 at https://hashgenerator.de/, because UTF-8 matches ASCII when only using ASCII characters. With UTF-8, simple values like test would then produce the same result for both the web site and the database.

To fix this, understand the SQL hash will use either ASCII or UTF-16, and so you must change your encoding on whatever other platform you're using to match the database. The simplest option is probably to always use UTF-16 for these values, but you could also choose to stick with varchar on the database and convert your text to ascii before calculating the hash elsewhere (with the understanding that you'll lose some fidelity).

Joel Coehoorn
  • 399,467
  • 113
  • 570
  • 794
  • I did exactly the same [Demo](http://rextester.com/BJYFM89518). Still result differs. – Lukasz Szozda Dec 01 '17 at 20:09
  • 1
    @lad2025 There's more in the answer now. – Joel Coehoorn Dec 01 '17 at 20:18
  • @lad2025 thank you for your detailed explanation, I guessed it could be this sort of problem. In my normal application flow the frontend will Hash the String to SHA256 and saving it to the db and in the next steps I'm only comparing those Crypted Strings. Momentarily I'm only using this method to migrate old data to a new one and have to crypt them. I will figure it out on monday when I'm working on it. – Rockstaa Dec 02 '17 at 14:38
  • Fixed the problem with the knowledge of @lad2025 and the correct answer of this stackoverflow article [MD5-Hash-of-a-utf8-string](https://stackoverflow.com/questions/35200452/compute-md5-hash-of-a-utf8-string). Using this TableFunction before to hash the string did the trick for me. I think it's obvious what the function is doing. – Rockstaa Dec 04 '17 at 09:18