4

I am using SHA256 to get a hashed hexadecimal string. It works fine when using regular characters, however when a string to be hashed includes accents/diacritics, then I get a different result in C# and T-SQL. I would prefer to make the change on the SQL Server end.

  • Example word that does match: bird
  • Example word that does not match: MUÑOZ

C#

using (SHA256 sha2 = SHA256.Create())  
{
    var hash = sha2.ComputeHash(Encoding.UTF8.GetBytes(fullAddress));
    string hexString = string.Empty;

    for (int i = 0; i < hash.Length; i++)
    {
        hexString += hash[i].ToString("X2"); //Convert the byte to Hexadecimal representation, Notice that we use "X2" instead of "X"
    }

    sha2.Dispose();
    return hexString;
}

SQL

declare @fullAddress nvarchar(500)
set @fullAddress = 'MUÑOZ'
select CONVERT([varchar](256), HASHBYTES('SHA2_256', @fullAddress), 2) 
TheProgrammer
  • 1,314
  • 5
  • 22
  • 44
  • 1
    Use [`NVARCHAR`](https://learn.microsoft.com/en-us/sql/t-sql/data-types/nchar-and-nvarchar-transact-sql?view=sql-server-2017). See [differences in varchar and nvarchar](https://dba.stackexchange.com/questions/36081/write-differences-between-varchar-and-nvarchar) – S3S Jul 11 '19 at 15:28
  • @scsimon I have tried to use nvarchar but I still get a different result. – TheProgrammer Jul 11 '19 at 15:30
  • 1
    By `nvarchar`, @scsimon means the variable, not the `CONVERT`. Notice that these 2 expression return completely different results, due to the datatype of the variable: [DB<>Fiddle](https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=3999dee02ff605391a118a6aab58e3a9). The type returned by the `CONVERT`, doesn't (really) matter here as a `varbinary` only returns the characters A-F & 0-9. – Thom A Jul 11 '19 at 15:32
  • 4
    Like so... `select HASHBYTES('SHA2_256', N'MUÑOZ')` will result in 453803.... vs A8118F6... – S3S Jul 11 '19 at 15:33
  • 2
    I suspect that the issue is in using UTF8 at the C# side. I believe SQL server uses UC2 as its default coding, so you'd need to convert your C# string to this before hashing. – Steve Todd Jul 11 '19 at 15:35
  • @SteveTodd, three online SHA256 viewers yielded the same hash as the original C# code in the post, so only the SQL code or configuration is wrong to me – Arthur Attout Jul 11 '19 at 15:37
  • 2
    The SQL code is generating a correct hash, but for different data. If you want the two to produce the same results then you need to hash the same data. Simple as that. – Steve Todd Jul 11 '19 at 15:38
  • 1
    @ArthurAttout the two snippets don't work on the same data. C# and SQL Server use *Unicode* (UTF16) and yet the OP uses *UTF8* to convert the string into bytes. – Panagiotis Kanavos Jul 11 '19 at 15:39
  • @scsimon See edits above, when generating in SQL I get: A8118F67CAF3AC1D2DEAAD1C9146603600E94387B7EBF2B63F16ED66F4A4A85E and when I generate in C# I get: 276DB000BF524070F106A2C413942159AB5EF2F5CA5A5B91AB2F3B6FA48EE1ED – TheProgrammer Jul 11 '19 at 15:40
  • 3
    @DeVonte .NET, Windows and SQL Server use UTF16, not UTF8. You're hashing different bytes. If you want both snippets to produce the same value, use `Encoding.Unicode` – Panagiotis Kanavos Jul 11 '19 at 15:40
  • @PanagiotisKanavos Thank you! By changing my C# code to Encoding.Unicode it worked – TheProgrammer Jul 11 '19 at 15:46

1 Answers1

5

.NET, Windows and SQL Server use UTF16, not UTF8. The two snippets are hashing different bytes. When the same encoding is used, the hash strings are identical.

This :

using (var sha2 = System.Security.Cryptography.SHA256.Create())  
{
    var hash = sha2.ComputeHash(Encoding.Unicode.GetBytes("MUÑOZ"));
    {
        string hexString = string.Empty;

        for (int i = 0; i < hash.Length; i++)
        {
            hexString += hash[i].ToString("X2");
        }
        Console.WriteLine(hexString);        
    }    
}

Produces :

276DB000BF524070F106A2C413942159AB5EF2F5CA5A5B91AB2F3B6FA48EE1ED

Which is the same as SQL Server's hash string

Panagiotis Kanavos
  • 120,703
  • 13
  • 188
  • 236