
I am working on an ETL process (no primary key, so no transactional replication) between Oracle and SQL Server, and I am using an MD5 hash to detect differences between the source and the destination databases.

This works fine for records where the data falls within the first 127 ASCII characters. But when there are any 'extended ASCII'* characters such as ½, °, or ©, SQL Server's HASHBYTES function hashes them in a non-standard way (i.e. differently from Oracle's DBMS_CRYPTO.Hash, the .NET cryptographic library, etc.).

So when I run this in Oracle:

select rawtohex(
DBMS_CRYPTO.Hash (
    UTL_I18N.STRING_TO_RAW ('°', 'AL32UTF8'),
    2)
) from dual;

I get: 4723EB5AA8B0CD28C7E09433839B8FAE.

And when I run this in SQL Server:

SELECT HASHBYTES('md5', '°');

I get: EC655B6DA8B9264A7C7C5E1A70642FA7

And when I run this C# code:

// required namespaces for this snippet
using System.Security.Cryptography;
using System.Text;

string password = "°";

// byte array representation of that string
byte[] encodedPassword = new UTF8Encoding().GetBytes(password);

// need MD5 to calculate the hash
byte[] hash = ((HashAlgorithm) CryptoConfig.CreateFromName("MD5")).ComputeHash(encodedPassword);

// string representation (similar to UNIX format)
string encoded = BitConverter.ToString(hash)
   // without dashes
   .Replace("-", string.Empty)
   // make lowercase
   .ToLower();

I get 4723EB5AA8B0CD28C7E09433839B8FAE, i.e. the same result as in Oracle and in every online tool that I have used.
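The discrepancy can be reproduced outside of either database. As a minimal illustrative sketch (in Python, purely to isolate the behaviour from both databases), the two results above are simply MD5 over two different byte encodings of the same character:

```python
import hashlib

s = "°"

# MD5 over the UTF-8 bytes: this is what Oracle's AL32UTF8 conversion
# and the C# UTF8Encoding code above are hashing
utf8_hash = hashlib.md5(s.encode("utf-8")).hexdigest().upper()

# MD5 over the single-byte Latin-1 representation, which is what
# HASHBYTES sees when a varchar literal is passed under a typical
# Latin1/cp1252 collation
latin1_hash = hashlib.md5(s.encode("latin-1")).hexdigest().upper()

print(utf8_hash)    # 4723EB5AA8B0CD28C7E09433839B8FAE
print(latin1_hash)  # EC655B6DA8B9264A7C7C5E1A70642FA7
```

So the hash function itself is standard in all three systems; the inputs are different byte sequences.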

Is there any SQL-based solution to this problem, or would I need to create a CLR stored procedure and hash the data there?


*I realise that the term is somewhat controversial

David Brower

Comments:

  • MD5 is MD5; if the output differs, the input differs. If the inputs look the same then it's an encoding issue related to the binary interpretation of the input. Can you show the input and output of such a HASHBYTES() call along with your expected result? – Alex K. Apr 29 '16 at 15:20
  • SQL Server is returning you the hash of the string as interpreted as ISO-8859-1; `HASHBYTES('md5', N'°');` would give you the UTF-16 result, if you can use that in Oracle? – Alex K. Apr 29 '16 at 15:37
  • Possible duplicate of [TSQL md5 hash different to C# .NET md5](http://stackoverflow.com/questions/27908449/tsql-md5-hash-different-to-c-sharp-net-md5) – Matthew Whited Apr 29 '16 at 16:29

2 Answers


There is no UTF-8 support in MS SQL Server as of now. Because of this, your hashes will always differ unless you switch the source string to a common denominator on both sides, most likely UTF-16.
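To sketch what the UTF-16 route would mean: SQL Server stores nvarchar data as UTF-16LE, so `HASHBYTES('md5', N'°')` hashes those bytes, and Oracle would have to be told to produce the same encoding (e.g. via `UTL_I18N.STRING_TO_RAW('°', 'AL16UTF16LE')`; the charset name here is an assumption, not from the original post). An illustrative Python check that the UTF-16LE bytes hash differently from the UTF-8 bytes:

```python
import hashlib

s = "°"

# nvarchar is UTF-16LE on SQL Server, so N'°' is hashed as these bytes
utf16_hash = hashlib.md5(s.encode("utf-16-le")).hexdigest().upper()
utf8_hash = hashlib.md5(s.encode("utf-8")).hexdigest().upper()

# the encodings differ, so the hashes differ; both sides must agree on one
print(utf16_hash != utf8_hash)  # True
```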

Roger Wolf
0

I have decided to work around SQL Server's treatment of extended ASCII by implementing a CLR function that uses the .NET cryptographic library:

using System;
using System.Security.Cryptography;
using System.Text;
using Microsoft.SqlServer.Server;

public class Functions
{
  [SqlFunction]
  public static string GetMD5Hash(string input)
  {
    // hash the UTF-8 bytes of the input so that the result matches
    // Oracle's DBMS_CRYPTO.Hash over an AL32UTF8 conversion
    var encodedPassword = new UTF8Encoding().GetBytes(input);

    var hash = ((HashAlgorithm)CryptoConfig.CreateFromName("MD5")).ComputeHash(encodedPassword);

    // hex string without dashes, e.g. 4723EB5AA8B0CD28C7E09433839B8FAE
    return BitConverter.ToString(hash).Replace("-", string.Empty);
  }
}
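For completeness, a rough sketch of how such an assembly might be registered and called from T-SQL (the assembly name and file path below are placeholders, not from the original post):

```sql
-- placeholder path to the compiled assembly
CREATE ASSEMBLY HashFunctions FROM 'C:\clr\HashFunctions.dll'
    WITH PERMISSION_SET = SAFE;
GO

CREATE FUNCTION dbo.GetMD5Hash (@input nvarchar(max))
RETURNS nvarchar(32)
AS EXTERNAL NAME HashFunctions.Functions.GetMD5Hash;
GO

SELECT dbo.GetMD5Hash(N'°');
```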
David Brower