I am working on an ETL process (no primary key, so transactional replication is not an option) between Oracle and SQL Server, and I am using an MD5 hash to detect differences between the source and destination databases.
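For context, the comparison works roughly like this (a simplified sketch; the table and column names below are hypothetical, and since there is no primary key the whole-row hashes are compared as sets): the Oracle side computes an MD5 per row with DBMS_CRYPTO, those hashes are staged across, and SQL Server recomputes its own row hashes with HASHBYTES, so the two implementations have to agree byte for byte.
SELECT src.row_hash                       -- hash computed in Oracle with DBMS_CRYPTO
FROM   oracle_extract AS src              -- hypothetical staging table
EXCEPT
SELECT CONVERT(char(32), HASHBYTES('md5',
           CONCAT(d.col1, '|', d.col2, '|', d.col3)), 2)  -- hash recomputed locally
FROM   dbo.destination AS d;              -- hypothetical destination table
-- any rows returned here are treated as changed and reloaded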
This works fine for records where the data falls within the 7-bit ASCII range. But when there are any 'extended ASCII'* characters such as ½, °, or ©, SQL Server's HASHBYTES function produces a hash that does not match Oracle's DBMS_CRYPTO.Hash, the .NET cryptography library, or any other implementation I have tried.
So when I run this in Oracle:
select rawtohex(
    DBMS_CRYPTO.Hash(
        UTL_I18N.STRING_TO_RAW('°', 'AL32UTF8'),
        2  -- 2 = DBMS_CRYPTO.HASH_MD5
    )
) from dual;
I get: 4723EB5AA8B0CD28C7E09433839B8FAE.
And when I run this in SQL Server:
SELECT HASHBYTES('md5', '°');
I get: EC655B6DA8B9264A7C7C5E1A70642FA7.
And when I run this C# code:
using System;
using System.Security.Cryptography;
using System.Text;

string password = "°";
// byte array representation of that string, encoded as UTF-8
byte[] encodedPassword = new UTF8Encoding().GetBytes(password);
// need MD5 to calculate the hash
byte[] hash = ((HashAlgorithm)CryptoConfig.CreateFromName("MD5")).ComputeHash(encodedPassword);
// string representation (similar to UNIX format)
string encoded = BitConverter.ToString(hash)
    // without dashes
    .Replace("-", string.Empty)
    // make lowercase
    .ToLower();
I get: 4723EB5AA8B0CD28C7E09433839B8FAE, i.e. the same result as Oracle and as every online tool I have used.
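As far as I can tell, the mismatch comes from how the varchar literal is encoded before it ever reaches the hash function, rather than from the MD5 implementation itself. A quick check along these lines (this assumes the database uses a Windows-1252 based collation, so treat the exact byte values as a guess on my part):
-- the varchar literal is encoded in the database's code page, not in UTF-8
SELECT CONVERT(varbinary(10), '°')  AS varchar_bytes,   -- 0xB0 under Windows-1252
       CONVERT(varbinary(10), N'°') AS nvarchar_bytes;  -- 0xB000 (UTF-16LE)

-- hashing the raw byte sequences directly shows where each result comes from
SELECT HASHBYTES('md5', 0xC2B0) AS md5_of_utf8_bytes,      -- should match Oracle / .NET
       HASHBYTES('md5', 0xB0)   AS md5_of_codepage_bytes;  -- should match HASHBYTES('md5', '°')
If that is right, the real question is how to get the UTF-8 bytes in front of HASHBYTES from within SQL Server.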
Is there any SQL-based solution to this problem, or would I need to create a CLR stored procedure and hash the data there?
*I realise that the term 'extended ASCII' is somewhat controversial.