
SQL Server's hashing function HASHBYTES has an input limit of 8000 bytes.

How do you hash larger strings?
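For a quick illustration of the limit (it applies through SQL Server 2014; SQL Server 2016 removed it):

-- 4000 NVARCHAR characters = 8000 bytes: exactly at the limit, this works
SELECT HASHBYTES('SHA1', REPLICATE(N'x', 4000));

-- 8002 bytes: raises an error on versions with the limit
SELECT HASHBYTES('SHA1', REPLICATE(CONVERT(NVARCHAR(MAX), N'x'), 4001));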

– SDReyes

3 Answers


You could write a SQL CLR function:

using System.Data.SqlTypes;
using System.Security.Cryptography;
using System.Text;

[Microsoft.SqlServer.Server.SqlFunction]
public static SqlBinary BigHashBytes(SqlString algorithm, SqlString data)
{
    // Resolve the algorithm by name, e.g. "MD5" or "SHA1"
    var algo = HashAlgorithm.Create(algorithm.Value);

    // Note: UTF-8 encoding here; see the comments below about SQL Server's UTF-16 strings
    var bytes = Encoding.UTF8.GetBytes(data.Value);

    return new SqlBinary(algo.ComputeHash(bytes));
}

And then it can be called in SQL like this:

--these return the same value
select HASHBYTES('md5', 'test stuff')
select dbo.BigHashBytes('md5', 'test stuff')

BigHashBytes is only necessary when the input would be longer than 8,000 bytes; below that, plain HASHBYTES gives the same result.
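If you deploy the assembly by hand rather than through Visual Studio/SSDT, the registration might look roughly like this (the assembly name, DLL path, and class name are assumptions for illustration):

CREATE ASSEMBLY BigHashBytesAssembly
    FROM 'C:\clr\BigHashBytes.dll'
    WITH PERMISSION_SET = SAFE;
GO

-- Declaring @data as NVARCHAR(MAX) avoids the silent truncation
-- described in the comments below.
CREATE FUNCTION dbo.BigHashBytes
(
    @algorithm NVARCHAR(50),
    @data      NVARCHAR(MAX)
)
RETURNS VARBINARY(MAX)
AS EXTERNAL NAME BigHashBytesAssembly.UserDefinedFunctions.BigHashBytes;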

– Paul Tyng
    Careful with CLR SP parameters being silently truncated to 8000 bytes - I had to tag the parameter with `[SqlFacet(MaxSize = -1)]` otherwise bytes after the 8000th would simply be ignored! Took me a while to get that one! – randomdude Feb 06 '13 at 12:42
  • -1 for misleading readers (unintentionally, of course) into using UTF-8. SQL Server (and Windows in general) uses UTF-16. Hence this will appear to work for many people who are currently only using Code Points 0 - 127, or possibly up to 256. But using UTF-8 leads to problems such as this Question: [SQL CLR function based on .net ComputeHash is not working with Cyrrilic](http://stackoverflow.com/a/35273859/577765) – Solomon Rutzky Feb 08 '16 at 16:03
  • @randomdude Your experience was due to how older versions of SSDT generated the T-SQL wrapper objects. The default used to be `NVARCHAR(4000)` for `SqlString` and `NVARCHAR(MAX)` for `SqlChars`, but since around the time of Visual Studio 2013 the default is `NVARCHAR(MAX)` for both. Still, it is always better to be explicit and use `[SqlFacet()]` as you mention, though folks using the newer SSDT versions wouldn't run into that. Also, SQLCLR only supports `NVARCHAR`, so it would actually truncate at 4000 chars :-). – Solomon Rutzky Feb 08 '16 at 16:07

You could hash 8k (or 4k or 2k) chunks of the input and then either concatenate those hashes or hash them into a new hash value; a sketch of the latter follows. The catch is that anything comparing hashes created outside of SQL Server (an external .NET app, for example) has to reproduce the same chunking scheme.
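For example, a minimal T-SQL sketch of the hash-of-hashes variant (the chunk size and variable names are illustrative, and the result is deliberately not equal to a single hash over the whole input):

DECLARE @data   NVARCHAR(MAX) = REPLICATE(CONVERT(NVARCHAR(MAX), N'x'), 100000);
DECLARE @pos    INT = 1;
DECLARE @hashes VARBINARY(MAX) = 0x;

-- 4000 NVARCHAR characters = 8000 bytes, exactly the HASHBYTES input limit
WHILE @pos <= LEN(@data)
BEGIN
    SET @hashes = @hashes + HASHBYTES('SHA1', SUBSTRING(@data, @pos, 4000));
    SET @pos = @pos + 4000;
END;

-- Hash the concatenated chunk hashes into one value. With very many chunks,
-- @hashes itself can grow past 8000 bytes and would need the same treatment.
SELECT HASHBYTES('SHA1', @hashes) AS hash_of_hashes;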

Another option: Lean on SQL Server's CLR integration and perform the hashing in a .NET assembly.

– Paul Sasik
  • thanks Paul, it's a really good solution for internal use. but in this case it has to be totally compliant with sha1 – SDReyes Oct 14 '11 at 16:28
  • @SDReyes: Another idea added to my answer: Lean on SQL Server's CLR integration and perform the hashing in a .NET assembly. – Paul Sasik Oct 14 '11 at 16:30
  • the CLR integration link in this answer is broken. – David Oct 13 '20 at 19:29

Building on Paul's chunking idea, you could store the per-chunk hashes in an XML column, with each chunk's hash as a separate XML element; a rough sketch is below.
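For instance, a sketch using a recursive numbers CTE to split the string into 4000-character chunks (all names are illustrative; the ord attribute anticipates the index mapping Paul Sasik suggests in the comment below):

DECLARE @data NVARCHAR(MAX) = REPLICATE(CONVERT(NVARCHAR(MAX), N'x'), 100000);

;WITH chunks AS (
    SELECT n = 1
    UNION ALL
    SELECT n + 1 FROM chunks WHERE n * 4000 < LEN(@data)
)
SELECT n AS '@ord',
       -- style 2 renders the VARBINARY hash as a hex string (SHA-1 = 40 chars)
       CONVERT(CHAR(40), HASHBYTES('SHA1',
               SUBSTRING(@data, (n - 1) * 4000 + 1, 4000)), 2) AS '@sha1'
FROM chunks
FOR XML PATH('chunk'), ROOT('hashes')
OPTION (MAXRECURSION 0);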

– Shan Plourde
  • +1 Good idea, though for data integrity's sake I would suggest a single hash value be stored (a hash of hashes.) With an XML structure you would need an extra attribute that maps a hash value to the index of the Nth chunk of data that was hashed. – Paul Sasik Oct 14 '11 at 16:08