1

I would like to create a unique index, that checks if a combination of text in my table is already there or not. In PostgreSQL I did this with a simple CREATE INDEX:

CREATE UNIQUE INDEX table_unique
    ON cd.hdealerproductdata USING btree
    (md5((((svId::text || manufacturer::text) || manufacturerreference::text) || path::text) || treetype::text) COLLATE pg_catalog."default")
    TABLESPACE pg_default;

How can I do this in SQL Server (2016)? I tried to create a computed column (and add a unique index to that) like this (using SSMS, table designer->Properties->Computed Column Specification):

ISNULL(HashBytes('MD5',CONVERT(VARCHAR(512),CONCAT(Manufacturer,ManufacturerReference))), 'null')

But I got an error saying it cannot be validated.

Edit.: I could even use SHA-2 with HashBytes: https://learn.microsoft.com/en-us/sql/t-sql/functions/hashbytes-transact-sql

Edit2.: The HashBytes returns varbinary types but I cannot specify a datatype for a computed column.

enter image description here

After:

'Document' table
- Unable to modify table.  
Implicit conversion from data type varchar to varbinary is not allowed. Use the CONVERT function to run this query.

Edit3.: I ended up creating a Scalar function for this, and calling it while inserting and creating the computed column (which I persisted and created a unique index on).

CREATE FUNCTION [dbo].[DocumentUniqueHash] 
(
    @DocumentTreeId bigint,
    @Manufacturer nvarchar(255),
    @ManufacturerReference nvarchar(255)
)
RETURNS varbinary(20)
WITH SCHEMABINDING
AS
BEGIN
    -- Declare the return variable here
    DECLARE @Result varbinary(20)

    SELECT @Result = (hashbytes('SHA1',(CONVERT([nvarchar](max),@DocumentTreeId)+@Manufacturer)+@ManufacturerReference))

    -- Return the result of the function
    RETURN @Result

END

And called the Compute column: ([dbo].[DocumentUniqueHash]([DocumentTreeId],[Manufacturer],[ManufacturerReference]))

Also, did the insert stored procedure like this:

CREATE PROCEDURE DocumentInsert
    @DocumentTreeId bigint,
    @Manufacturer nvarchar(255),
    @ManufacturerReference nvarchar(255),
    @NewId bigint OUTPUT
AS
BEGIN
    SELECT @NewId = Id FROM Document (NOLOCK) 
    WHERE UniqueHash = [dbo].[DocumentUniqueHash](@DocumentTreeId, @Manufacturer, @ManufacturerReference)

    IF @NewId IS NULL
    BEGIN
        SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
        BEGIN TRANSACTION
            SELECT @NewId = Id FROM Document (NOLOCK) 
            WHERE UniqueHash = [dbo].[DocumentUniqueHash](@DocumentTreeId, @Manufacturer, @ManufacturerReference)
            IF @NewId IS NULL
            BEGIN
                INSERT INTO Document (DocumentTreeId, Manufacturer, ManufacturerReference) 
                VALUES (@DocumentTreeId, @Manufacturer, @ManufacturerReference)
                SELECT @NewId = SCOPE_IDENTITY()
            END
        COMMIT TRANSACTION
    END
    SELECT @NewId
END
GO
szab.kel
  • 2,356
  • 5
  • 40
  • 74
  • It's unlikely to happen by accident, but since MD5 collisions *have* been demonstrated, I'd question whether it's a good candidate for evaluating uniqueness. Why have you ruled out applying a unique constraint against the columns themselves? – Damien_The_Unbeliever Sep 05 '17 at 08:10
  • @Damien_The_Unbeliever The columns are long varchars (`nvarchar(512)`). Also, I am not really afraid of md5 collision to happen by accident (if it is on purpose, it is another topic). I could use sha1 or better, that does not really matter. – szab.kel Sep 05 '17 at 08:12
  • It doesn't make sense to request unique hashes. Hashes are not intended to be completely unique; they are digests used for quick comparisons and bucketing. They are intentionally lossy reductions. If you want a unique index you should compose something else, though based on your use case I'd just give up the idea of a unique index. (that said I'll be interested to hear what other answers you get!) – charles-allen Sep 05 '17 at 08:24
  • @CodeConfident Sometimes you just need to tell if certain strings are unique in your database or not. Using a hash (even md5) works well I think. You have a really small probability to get a collision. https://stackoverflow.com/a/288519/840315 – szab.kel Sep 05 '17 at 08:28
  • Absolutely. Make the column. Query it. But that's not what a unique constraint. You're trying to tell the database to enforce an invariant of your data that's not true. – charles-allen Sep 05 '17 at 08:30
  • @CodeConfident How else would you satisfy a business logic that tell you to not allow duplicate combination if these texts then? – szab.kel Sep 05 '17 at 08:33
  • @appl3r - Sorry, you're right. My line of thought was mislead but the word unique rather than understanding your underlying need. I would do exactly the same as you provided that was the business definition of a duplicate. I'd probably avoid MD5 though if you're using it this way. I think I've also misinterpreted the error; I thought you meant SQL Server rejected your logic but I think it's just a syntax issue. Do you really need the ISNULL? Seems begging for false positives. Sorry just came out of InfoSec class so my mind's coming from the wrong perspective. – charles-allen Sep 05 '17 at 08:42
  • @CodeConfident ISNULL is there because I read somewhere that SQL Server tries to validate and not allow null values, but I just put there in my misery. – szab.kel Sep 05 '17 at 08:46

1 Answers1

1

It's your choice of the ISNULL replacement value that's tripping you up.

Run:

declare @t table (Manufacturer varchar(512), ManufacturerReference varchar(512))

select ISNULL(HashBytes('MD5',CONVERT(VARCHAR(512),
       CONCAT(Manufacturer,ManufacturerReference))), 'null')
from @t

And you get the error

Msg 257, Level 16, State 3, Line 4

Implicit conversion from data type varchar to varbinary is not allowed. Use the CONVERT function to run this query.

But run this query:

declare @t table (Manufacturer varchar(512), ManufacturerReference varchar(512))

select ISNULL(HashBytes('MD5',CONVERT(VARCHAR(512),
       CONCAT(Manufacturer,ManufacturerReference))), 0x) --not a string any more
from @t

And it runs without error

Community
  • 1
  • 1
Damien_The_Unbeliever
  • 234,701
  • 27
  • 340
  • 448