
I'm using SQL Server 2008.

I've got an NVARCHAR(MAX) column in a table that I want to make sure is unique. The table has 600,000 records and grows by 50,000 records every day.

Currently, before adding an item to the table, I check whether it already exists, and only insert it if it doesn't:

IF NOT EXISTS (SELECT * FROM Softs WHERE Title = 'example example example.')
BEGIN
    INSERT INTO Softs (....)
    VALUES (...)
END

I don't have an index on the Title column.

Recently, I started getting timeouts when inserting items into the table.

What would be the correct way to maintain uniqueness?

If it would really help, I can change the NVARCHAR(MAX) to NVARCHAR(450).
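
For example, something along these lines (just a sketch; it assumes no existing title is longer than 450 characters, and the NOT NULL would have to match the column's current definition):

-- Narrow the column; this errors out if any existing value is longer than 450 characters
ALTER TABLE Softs
    ALTER COLUMN Title NVARCHAR(450) NOT NULL;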

RuSh
  • How many duplicates do you expect? None, i.e. should they all be new rows? – gbn Sep 12 '10 at 18:00
  • Well, it's really hard to tell, but maybe around 5-10% of insert attempts would be cancelled because of a duplicate item. – RuSh Sep 12 '10 at 18:02
  • RE: Your last sentence changing to `NVARCHAR(900)` wouldn't be enough. It would need to be `NVARCHAR(450)` to create a unique constraint on it. Why is it `NVARCHAR(max)` though if this is possible? – Martin Smith Sep 12 '10 at 18:02
  • @Martin Smith: Correct. I can live with nvarchar(450) or varchar(900), but that would cause me to lose some titles that are longer than 900, or others that are not English. – RuSh Sep 12 '10 at 18:06
  • Why is it hard to tell how many duplicate entries are being made/attempted? – JeffO Sep 12 '10 at 20:28

1 Answer


It's madness not to have an index.

It would help, but an index key can only be 900 bytes long, which is why the column would have to be NVARCHAR(450) (2 bytes per character) rather than NVARCHAR(MAX).

However, it's likely you already have duplicates, because there is the potential for a 2nd EXISTS check to run after the 1st EXISTS but before the 1st INSERT.

Creating the unique index will tell you (it will fail if duplicates already exist), and it will subsequently protect against this.
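
For example (a sketch only; it assumes the column has already been narrowed to NVARCHAR(450), and the index name is made up):

-- Find duplicates that would block the unique index
SELECT Title, COUNT(*) AS Cnt
FROM Softs
GROUP BY Title
HAVING COUNT(*) > 1;

-- Enforce uniqueness from now on
CREATE UNIQUE NONCLUSTERED INDEX UX_Softs_Title ON Softs (Title);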

However, even with the EXISTS check you can still get duplicate key errors under heavy load.

My favoured approach for high insert volume / low duplicates is the JFDI pattern. It is highly concurrent:

BEGIN TRY
    INSERT etc
END TRY
BEGIN CATCH
    -- 2627 = violation of a PRIMARY KEY or UNIQUE constraint, i.e. a duplicate:
    -- swallow it, re-raise anything else
    IF ERROR_NUMBER() <> 2627
        RAISERROR etc
END CATCH
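
Filled in for this table, it might look something like this (just a sketch; the procedure name, @Title parameter and error handling are made up, and the other columns from the question are left out):

CREATE PROCEDURE InsertSoft
    @Title NVARCHAR(450)
    -- ... plus whatever other parameters the real table needs
AS
BEGIN TRY
    INSERT INTO Softs (Title /* , ... */)
    VALUES (@Title /* , ... */);
END TRY
BEGIN CATCH
    -- 2627 = unique/primary key constraint violation, 2601 = unique index violation;
    -- anything else is a real error and gets re-raised
    IF ERROR_NUMBER() NOT IN (2627, 2601)
    BEGIN
        DECLARE @msg NVARCHAR(2048) = ERROR_MESSAGE();
        RAISERROR (@msg, 16, 1);
    END
END CATCH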
gbn
  • Thanks, what kind of errors will I get? ("However, you can get errors under heavy load.") – RuSh Sep 12 '10 at 18:15
  • The same reason why you may have duplicates already: the only difference being errors from a unique index. – gbn Sep 12 '10 at 18:18
  • Do you mean that after creating the index, when I insert a duplicate item I'll get an error because of the duplicate? Or will I get other errors? – RuSh Sep 12 '10 at 18:21
  • Cool, so I can do the above and use the index on the checksum column. – RuSh Sep 12 '10 at 18:25
  • @sharru - `checksum` can give collisions. This is less probable with `hashbytes` according to [this article](http://www.mssqltips.com/tip.asp?tip=1868) – Martin Smith Sep 12 '10 at 18:29
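
A sketch of the hash-column approach mentioned in the last two comments (assumptions: the Softs/Title names from the question, SHA-1 via HASHBYTES, and made-up names TitleHash and UX_Softs_TitleHash; note that HASHBYTES input is capped at 8,000 bytes before SQL Server 2016, and a hash collision would be rejected as a duplicate):

-- Persisted computed column holding a SHA-1 hash of the title
ALTER TABLE Softs
    ADD TitleHash AS CAST(HASHBYTES('SHA1', Title) AS BINARY(20)) PERSISTED;

-- Unique index on the 20-byte hash instead of the (too wide) NVARCHAR(MAX) column
CREATE UNIQUE NONCLUSTERED INDEX UX_Softs_TitleHash ON Softs (TitleHash);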