0

I have a table with around 100 columns and need to only add a new row if a combination of columns is unique. The problem is that there are around 50 columns that I need to check so a simple where clause could be very slow on a large amount of data.

Would it be advised to create a unique hash of all the values of those fields in C# beforehand and storing that in a new column in the table and compare based on that?

Nick Reeve
  • 1,658
  • 2
  • 22
  • 32
  • Equality checks are faster than hashing, especially if the columns are indexed. – Panagiotis Kanavos Apr 11 '17 at 12:12
  • I agree. Why not one big `WHERE col1 = :val1 AND col2 = :val2` etc? – Mike Chamberlain Apr 11 '17 at 12:13
  • So build a where clause of 50 columns is better? – Nick Reeve Apr 11 '17 at 12:13
  • 1
    @NickReeve *why* do you have to do that in the first place? How did you end up with no way to identify duplicates other than checking every single column? Are there no keys or identifiers in the data? – Panagiotis Kanavos Apr 11 '17 at 12:14
  • It might be better. At least you don't have to change your DB schema, keep the hash up to date, and calculate the hash over 50 values to make the comparison. But you may have other constraints. – Mike Chamberlain Apr 11 '17 at 12:14
  • @PanagiotisKanavos I need to create a history of all unique rows from files that are imported every week. Unfortunately if any one of the 50 columns have changed, they want a new row – Nick Reeve Apr 11 '17 at 12:17
  • SQL Server already provides change tracking since 2008. You can retrieve IDs of changed rows, the reason for the change (including deletions), even changed column values. SQL Server 2016 added temporal tables that dow what you describe automatically – Panagiotis Kanavos Apr 11 '17 at 12:46
  • Change tracking is available in all editions, even Express. You can `SELECT * from CHANGETABLE(CHANGES myTable,@lastCheckedVersion)` to retrieve all changes since the last version you checked. No need to compare columns or face ambiguities about changes or deletions – Panagiotis Kanavos Apr 11 '17 at 12:48

2 Answers2

0

You Could use this link (hash a SQL row?) to hash the row that you are going to insert and for rows that are already present in table and use this checksum to verify uniqueness

Community
  • 1
  • 1
0

This sounds to be a good place to implement merge (if you are updating matching records also).Following is the article from MSDN:

https://learn.microsoft.com/en-us/sql/t-sql/statements/merge-transact-sql
Biswabid
  • 1,378
  • 11
  • 26