I know that this topic came up many times before here but none of the suggested solutions worked for my dataset because my laptop stopped calculating due to memory issues or full storage.
My table looks like the following and has 108
Mio rows:
Col1 |Col2 | Col3 |Col4 |SICComb | NameComb
Case New |3523 | Alexander |6799 |67993523| AlexanderCase New
Case New |3523 | Undisclosed |6799 |67993523| Case NewUndisclosed
Undisclosed|6799 | Case New |3523 |67993523| Case NewUndisclosed
Case New |3523 | Undisclosed |6799 |67993523| Case NewUndisclosed
SmartCard |3674 | NEC |7373 |73733674| NECSmartCard
SmartCard |3674 | Virtual NetComm|7373 |73733674| SmartCardVirtual NetComm
SmartCard |3674 | NEC |7373 |73733674| NECSmartCard
The unique columns are SICComb
and NameComb
. I tried to add a primary key with:
ALTER TABLE dbo.test ADD ID INT IDENTITY(1,1)
but the integers are filling up more than 30
GB of my storage just in a new minutes.
Which would be the fastest and most efficient method to delete the duplicates from the table?