0

How to delete duplicate records from oldest the newest without a timestamp and GUID as Primary Key?

  • eg, if I have 4 records, delete the oldest 3 based of the GUID

This is as far as I have got,

WITH cte 
     AS(SELECT ID, ROW_NUMBER() OVER(PARTITION BY CodeOne, CodeTwo
     ORDER BY(SELECT 0)) RN
     FROM [InvoiceDatabase].[dbo].[LookUpCode])
     DELETE FROM cte
     WHERE  RN > 1;

But this is not doing what I need, although is deleting duplicates correctly. Is there a way doing this or is it impossible to use a GUID to tell which record is older?

KyloRen
  • 2,691
  • 5
  • 29
  • 59
  • 1
    Add a timestamp column, or an increment ID column, or something to reflect order.... otherwise it's going to be random. – S3S Nov 08 '18 at 14:59
  • Possible duplicate of [Finding duplicate values in a SQL table](https://stackoverflow.com/questions/2594829/finding-duplicate-values-in-a-sql-table) – Rahul Neekhra Nov 08 '18 at 15:00
  • @RahulNeekhra, apart from the duplicate part, this is a completely different question. – KyloRen Nov 08 '18 at 15:02
  • 1
    @scsimon, sigh, I was afraid of that. oh well, it was worth the try. thanks. – KyloRen Nov 08 '18 at 15:02
  • 1
    Most if not all modern GUIDs are v4 GUIDs, which use all random values for their bits. That includes `NEWID()`. There are GUID formats that do include a timestamp, but they're not in common use anymore. T-SQL's `NEWSEQUENTIALID()` is, as the name implies, sequential, but only until the machine is restarted. Bottom line, GUIDs aren't the best things to rely on for order. – Jeroen Mostert Nov 08 '18 at 15:05

1 Answers1

2

If you want to delete duplicate records but without a timestamp and GUID as Primary Key you can try to use %%physloc%%.

%%physloc%% is a Virtual Column. This column is undocumented. Hence you can use it at you own risk.

WITH cte 
     AS(SELECT ID, ROW_NUMBER() OVER(PARTITION BY CodeOne, CodeTwo
     ORDER BY  %%physloc%%) RN
 FROM [InvoiceDatabase].[dbo].[LookUpCode])
 DELETE FROM cte
WHERE  RN > 1;

sqlfiddle

Note

But I would suggest you create a timestamp for the order.

D-Shih
  • 44,943
  • 6
  • 31
  • 51
  • 1
    How accurate will this be? If there is a possibility that it will fail, I am probably better off with creating a timestamp column. – KyloRen Nov 08 '18 at 15:10
  • I am going to UV for the help and make accepted as it does what I asked in the question. Thanks a bunch for the help. – KyloRen Nov 08 '18 at 15:13
  • 2
    After reading about it, I wouldn't trust `%%physloc%%`. If your table has had any deletions, page splits, etc, the value of `%%physloc%%` may no longer correspond to the chronological order of the insertion of the rows. – Tab Alleman Nov 08 '18 at 15:14
  • 2
    Please don't do this, it's violently unreliable. The physical order of rows represents their insert order only if you basically do nothing with the table except inserting with fully filled pages. Any other modification and all bets are off. This can be especially pernicious if it's right for *most* rows, except a few. Good luck troubleshooting that one. – Jeroen Mostert Nov 08 '18 at 15:16
  • I like that solution. Had a read of this as well which will come in handy. https://sqlity.net/en/2451/physloc/ – JonTout Nov 08 '18 at 15:17
  • I am going to make a `timestamp` column. Could someone post in the comments how to amend my query to get only the latest record using `timnestamp` column. I don't want to ask another question and editing the orignal will screw the answer up. Thanks – KyloRen Nov 08 '18 at 15:18
  • Would it be something like this? `WITH cte AS(SELECT ID, ROW_NUMBER() OVER(PARTITION BY CodeOne, CodeTwo ORDER BY MIN(TimeStampCol)) RN FROM [InvoiceDatabase].[dbo].[LookUpCode]) DELETE FROM cte WHERE RN > 1;` – KyloRen Nov 08 '18 at 15:20
  • By the way, when we say `TimeStamp` we are referring to usage, not the SQL Server DataType. – Tab Alleman Nov 08 '18 at 15:21
  • @TabAlleman, do you mean `DateTime` data type column so you can append the time the record was inserted? – KyloRen Nov 08 '18 at 15:50
  • @KyloRen yes, that's exactly what I mean. – Tab Alleman Nov 08 '18 at 15:51
  • @TabAlleman, great, then we are on the same page. That is what I was intending to do. – KyloRen Nov 08 '18 at 15:51