Sql query to remove duplicate entries after sync from data source

Question

I am currently creating a service that fetches data from a public data source every 24 hours. The data is essentially structured as thus:

a	b	c	ImportDate
1	2	3	12.06.22
2	3	3	12.06.22
1	2	3	11.06.22

Where I want to have only unique values (ignoring the importdate), i.e something like this.

a	b	c	ImportDate
1	2	3	12.06.22
2	3	3	12.06.22

Where we remove the old duplicate value.

What would be the best way to approach this to ensure no data is actually lost, only the duplicate values.

Thanks in advance!

"ignoring the importdate", looks like you want most recent, not ignore it. There are many, many examples online (here and on other sites). What have you tried? — HoneyBadger, Jun 12 '22 at 21:08
Does this answer your question? [How to delete duplicate rows in SQL Server?](https://stackoverflow.com/questions/18390574/how-to-delete-duplicate-rows-in-sql-server) — Stu, Jun 12 '22 at 23:49

score 2 · Answer 1 · answered Jun 12 '22 at 21:32

You can use the row_number() window function, the function will create partitions in the values that you don't want to repeat, after all you can use the where clause to filter only the first ocurrencies.

select a, b, c, importDate
from (
  select a, b, c, importDate,
  row_number() over(partition by a,b,c order by a desc) rn
  from example
  ) a
where rn =1;

here is the example: https://www.db-fiddle.com/f/3iryppZrysgCPkRVjpCKyM/0

Ale Sosa · Answer 2 · 2022-06-12T21:30:00.037

0

After fetching all the data, in a second step I would do something like select a, b, c, max(ImportDate) as lastDate from source group by a, b, c, that should keep all the values with the last imported date.

edited Jun 12 '22 at 21:30

answered Jun 12 '22 at 21:05

Ale Sosa

93
1
8

score 0 · Answer 3 · answered Jun 12 '22 at 21:08

I think it's easiest to first import the new data in a separate table with the same Structure, let's say Import, and then merge it into YourTable.

The merge statement lets you query a piece of data, and match it against an existing table. In a single statement you can choose to update (or skip) existing rows, and insert new rows.

merge into YourTable t
using
  (select * from Import) i
on (i.a = t.a and i.b = t.b and i.c = t.c) -- Or just the columns you want to match
when matched then
  update set t.ImportDate = i.ImportDate -- add any other columns you want to update
when not matched then
  insert (a, b, c, ImportDate)
  values (i.a, i.b, i.c, i.ImportDate);

score 0 · Answer 4 · answered Jun 13 '22 at 03:20

You can use EXCEPT set operator to find out the difference.

declare @tgt table(a int, b int, c int);
;with src as
(
SELECT distinct * from
(values(1,  2,  3),(2,  3,  3),(5,6,7))as t(a,b,c)
), dst as
(
SELECT * from
(values(1,  2,  3),(2,  4,  5))as t(a,b,c)
)
select * from src
except
select * from dst

Sql query to remove duplicate entries after sync from data source

4 Answers4