
I am currently creating a service that fetches data from a public data source every 24 hours. The data is essentially structured like this:

a b c ImportDate
1 2 3 12.06.22
2 3 3 12.06.22
1 2 3 11.06.22

I want to keep only unique rows (ignoring ImportDate when comparing), i.e. something like this:

a b c ImportDate
1 2 3 12.06.22
2 3 3 12.06.22

That is, the older duplicate row is removed.

What would be the best way to approach this so that no data is actually lost, only the duplicate rows?

Thanks in advance!

kiooikml
  • "ignoring the importdate", looks like you want most recent, not ignore it. There are many, many examples online (here and on other sites). What have you tried? – HoneyBadger Jun 12 '22 at 21:08
  • Does this answer your question? [How to delete duplicate rows in SQL Server?](https://stackoverflow.com/questions/18390574/how-to-delete-duplicate-rows-in-sql-server) – Stu Jun 12 '22 at 23:49

4 Answers


You can use the row_number() window function. It numbers the rows within each partition of the columns whose values should not repeat, so you can then use the where clause to keep only the first occurrence in each partition.

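select a, b, c, importDate
from (
  select a, b, c, importDate,
         -- number the rows in each (a, b, c) group, newest import first
         row_number() over(partition by a, b, c order by importDate desc) rn
  from example
  ) t
where rn = 1; -- keep only the most recent row per group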

Here is an example: https://www.db-fiddle.com/f/3iryppZrysgCPkRVjpCKyM/0
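
As a rough sketch of how the same window-function idea could delete the older duplicates in place, the following uses Python's built-in sqlite3 module as a stand-in for the real database. That is an assumption: the thread appears to target SQL Server, and SQLite needs version 3.25 or later for window functions. The table name example comes from the query above; storing the dates as ISO strings is also an assumption, made so that they sort chronologically.

```python
import sqlite3

# Hypothetical in-memory table mirroring the question's sample rows.
# Dates are stored as ISO strings (assuming 12.06.22 means 2022-06-12).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example (a INT, b INT, c INT, ImportDate TEXT)")
conn.executemany(
    "INSERT INTO example VALUES (?, ?, ?, ?)",
    [(1, 2, 3, "2022-06-12"), (2, 3, 3, "2022-06-12"), (1, 2, 3, "2022-06-11")],
)

# Number the rows per (a, b, c), newest first, then delete every row
# that is not the first one in its group.
conn.execute(
    """
    DELETE FROM example
    WHERE rowid IN (
        SELECT rid FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (
                       PARTITION BY a, b, c ORDER BY ImportDate DESC
                   ) AS rn
            FROM example
        )
        WHERE rn > 1
    )
    """
)

remaining = conn.execute(
    "SELECT a, b, c, ImportDate FROM example ORDER BY a, b, c"
).fetchall()
print(remaining)  # [(1, 2, 3, '2022-06-12'), (2, 3, 3, '2022-06-12')]
```

Only the older (1, 2, 3) row is removed; every distinct (a, b, c) combination survives with its newest import date.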

0

After fetching all the data, as a second step I would run something like select a, b, c, max(ImportDate) as lastDate from source group by a, b, c. That keeps every distinct (a, b, c) combination together with its most recent import date.
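
A minimal sketch of that grouping step, again using Python's sqlite3 module as an assumed stand-in for the real database. The table name source is taken from the query above; the sample rows and ISO date strings are assumptions based on the question.

```python
import sqlite3

# Hypothetical in-memory copy of the fetched data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (a INT, b INT, c INT, ImportDate TEXT)")
conn.executemany(
    "INSERT INTO source VALUES (?, ?, ?, ?)",
    [(1, 2, 3, "2022-06-12"), (2, 3, 3, "2022-06-12"), (1, 2, 3, "2022-06-11")],
)

# One row per (a, b, c), carrying the latest ImportDate seen for that group.
latest = conn.execute(
    """
    SELECT a, b, c, MAX(ImportDate) AS lastDate
    FROM source
    GROUP BY a, b, c
    ORDER BY a, b, c
    """
).fetchall()
print(latest)  # [(1, 2, 3, '2022-06-12'), (2, 3, 3, '2022-06-12')]
```

Note that this only reads the deduplicated view; the source table itself is left unchanged.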

Ale Sosa

I think it's easiest to first import the new data into a separate table with the same structure, let's say Import, and then merge it into YourTable.

The merge statement lets you query a piece of data, and match it against an existing table. In a single statement you can choose to update (or skip) existing rows, and insert new rows.

merge into YourTable t
using
  (select * from Import) i
on (i.a = t.a and i.b = t.b and i.c = t.c) -- Or just the columns you want to match
when matched then
  update set t.ImportDate = i.ImportDate -- add any other columns you want to update
when not matched then
  insert (a, b, c, ImportDate)
  values (i.a, i.b, i.c, i.ImportDate);
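
For experimenting with the staging-table idea without a server, here is a sketch in Python's sqlite3 module. SQLite has no merge statement, so this swaps in a different technique, INSERT ... ON CONFLICT DO UPDATE (an upsert, SQLite 3.24 or later), which needs a unique constraint on the match columns. The constraint, the sample rows, and the ISO date strings are assumptions; only the table names YourTable and Import come from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- The UNIQUE constraint is an assumption; the upsert relies on it.
    CREATE TABLE YourTable (a INT, b INT, c INT, ImportDate TEXT, UNIQUE (a, b, c));
    CREATE TABLE Import (a INT, b INT, c INT, ImportDate TEXT);
    INSERT INTO YourTable VALUES (1, 2, 3, '2022-06-11');
    INSERT INTO Import VALUES (1, 2, 3, '2022-06-12'), (2, 3, 3, '2022-06-12');
    """
)

# Existing (a, b, c) rows get the new ImportDate; unseen rows are inserted.
# "WHERE true" avoids a parsing ambiguity SQLite has with INSERT ... SELECT ... ON CONFLICT.
conn.execute(
    """
    INSERT INTO YourTable (a, b, c, ImportDate)
    SELECT a, b, c, ImportDate FROM Import WHERE true
    ON CONFLICT (a, b, c) DO UPDATE SET ImportDate = excluded.ImportDate
    """
)

merged = conn.execute(
    "SELECT a, b, c, ImportDate FROM YourTable ORDER BY a, b, c"
).fetchall()
print(merged)  # [(1, 2, 3, '2022-06-12'), (2, 3, 3, '2022-06-12')]
```

The effect matches the two branches of the merge above: the matched row is updated and the unmatched row is inserted, so no duplicate is ever stored.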
GolezTrol

You can use the EXCEPT set operator to find the rows that are in the new data but not yet in the target.

-- src: incoming rows; dst: rows already stored (sample values)
with src as
(
  select distinct * from
  (values (1, 2, 3), (2, 3, 3), (5, 6, 7)) as t(a, b, c)
), dst as
(
  select * from
  (values (1, 2, 3), (2, 4, 5)) as t(a, b, c)
)
-- rows in src that are not yet in dst
select * from src
except
select * from dst;
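
The same comparison can be tried locally with Python's sqlite3 module, used here as an assumed stand-in for the real database. The sketch loads the sample values from the query above into two hypothetical tables named src and dst and lists the rows that only src contains.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE src (a INT, b INT, c INT);  -- incoming rows
    CREATE TABLE dst (a INT, b INT, c INT);  -- rows already stored
    INSERT INTO src VALUES (1, 2, 3), (2, 3, 3), (5, 6, 7);
    INSERT INTO dst VALUES (1, 2, 3), (2, 4, 5);
    """
)

# EXCEPT returns the distinct rows of the first query that the second lacks.
new_rows = conn.execute(
    """
    SELECT a, b, c FROM src
    EXCEPT
    SELECT a, b, c FROM dst
    ORDER BY a, b, c
    """
).fetchall()
print(new_rows)  # [(2, 3, 3), (5, 6, 7)]
```

Only these rows would need to be inserted into the target, since (1, 2, 3) is already present there.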
Venkataraman R