-2

I ve got duplicated rows in a temp table mainly because there are some date values which are seconds/miliseconds different to each other.

For example:

2018-08-30 12:30:19.000

2018-08-30 12:30:20.000

This is what causes the duplication.

How can I keep only one of those values? Let s say the higher one?

Thank you.

Community
  • 1
  • 1
madlicksxxx
  • 57
  • 1
  • 8
  • 3
    well, we need to see the rest of the data... and know your threshold for duplication? Would 2 seconds be a duplicate? 30 seconds? – S3S Oct 19 '18 at 16:23
  • it s roughly 6 mil rows. 60 seconds should be enough – madlicksxxx Oct 19 '18 at 16:24
  • 1
    The important part is the key of the table. "The rest of the data" here mostly means the other columns, what is the primary key, and maybe a few rows of sample data that would include one of these duplicates. We certainly do not need to see 6 million records to understand the problem fully, but not seeing any records is a bit of a hindrance. – JNevill Oct 19 '18 at 16:25
  • what are the other columns... we need some sample data and expected results – S3S Oct 19 '18 at 16:25
  • `92860635 1.541666 27.4347685 2018-09-28 11:36:07.000 124 40 2018-09-01 01:10:03.000 2018-08-30 12:30:19.000 95 23` The only key is the first int value – madlicksxxx Oct 19 '18 at 16:30

2 Answers2

5

Well, one method is to use lead():

select t.*
from (select t.*, lead(ts) over (order by ts) as next_ts
      from t
     ) t
where next_ts is null or
      datediff(second, ts, next_ts) < 60;   -- or whatever threshold you want
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
1

You could assign a Row_Number to each value, as follows:

Select *
, Row_Number() over 
              (partition by ObjectID, cast(date as date)... ---whichever criteria you want to consider duplicates
               order by date desc) --assign the latest date to row 1, may want other order criteria if you might have ties on this field
as RN
from MyTable

Then retain only the rows where RN = 1 to remove duplicates. See this answer for examples of how to round your dates to the nearest hour, minute, etc. as needed; I used truncating to the day above as an example.

APH
  • 4,109
  • 1
  • 25
  • 36