0

So i have a table like this, but there are around 3500 records that is updated weekly:

OID    UNIQUEID       RTYPE     TIME      OTHER
 1   ABC_20170215     SENT      12:30     item1
 2   ABC_20170215   RECEIVED    12:29     
 3   ABC_20170215     SENT      12:35     item4

So what I am trying to do is if UNIQUEID is a duplicate and RTYPE is different, I need to merge the rows of that pair. If there are 3 like above, I need to merge the pair that are closest in time, so the resulting table would look like below

OID    UNIQUEID       RTYPE         TIME            OTHER
 1   ABC_20170215   <whatever>   <whatever>         item1
 3   ABC_20170215   <whatever>   <whatever>         item4

The rtype and time field dont really matter too much because those fields can be dropped if needed or they can be calculated, doesn't matter. Hopefully someone has done this type of thing before and has some sample code on getting this accomplished.

this is not the same as posted before!!! I am not deleting duplicates, I am merging them based on pairs comparison and time difference closest to 0.

SO what I need is if

UNIQUEID = UNIQUEID <- duplicates/matches
and
RTYPE != RTYPE <- different

if there is more than one pair possibility, then subtract the time between the two, so for my example table

Pair1.Time - Pair1.Time = 1 (12.30-12.29)
Pair2.Time - Pair2.Time = 6 (12.29-12.35)

and whichever is closest to 0, merge that pair, but there could honestly be an infinite set of pair possibilities

KRL
  • 115
  • 1
  • 9
  • 1
    so where is your tried query ...? – Bhargav Chudasama Oct 13 '17 at 12:05
  • 1
    What if you have 2 or 4 records? What rules apply to these cases? – Giorgos Betsos Oct 13 '17 at 12:05
  • right now I dont have anything and I know that if there are more cases it can get complicated, i'm just trying to get something simple like the above scenario to work. – KRL Oct 13 '17 at 12:14
  • You wrote in question that rtype and time are not important. Does that mean that you want only one uniqueid for each item in OTHER field? – Harshil Doshi Oct 13 '17 at 12:36
  • Already answered multiple times https://stackoverflow.com/questions/12817919/delete-duplicate-records-keeping-original – RegBes Oct 13 '17 at 12:47
  • Possible duplicate of [Delete duplicate records keeping original](https://stackoverflow.com/questions/12817919/delete-duplicate-records-keeping-original) – RegBes Oct 13 '17 at 12:48
  • No, it doesn't really matter about the other fields, the UNIQUEID is a concat of several fields anyways that can be parsed out later. the main thing is UNIQUEID is the same, Rtype is different, then if there is more than one pair possibility, the pair with the closest time is merged. Ideally, during the merge, any empty values would be populated with the merged data if avail – KRL Oct 13 '17 at 12:53

1 Answers1

2

I solved the solution as I understand. Can you try this ? I am not assertive it's best solution. If someone answered with better i also would like see the perspective

;WITH cte AS
(
    select t.OID, tm.UNIQUEID, tm.RTYPE, 
    ROW_NUMBER() OVER(PARTITION BY tm.UNIQUEID, tm.RTYPE ORDER BY t.OID ) AS rno
    from (
        select UNIQUEID, RTYPE
        from MergeTable
        group by UNIQUEID, RTYPE
    ) tm
    inner join mergeTable t on t.UNIQUEID = tm.UNIQUEID and t.RTYPE = tm.RTYPE
)

select * from cte where rno = 1

In my sample data:

OID UNIQUEID        RTYPE       TIME    OTHER
1   ABC_20170215    SENT        12:30   item1
2   ABC_20170215    RECEIVED    12:29   NULL
3   ABC_20170215    SENT        12:31   item4
4   ABC_20170216    SENT        12:32   item3

after the query run:

OID UNIQUEID        RTYPE       TIME    OTHER
1   ABC_20170215    SENT        12:30   item1
2   ABC_20170215    RECEIVED    12:29   NULL
4   ABC_20170216    SENT        12:32   item3

Hope it's help to you.

arslanaybars
  • 1,813
  • 2
  • 24
  • 29
  • @KRL how is the answer ? could you try ? – arslanaybars Oct 13 '17 at 13:52
  • but you have type=type and it's not. it should be type <> type, then if there is more than one match possible, subtract the time. i made an edit to my question because it may not seem clear – KRL Oct 13 '17 at 13:57
  • @KRL i handle that in cte table I found all row which we should get. when you try the script is it work in your data ? – arslanaybars Oct 13 '17 at 13:58
  • hmm, actually wouldn't 1 and 2 be merged since there is a 1 minute difference, rather than 2 and 3 since there is a 2 minute difference? – KRL Oct 13 '17 at 15:13