Removing duplicates from SQL Server table

Question

I have a situation in my SQL Server table where I have numerous duplicates of records like following:

 SID    Username                                InQueue
------------------------------------------------------------------    
 162    peeeer2  492    2017-01-18 12:20:21.820   0    354
2791    peeeer2  460    2017-01-11 00:00:00.000   1    NULL

The unwanted record here is the peeeer2 user for whom which I have set InQueue = true. I need to remove all of those duplicates where the InQueue column is set to 1 and the another criteria is that username is actually a duplicate...

The table name is SearchedUsernames:

delete from SearchedUsernames
where Username ??

Can someone help me out with this ?

Edit:

@TimSchmelter ty so much, this works like a charm. I get an error still however. I need to first drop the neighbouring FK's of this table. For example when I have a corresponding FK record in the neighbouring table called UserTransactions like following:

 ID      SID                              
----------------  
 162    162
2791    2791

I need to first drop all the records in this neighbouring table and then delete the duplicates using your query that you wrote. However, this time I'd like to drop ONLY those which HAVE duplicate records and have set InQueue = 0;

So the scenario would look like this:

Drop the FK records SID from both duplicates in neighbouring table UserTransactions
Then execute the query that DTV & Tim wrote with a minor change to drop only those records which are duplicate and have set InQueue = 0;

Google much? Possible duplicate of [How can I remove duplicate rows?](http://stackoverflow.com/questions/18932/how-can-i-remove-duplicate-rows) — HoneyBadger, Jan 18 '17 at 12:35
@HoneyBadger yes I understand your statement. However since I have about 400k records in the table, I couldn't risk to write the query on my own, if you understand my concern... — User987, Jan 18 '17 at 12:37
Not really, you can always make a (smaller) backup of the table to test on. — HoneyBadger, Jan 18 '17 at 12:40
there is always a record with `InQueue=0`? what is the name of 1st column? is it an ID? — MtwStark, Jan 18 '17 at 12:56

DVT · Accepted Answer · 2017-01-18T12:53:56.093

5

WITH cte AS (
    SELECT Username, inqueue,
        ROW_NUMBER() OVER (PARTITION BY Username ORDER BY InQueue ASC) AS RN
    FROM searchedUsernames   
)
DELETE FROM cte
WHERE RN > 1;

If you afraid of more than one Inqueue=0, then use RANK

WITH cte AS (
    SELECT Username, inqueue,
        RANK() OVER (PARTITION BY Username ORDER BY InQueue ASC) AS RN
    FROM searchedUsernames   
)
DELETE FROM cte
WHERE RN > 1;

edited Jan 18 '17 at 12:53

answered Jan 18 '17 at 12:34

DVT

3,014
1
13
19

1

@TimSchmelter Thanks. It is fixed. – DVT Jan 18 '17 at 12:36
2

I see one issue: if there are two `InQueue=0` and one `InQueue=1` with the same Username. If i understand OP correctly only the `InQueue=1` should be deleted. You would also delete the second `InQueue=0`. Maybe that's not possible then there is no problem – Tim Schmelter Jan 18 '17 at 12:37
@TimSchmelter yes that's correct, should the query be modified? – User987 Jan 18 '17 at 12:39
From the OP, he set the InQueue=1 for those records. I think he just try to make sure that only those with Inqueue =1 which already has an existing record got deleted. If he wants more, as far as I can tell, he did not provide enough info. – DVT Jan 18 '17 at 12:40
@DVT you're correct that's what I'm trying to achieve. I'll execute the 2nd query that you provided. Wish me luck lol :D – User987 Jan 18 '17 at 12:42
Hey @DVT I'm getting error for 2nd query saying Level 15, State 1, Line 13 Incorrect syntax near '('. – User987 Jan 18 '17 at 12:49
1

Try again. I missed a comma. – DVT Jan 18 '17 at 12:54
The `RANK` (or `DENSE_RANK`) approach is fine and avoids my mentioned issue :) – Tim Schmelter Jan 18 '17 at 12:54
@TimSchmelter, DTV ty so much for the help. Can you guys look into my edited question ? – User987 Jan 18 '17 at 13:25
@User987: set delete cascade on this foreign key constraint. Then you'll automatically delete these records. – Tim Schmelter Jan 18 '17 at 13:29
@TimSchmelter can I add the on cascade delete without having to drop the FK itself ? – User987 Jan 18 '17 at 13:31
@TimSchmelter Just edited, still doesn't allows to drop the keys in foreign table unfortunately – User987 Jan 18 '17 at 13:46
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/133455/discussion-between-tim-schmelter-and-user987). – Tim Schmelter Jan 18 '17 at 13:48

score 3 · Answer 2 · answered Jan 18 '17 at 12:57

3

Probably the most intuitive solution would be:

delete s from SearchedUsernames s
where InQueue = 1 and exists(select * from SearchedUsernames
                             where InQueue = 0 and Username = s.Username)

answered Jan 18 '17 at 12:57

Giorgi Nakeuri

35,155
8
47
75

score 1 · Answer 3 · answered Jan 18 '17 at 12:37

1

You could use a self join like this:

 DELETE t0
 FROM SearchedUsernames t0
 INNER JOIN SearchedUsernames t1 ON(t0.Username = t1.Username AND t0.IsQueue <> T1.IsQueue)
 WHERE AND t0.IsQueue = 1

answered Jan 18 '17 at 12:37

Zohar Peled

79,642
10
69
121

Removing duplicates from SQL Server table

3 Answers3