How to properly find for duplicates and deleted them

Question

I am trying to run a query but at some point my logic is wrong and I am having problems finding out why its not deleting / selecting the right data.

The Question I need to ask is: How can I delete duplicates resource_type=2 FROM user_address and keep the original resource_type=2.

I've created the following queries. The problem when I test my final query and check some of the results some of them don't have a repeated resource_type=2, this shouldnt be there. I cant find the logic problem to my query.

Can anyone spot the problem?

This Query gives me 602 results When trying to find who has duplicates. Thus I've 602 user_address that are unique aparently

SELECT MIN(id)
FROM user_address 
WHERE resource_type=2  
GROUP BY member_num 
HAVING COUNT(resource_type) > 1

If i do the following query i get 7420 results

SELECT count(*)
FROM user_address 
WHERE resource_type=2

When I do the Select query to check what would be deleted I get 6,818 results

SELECT * FROM user_address 
    WHERE id NOT IN 
    (select * from 
        (SELECT MIN(id)
            FROM user_address 
            WHERE resource_type=2  
            GROUP BY member_num 
            HAVING COUNT(resource_type) > 1
        ) as t) AND resource_type <> 1 AND resource_type <> 3 AND resource_type <> 4

@Mihai I just want to check if there is more than one address with `resource_type=2` if there is then delete all the rest that have `resource_type=2` but are duplicated but keep the lowest ID (first id). also minding not to delete those that have id 1, 3 or 4 — Jonathan Thurft, Mar 12 '14 at 17:42
@WillP. the problem is that when i check the final select statement some of the results dont have a repeated `resource_type=2` thus shouldnt be there — Jonathan Thurft, Mar 12 '14 at 17:45

score 0 · Answer 1 · answered Mar 12 '14 at 17:48

These all make sense.

You have 602 values that are duplicates according to your definition (resource_type = 2 on more than one row).

You have 7,420 rows that have resource_type = 2.

You have 6,818 rows that meet whatever conditions are in your final query. It may be a coincidence that 602 + 6,818 = 7,420.

I think the query you want is more like:

delete ua from user_addresses ua join
               (SELECT MIN(id) as minid, member_num
                FROM user_address 
                WHERE resource_type=2  
                GROUP BY member_num 
                HAVING COUNT(resource_type) > 1
               ) tokeep
               on ua.member_num = tokeep.member_num and
                  ua.id > tokeep.minid;

I don't know what the additional conditions are for in the final select query.

score 0 · Answer 2 · edited May 23 '17 at 12:30

0

Assuming no nulls, you GROUP BY the unique columns, and SELECT the MIN (or MAX) RowId as the row to keep.

this answer looks useful :

https://stackoverflow.com/a/18949/367006

edited May 23 '17 at 12:30

Community

1
1

answered Mar 12 '14 at 18:03

torun

465
1
5
11

How to properly find for duplicates and deleted them

2 Answers2