0

I am trying to run a query but at some point my logic is wrong and I am having problems finding out why its not deleting / selecting the right data.

The Question I need to ask is: How can I delete duplicates resource_type=2 FROM user_address and keep the original resource_type=2.

I've created the following queries. The problem when I test my final query and check some of the results some of them don't have a repeated resource_type=2, this shouldnt be there. I cant find the logic problem to my query.

Can anyone spot the problem?

This Query gives me 602 results When trying to find who has duplicates. Thus I've 602 user_address that are unique aparently

SELECT MIN(id)
FROM user_address 
WHERE resource_type=2  
GROUP BY member_num 
HAVING COUNT(resource_type) > 1

If i do the following query i get 7420 results

SELECT count(*)
FROM user_address 
WHERE resource_type=2 

When I do the Select query to check what would be deleted I get 6,818 results

SELECT * FROM user_address 
    WHERE id NOT IN 
    (select * from 
        (SELECT MIN(id)
            FROM user_address 
            WHERE resource_type=2  
            GROUP BY member_num 
            HAVING COUNT(resource_type) > 1
        ) as t) AND resource_type <> 1 AND resource_type <> 3 AND resource_type <> 4 
Jonathan Thurft
  • 4,087
  • 7
  • 47
  • 78
  • What is the criteria for original duplicate? – Mihai Mar 12 '14 at 17:41
  • @Mihai I just want to check if there is more than one address with `resource_type=2` if there is then delete all the rest that have `resource_type=2` but are duplicated but keep the lowest ID (first id). also minding not to delete those that have id 1, 3 or 4 – Jonathan Thurft Mar 12 '14 at 17:42
  • 7420 - 602 = 6818, where is the problem? – Will P. Mar 12 '14 at 17:43
  • @WillP. the problem is that when i check the final select statement some of the results dont have a repeated `resource_type=2` thus shouldnt be there – Jonathan Thurft Mar 12 '14 at 17:45

2 Answers2

0

These all make sense.

You have 602 values that are duplicates according to your definition (resource_type = 2 on more than one row).

You have 7,420 rows that have resource_type = 2.

You have 6,818 rows that meet whatever conditions are in your final query. It may be a coincidence that 602 + 6,818 = 7,420.

I think the query you want is more like:

delete ua from user_addresses ua join
               (SELECT MIN(id) as minid, member_num
                FROM user_address 
                WHERE resource_type=2  
                GROUP BY member_num 
                HAVING COUNT(resource_type) > 1
               ) tokeep
               on ua.member_num = tokeep.member_num and
                  ua.id > tokeep.minid;

I don't know what the additional conditions are for in the final select query.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

Assuming no nulls, you GROUP BY the unique columns, and SELECT the MIN (or MAX) RowId as the row to keep.

this answer looks useful :

https://stackoverflow.com/a/18949/367006

Community
  • 1
  • 1
torun
  • 465
  • 1
  • 5
  • 11