1

I'm using the following query to display duplicate entries in my database (looks like I've accidentally run the script twice, though it may have been more as it may have been a malicious user).

I'm using the following query, though it doesn't actually satisfy my requirements:

SELECT meta_value, COUNT(meta_value) AS cnt
FROM wp_postmeta
GROUP BY meta_value
HAVING cnt > 1
ORDER BY cnt;

I'd also like the query to check if the duplicate entries have the same post_id and meta_key.

So, for example:

meta_id     post_id     meta_key    meta_value
1           10          size        large
2           10          colour      blue
3           10          size        large
4           11          size        large

meta_id 1 and 3 are duplicate entries.

I would like to delete ALL BUT ONE of such entries.

Is there a way of doing this with a single query? If I could view the rows first to ensure that the query does as expected, that would be great.

Thanks in advance,

turbonerd
  • 1,234
  • 4
  • 27
  • 63
  • possible duplicate of [Deleting duplicate rows from a table](http://stackoverflow.com/questions/1043488/deleting-duplicate-rows-from-a-table) – Marc B Mar 14 '12 at 17:56
  • 1
    first figure out why you have gotten duplicates! – Har Mar 14 '12 at 17:56

4 Answers4

4

Try this -

DELETE t2
FROM wp_postmeta t1
INNER JOIN wp_postmeta t2
    ON t1.post_id = t2.post_id
    AND t1.meta_key = t2.meta_key
    AND t1.meta_value = t2.meta_value
    AND t1.meta_id < t2.meta_id

Take note of @RolandBouman's warning about references to the rows you are deleting.

Roland Bouman
  • 31,125
  • 6
  • 66
  • 67
user1191247
  • 10,808
  • 2
  • 22
  • 32
2

"I'd also like the query to check if the duplicate entries have the same post_id and meta_key."

then, use those in the group by as well.

SELECT meta_value, COUNT(meta_value) AS cnt
FROM wp_postmeta
GROUP BY post_id, meta_key, meta_value
HAVING cnt > 1
ORDER BY cnt;

"I would like to delete ALL BUT ONE of such entries."

This is unfortunately not so easy in MySQL. (see http://dev.mysql.com/doc/refman/5.5/en/delete.html)

There is something called multi-table DELETE syntax but its useless if you need to join to the same table as the one where you want to delete from. Using a subquery also won't fly, because you can't select from the same table as the one you're deleting from.

Unfortunately the easiest way to do this is to create a temporary table based on the group by query and use that to join to:

CREATE TABLE wp_postmeta_delete
AS
SELECT MIN(meta_id) meta_id
,      post_id 
,      meta_key
,      meta_value
FROM wp_postmeta
GROUP BY post_id, meta_key, meta_value
HAVING count(*) > 1;

DELETE     wp_postmeta.*
FROM       wp_postmeta
INNER JOIN wp_postmeta_delete t2
ON         wp_postmeta.meta_id   != t2.meta_id
AND        wp_postmeta.post_id    = t2.post_id
AND        wp_postmeta.meta_key   = t2.meta_key
AND        wp_postmeta.meta_value = t2.meta_value;

After deleting the rows you can discard the temporary table:

DROP TABLE wp_postmeta_delete;

Note that in many cases just deleting duplicates may not be good enough; If other tables point to duplicate rows, then you should migrate those references to point at the corresponding unique row that you're keeping around.

Roland Bouman
  • 31,125
  • 6
  • 66
  • 67
1

EDIT: nnichols answer is way better.


try something like this - there may be a more efficient way, but off the top of my head it seems to work.

delete from wp_postmeta
where meta_id in 
(select meta_id
from 
     (select meta_key, meta_value, post_id
      from wp_postmeta
      group by meta_key, meta_value, post_id
      having count(*) > 1) problemGroups
inner join wp_postmeta a
      on a.meta_key = problemGroups.meta_key
      and a.meta_value = problemGroups.meta_value
      and a.post_id = problemGroups.post_id) allIDs
and meta_id not in
(select min(meta_id)
from 
     (select meta_key, meta_value, post_id
      from wp_postmeta
      group by meta_key, meta_value, post_id
      having count(*) > 1) problemGroups
inner join wp_postmeta a
      on a.meta_key = problemGroups.meta_key
      and a.meta_value = problemGroups.meta_value
      and a.post_id = problemGroups.post_id
group by problemGroups.meta_key, meta_value, port_id) minIDS
user158017
  • 2,891
  • 30
  • 37
0

I haven't tested, but something like this should work (this assumes your select statement grabs what you want it to)

DELETE FROM wp_postmeta
WHERE meta_id IN (
                SELECT meta_id
                FROM wp_postmeta
                GROUP BY meta_value
                HAVING COUNT(meta_value) > 1
                ORDER BY cnt
                );
Luke Shaheen
  • 4,262
  • 12
  • 52
  • 82