Removing duplicate rows (MySQL 5.7)

Question

I have about 75,000 entries/rows, and after using

SELECT WWW, COUNT(*) c FROM pmd_listings GROUP BY WWW HAVING c > 1;

I have discovered I have about 9000 duplicates (based on just the one column, probably a little less in actuality). I can only use group by with aggregate functions (not entirely sure what that means) but I want to remove duplicates if the rows match in these columns (to be sure they're duplicates)

"friendly_url" "www" and "listing_zip" Inside the table "pmd_listings".

I tried running this simulation:

https://pastebin.com/QWCb5jJ4

But I get the following error:

 #1054 - Unknown column 'c1.id' in 'where clause'

How can I acomplish this? Help is greatly appreciated!

Thank you!

see https://stackoverflow.com/questions/11724055/delete-with-table-alias-in-mysql — Adrien, Mar 01 '19 at 23:01
Or see https://meta.stackoverflow.com/questions/333952/why-should-i-provide-an-mcve-for-what-seems-to-me-to-be-a-very-simple-sql-query — Strawberry, Mar 01 '19 at 23:06
That works @Adrien, however It's only checking against one column, would adding a second column after the first in the select field work? — Natsu, Mar 01 '19 at 23:20
Yes of course, your error was the way you declared your alias, replacing `DELETE FROM pmd_listings c1` with `DELETE c1 FROM pmd_listings c1` should do the trick — Adrien, Mar 01 '19 at 23:32
Check with this link https://stackoverflow.com/questions/9880068/remove-duplicate-entries-in-a-mysql-table — Ravikiran Reddy Kotapati, Mar 02 '19 at 00:20

forpas · Accepted Answer · 2019-03-01T23:38:27.817

1

The alias c1 needs to be referenced between DELETE and FROM for MySql.
Try this way:

DELETE FROM pmd_listings
WHERE id NOT IN (
  SELECT t.id FROM ( 
    SELECT MIN(id) id
    FROM pmd_listings
    GROUP BY friendly_url, www, listing_zip
  ) t
)

edited Mar 01 '19 at 23:38

answered Mar 01 '19 at 23:00

forpas

160,666
10
38
76

I'm now being told **#1093 - You can't specify target table 'c1' for update in FROM clause**. But I understand, the second part was kind of redundant. – Natsu Mar 01 '19 at 23:11
This seems really promising, when running simulation it's giving me Matched rows: 3265. There should be about 9000 when just going off the www so those other entries must be different in some way. This will work, thank you! – Natsu Mar 01 '19 at 23:49

score 0 · Answer 2 · answered Mar 01 '19 at 23:45

You can use a left join with delete:

delete pl
    from pmd_listings pl left join
         (select pl2.friendly_url, pl2.www, pl2.listing_zip, min(id) as min_id
          from pmd_listings pl2
          group by pl2.friendly_url, pl2.www, pl2.listing_zip
         ) pl2
         on pl2.id = pl.id
    where pl2.id is null;

The aggregation subquery finds all minimum ids for the combinations of your three columns. The left join finds matches to those ids and the where means that all non-matching rows -- that is, all but the first row -- are deleted.

Removing duplicate rows (MySQL 5.7)

2 Answers2