0

I have about 75,000 entries/rows, and after using

SELECT WWW, COUNT(*) c FROM pmd_listings GROUP BY WWW HAVING c > 1;

I have discovered I have about 9000 duplicates (based on just the one column, probably a little less in actuality). I can only use group by with aggregate functions (not entirely sure what that means) but I want to remove duplicates if the rows match in these columns (to be sure they're duplicates)

"friendly_url" "www" and "listing_zip" Inside the table "pmd_listings".

I tried running this simulation:

https://pastebin.com/QWCb5jJ4

But I get the following error:

 #1054 - Unknown column 'c1.id' in 'where clause'

How can I acomplish this? Help is greatly appreciated!

Thank you!

Natsu
  • 111
  • 1
  • 2
  • 11
  • see https://stackoverflow.com/questions/11724055/delete-with-table-alias-in-mysql – Adrien Mar 01 '19 at 23:01
  • Or see https://meta.stackoverflow.com/questions/333952/why-should-i-provide-an-mcve-for-what-seems-to-me-to-be-a-very-simple-sql-query – Strawberry Mar 01 '19 at 23:06
  • That works @Adrien, however It's only checking against one column, would adding a second column after the first in the select field work? – Natsu Mar 01 '19 at 23:20
  • Yes of course, your error was the way you declared your alias, replacing `DELETE FROM pmd_listings c1` with `DELETE c1 FROM pmd_listings c1` should do the trick – Adrien Mar 01 '19 at 23:32
  • Check with this link https://stackoverflow.com/questions/9880068/remove-duplicate-entries-in-a-mysql-table – Ravikiran Reddy Kotapati Mar 02 '19 at 00:20

2 Answers2

1

The alias c1 needs to be referenced between DELETE and FROM for MySql.
Try this way:

DELETE FROM pmd_listings
WHERE id NOT IN (
  SELECT t.id FROM ( 
    SELECT MIN(id) id
    FROM pmd_listings
    GROUP BY friendly_url, www, listing_zip
  ) t
)
forpas
  • 160,666
  • 10
  • 38
  • 76
  • I'm now being told **#1093 - You can't specify target table 'c1' for update in FROM clause**. But I understand, the second part was kind of redundant. – Natsu Mar 01 '19 at 23:11
  • This seems really promising, when running simulation it's giving me Matched rows: 3265. There should be about 9000 when just going off the www so those other entries must be different in some way. This will work, thank you! – Natsu Mar 01 '19 at 23:49
0

You can use a left join with delete:

delete pl
    from pmd_listings pl left join
         (select pl2.friendly_url, pl2.www, pl2.listing_zip, min(id) as min_id
          from pmd_listings pl2
          group by pl2.friendly_url, pl2.www, pl2.listing_zip
         ) pl2
         on pl2.id = pl.id
    where pl2.id is null;

The aggregation subquery finds all minimum ids for the combinations of your three columns. The left join finds matches to those ids and the where means that all non-matching rows -- that is, all but the first row -- are deleted.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786