I have a table where I need to delete any rows that have the same url and parent_url. The table has the following columns:
id, time, url, parent_url, source
with duplicates as follows:
1,12-24-2014,asdf.com, http://repeat.com, email
1,12-24-2014,zxzy.com, http://secondrepeat.com, email
2,12-26-2014,asdf.com, http://repeat.com, email
3,12-26-2014,zxzy.com, http://secondrepeat.com, email
4,12-27-2014,asdf.com, http://repeat.com, email
These would need to be reduced to the following 2 rows (having deleted 3 rows):
1,12-24-2014,asdf.com, http://repeat.com, email
1,12-24-2014,zxzy.com, http://secondrepeat.com, email
I have beeing trying to use a for-loop and have been looking for queries that follow this format, and have not found anything that has shed much light on how this count be done.
For example, I tried doing
DELETE FROM scraper WHERE (url, parent_url) IN (SELECT(url, parent_url) FROM scraper GROUP BY url, parent_url having count(*) > 1);