0

I have a table where I need to delete any rows that have the same url and parent_url. The table has the following columns:

id, time, url, parent_url, source

with duplicates as follows:

1,12-24-2014,asdf.com, http://repeat.com, email
1,12-24-2014,zxzy.com, http://secondrepeat.com, email
2,12-26-2014,asdf.com, http://repeat.com, email
3,12-26-2014,zxzy.com, http://secondrepeat.com, email
4,12-27-2014,asdf.com, http://repeat.com, email

These would need to be reduced to the following 2 rows (having deleted 3 rows):

1,12-24-2014,asdf.com, http://repeat.com, email
1,12-24-2014,zxzy.com, http://secondrepeat.com, email

I have beeing trying to use a for-loop and have been looking for queries that follow this format, and have not found anything that has shed much light on how this count be done.

For example, I tried doing

DELETE FROM scraper WHERE (url, parent_url) IN (SELECT(url, parent_url) FROM scraper GROUP BY url, parent_url having count(*) > 1);
maudulus
  • 10,627
  • 10
  • 78
  • 117
  • http://stackoverflow.com/questions/22181875/how-to-remove-duplicates-in-a-table –  Oct 13 '14 at 19:31
  • And of course this: http://stackoverflow.com/search?q=[postgresql]+delete+duplicates –  Oct 13 '14 at 19:31
  • These examples you've given are significantly less complicated, no? For instance, they give no information on deleting based on `multiple` matches. – maudulus Oct 13 '14 at 19:56

1 Answers1

3

You can use not exists, for instance:

delete from scraper
     where exists (select 1
                   from scraper s2
                   where s2.url = scraper.url and s2.prent_url = s.parent_url and
                         s2.time < s.time
                  );

Or, you can adapt the referenced solutions for something like this:

delete from scraper
     where id not in (select min(id)
                      from scraper s2
                      group by s2.url, s2.prent_url 
                     );

It is unclear from the question whether time or id provides a better choice for the min().

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786