84

We have a table of photos with the following columns:

id, merchant_id, url 

This table contains duplicate values for the combination (merchant_id, url), so the same combination can appear several times:

234 some_merchant  http://www.some-image-url.com/abscde1213
235 some_merchant  http://www.some-image-url.com/abscde1213
236 some_merchant  http://www.some-image-url.com/abscde1213

What is the best way to delete those duplications? (I use PostgreSQL 9.2 and Rails 3.)

schlubbi
  • 2
    Is your ID column unique? I see 234 3 times but you say your merchant_id and url are the duplicate values. – sgeddes Jan 23 '13 at 02:27
  • 1
    Possible duplicate of http://stackoverflow.com/questions/1746213/how-to-delete-duplicate-entries-in-postgresql –  Jan 23 '13 at 02:51
  • 1
    sorry for the confusion. the id in the example above should be unique. thanks for the correct edit. the solution here stackoverflow.com/questions/1746213/… doesn't work for my case. – schlubbi Jan 23 '13 at 08:26

3 Answers

147

Here is my take on it.

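SELECT *
FROM (
  -- Number the rows within each (merchant_id, url) group, lowest id first
  SELECT id,
         ROW_NUMBER() OVER (PARTITION BY merchant_id, url ORDER BY id ASC) AS row_num
  FROM Photos
) dups
-- Every row after the first in its group is a duplicate
WHERE dups.row_num > 1;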

Feel free to play with the order by to tailor the records you want to delete to your specification.
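
The query above only lists the duplicate rows. A minimal sketch of turning it into an actual delete, assuming id is unique and the table and column names are the ones from the question (the alias rn is just an illustrative name):

```sql
-- Sketch: delete every row that is not the first (lowest id)
-- within its (merchant_id, url) group.
DELETE FROM Photos
WHERE id IN (
  SELECT id
  FROM (
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY merchant_id, url ORDER BY id ASC) AS rn
    FROM Photos
  ) dups
  WHERE dups.rn > 1
);
```

Running the SELECT on its own first, or wrapping the DELETE in a transaction, makes it easier to check which rows will go before committing.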

SQL Fiddle => http://sqlfiddle.com/#!15/d6941/1/0

SQL Fiddle no longer supports Postgres 9.2, so the fiddle was updated to Postgres 9.3.

the Tin Man
MatthewJ
10

The second part of sgeddes's answer doesn't work on Postgres (the fiddle uses MySQL). Here is an updated version of his answer using Postgres: http://sqlfiddle.com/#!12/6b1a7/1

DELETE FROM Photos AS P1  
USING Photos AS P2
WHERE P1.id > P2.id
   AND P1.merchant_id = P2.merchant_id  
   AND P1.url = P2.url;  
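
One way to confirm the delete did what was intended, sketched here with the table and column names from the question, is to look for any (merchant_id, url) pair that still occurs more than once:

```sql
-- Sketch: list combinations that still have more than one row.
-- An empty result means no duplicates remain.
SELECT merchant_id, url, COUNT(*) AS copies
FROM Photos
GROUP BY merchant_id, url
HAVING COUNT(*) > 1;
```

The same query run before the delete shows how many extra copies each combination has.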
11101101b
6

I see a couple of options for you.

For a quick way of doing it, use something like this (it assumes your ID column is not unique as you mention 234 multiple times above):

CREATE TABLE tmpPhotos AS SELECT DISTINCT * FROM Photos;
DROP TABLE Photos;
ALTER TABLE tmpPhotos RENAME TO Photos;

Here is the SQL Fiddle.

You will need to add your constraints back to the table if you have any.
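
CREATE TABLE ... AS SELECT copies only the column definitions and data, not constraints, indexes or defaults. A rough sketch of restoring them, assuming id was the primary key and that (merchant_id, url) should stay unique from now on (the constraint name is hypothetical, and both statements fail if duplicates are still present):

```sql
-- Assumption: id was the primary key of the original table.
ALTER TABLE Photos ADD PRIMARY KEY (id);

-- Assumption: each (merchant_id, url) pair should be unique going forward.
-- photos_merchant_id_url_key is an illustrative name, not from the original schema.
ALTER TABLE Photos
  ADD CONSTRAINT photos_merchant_id_url_key UNIQUE (merchant_id, url);
```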

If your ID column is unique, you could do something like this to keep the row with the lowest id (note that this is MySQL's multi-table DELETE syntax):

DELETE FROM P1  
USING Photos P1, Photos P2
WHERE P1.id > P2.id
   AND P1.merchant_id = P2.merchant_id  
   AND P1.url = P2.url;  

And the Fiddle.

the Tin Man
sgeddes
  • 2
    the id is unique in my case. I just did it wrong in my example code. but I get an error if I try to use your second solution. `ERROR: relation "p1" does not exist` – schlubbi Jan 23 '13 at 08:05
  • @StefanSchmidt I fixed it to run on Postgres instead of MySQL: http://sqlfiddle.com/#!12/6b1a7/1 – 11101101b Mar 10 '15 at 21:12