7
id_specific_price    id_product  
-------------------------------
            1                2  
            2                2  
            3                2  
            4                3  
            5                3  
            6                3  
            7                3

Need to delete the duplicates, expected outcome:

id_specific_price    id_product  
-------------------------------
            3                2  
            7                3

SELECT * 
  FROM ps_specific_price 
 WHERE id_specific_price NOT IN 
 (SELECT MAX(id_specific_price) 
    FROM ps_specific_price 
   GROUP BY id_product) 

works but

DELETE FROM ps_specific_price 
 WHERE id_specific_price NOT IN 
(SELECT MAX(id_specific_price) 
   FROM ps_specific_price 
  GROUP BY id_product)

does not. There are plenty of examples to get around this but for some reason I am not able to adapt it. I believe it is GROUP BY. For example:

DELETE FROM ps_specific_price 
 WHERE id_specific_price NOT IN
 (SELECT MAX(p.id_specific_price) 
    FROM (SELECT * FROM ps_specific_price ) as p)
   GROUP BY id_product

Where did I go wrong here?

peterm
  • 91,357
  • 15
  • 148
  • 157
popkutt
  • 949
  • 3
  • 10
  • 19
  • You should not use the table you are deleting in your subquery. You should use a stored procedure with a cursor to archive this goal. – Cedric Simon Oct 02 '13 at 21:07
  • Stored procedures are the work of the devil. If it can't be solved in SQL (which it probably can, but might be vendor-specific), I'd suggest doing it at the application level (from Java or PHP, etc.) - select ...; delete from ... where id_specific_price in (...) – Brad Peabody Oct 02 '13 at 21:12
  • @CedricSimon Quoi? Your advice is (maybe) only valid for mysql, which does not like the deleted or updated table to be read as well in the same statement. (except if the reading reference does some aggregation, IIRC) – wildplasser Oct 02 '13 at 21:15
  • These aren't duplicate rows, you're deleting based on something subtler than that. You're asking to delete all but one row (the one with the highest id in one column) from a set grouped on a second id column. – Mark Oct 02 '13 at 21:20

3 Answers3

13

If you're looking for a solution for MySQL then you can use a proper multi table DELETE syntax along with a JOIN like this

DELETE p
  FROM ps_specific_price p JOIN
(
  SELECT id_product, MAX(id_specific_price) id_specific_price
    FROM ps_specific_price
   GROUP BY id_product
) d 
   ON p.id_product = d.id_product
  AND p.id_specific_price <> d.id_specific_price;

Outcome:

| ID_SPECIFIC_PRICE | ID_PRODUCT |
|-------------------|------------|
|                 3 |          2 |
|                 7 |          3 |

Here is SQLFiddle demo

peterm
  • 91,357
  • 15
  • 148
  • 157
1

Try this:

CREATE TABLE ps_specific_price (
  id_specific_price NUMBER,
  id_product NUMBER
);

INSERT INTO ps_specific_price (id_specific_price, id_product) VALUES (1, 2);
INSERT INTO ps_specific_price (id_specific_price, id_product) VALUES (2, 2);
INSERT INTO ps_specific_price (id_specific_price, id_product) VALUES (3, 2);
INSERT INTO ps_specific_price (id_specific_price, id_product) VALUES (4, 3);
INSERT INTO ps_specific_price (id_specific_price, id_product) VALUES (5, 3);
INSERT INTO ps_specific_price (id_specific_price, id_product) VALUES (6, 3);
INSERT INTO ps_specific_price (id_specific_price, id_product) VALUES (7, 3);

COMMIT;

DELETE FROM ps_specific_price ps
  WHERE ps.id_specific_price NOT IN (
    SELECT MAX(id_specific_price)
      FROM ps_specific_price ps_in
    WHERE ps_in.id_product = ps.id_product
    );

SELECT * FROM ps_specific_price;

ID_SPECIFIC_PRICE      ID_PRODUCT             
---------------------- ---------------------- 
3                      2                      
7                      3                      

You must connect the table from the inner query with the table from the outer one.

I'm using Oracle 11g R2. I checked this on SQLFiddle and my DELETE statement is invalid for MySQL - don't have that one installed and not much experience there, but you didn't say what database you are using.

Przemyslaw Kruglej
  • 8,003
  • 2
  • 26
  • 41
  • Thank you. I forgot to mention that there are thousands of rows, these are just first 7. So I need to find the duplicates first and then delete them – popkutt Oct 02 '13 at 21:30
  • Yes, but you are "looking" for them using the subquery, no matter how many there are. Please tell us what database are you using. Also, you should check this link: http://stackoverflow.com/questions/18932/how-can-i-remove-duplicate-rows?rq=1 – Przemyslaw Kruglej Oct 02 '13 at 21:32
0

May be you wanted to try :

DELETE FROM ps_specific_price WHERE (id_product,id_specific_price) NOT IN (SELECT id_product,MAX(id_specific_price) FROM ps_specific_price GROUP BY id_product);

I tried this on my Teradata database and it works.

In your earlier query the only thing I see you miss is mapping id_product and it's max price with delete set. the delete will not know that it has to match pid and price before deleting.

Hope this helps.

  • Thank you. It doesn't work on my MySQL database. What is Teradata database? – popkutt Oct 02 '13 at 21:32
  • its a databases. If its on My Sql you could join id_product in the sub query with id_product from outer delete table. – kishore krv Oct 02 '13 at 21:38
  • @popkutt: you did not mention mysql (which is **very restricted** in references to the table being updated or deleted) – wildplasser Oct 02 '13 at 21:56
  • DELETE FROM ps_specific_price outT WHERE id_specific_price NOT IN (SELECT MAX(Subq.id_specific_price) FROM ps_specific_price subq WHERE outT.id_product=subq.id_product);Hope this clarifies. – kishore krv Oct 02 '13 at 22:04