Deleting duplicate entries with search criteria

Question

I have table like

table_id  item_id  vendor_id  category_id
   1         1       33           4
   2         1       33           4 
   3         1       33           2 
   4         2       33           4
   5         2       33           2
   6         3       33           4
   7         3       33           4 
   8         1       34           4 
   9         1       34           4 
   10        3       35           4

Here table_id is primary key and table having total 98000 entries including 61 duplicate entries which I found by executing query

 SELECT * FROM my_table 
 WHERE vendor_id = 33 
 AND category_id = 4 
 GROUP BY item_id having count(item_id)>1

In above table table_id 1,2 and 6,7 duplicate. I need to delete 2 and 7 from my table( Total 61 Duplicate Entries). How can I delete duplicate entries from my table using query with where clause vendor_id = 33 AND category_id = 4 ? I don't want delete other duplicate entries such as table_id 8,9

I cannot index the table, since I need to kept some duplicate entries which required. I need to delete duplicate with certain criteria

I found this: http://stackoverflow.com/questions/3385661/deleting-duplicate-records/3386958#3386958 — Kevin, Aug 06 '13 at 07:01
In my table having 98000 entries, So when executing sub queries, its not working — Shafeeque, Aug 06 '13 at 07:03
http://stackoverflow.com/questions/2867530/how-to-remove-duplicate-entries-from-a-mysql-db this may help you — , Aug 06 '13 at 07:05
ironic enough, your post is a duplicate entry... http://stackoverflow.com/questions/3311903/remove-duplicate-rows-in-mysql — NDM, Aug 06 '13 at 07:23
I dont think rows having table_id 8 and 9 are duplicates, they have different item_id — Ayyappan Sekar, Aug 06 '13 at 07:25
@NickyDeMaeyer : I don't want delete all duplicate entries and I cannot index it. Since it already having duplicate entries which required. — Shafeeque, Aug 06 '13 at 07:26
@Shafeeq - Since rows `table_id` 8 and 9 have different `item_id` how is this duplicate rows? — Himanshu, Aug 06 '13 at 07:30
@Shafeeq - You have mixed up your question. First you are saying I want to remove `table_id` 2 and 7. Then you are saying you don't want to remove other duplicate entries such as `table_id` 7,8. Please show your expected result after deletion. — Himanshu, Aug 06 '13 at 07:35
@hims056 , I was trying solve the issue same time in my system and checking Stackoverflow for any answers. — Shafeeque, Aug 06 '13 at 07:39

Himanshu · Accepted Answer · 2013-08-06T07:42:21.800

Please always take backup before running any deletion query.

Try using LEFT JOIN like this:

DELETE my_table
  FROM my_table
  LEFT JOIN 
  (SELECT MIN(table_id) AS IDs FROM my_table
   GROUP BY `item_id`, `vendor_id`, `category_id`
  )A
  ON my_table.table_id = A.IDs
  WHERE A.ids IS NULL;

Result after deletion:

| TABLE_ID | ITEM_ID | VENDOR_ID | CATEGORY_ID |
------------------------------------------------
|        1 |       1 |        33 |           4 |
|        3 |       1 |        33 |           2 |
|        4 |       2 |        33 |           4 |
|        5 |       2 |        33 |           2 |
|        6 |       3 |        33 |           4 |

See this SQLFiddle

Edit: (after OP's edit)

If you want to add more conditions, you can add it in outer WHERE condition like this:

DELETE my_table
  FROM my_table
  LEFT JOIN 
  (SELECT MIN(table_id) AS IDs FROM my_table
   GROUP BY `item_id`, `vendor_id`, `category_id`
  )A
  ON my_table.table_id = A.IDs
  WHERE A.ids IS NULL
  AND vendor_id = 33   --< Additional conditions here
  AND category_id = 4  --< Additional conditions here

See this SQLFiddle

score 0 · Answer 2 · answered Aug 06 '13 at 07:07

0

What about this:

DELETE FROM my_table
WHERE table_id NOT IN
    (SELECT MIN(table_id)
     FROM my_table
     GROUP BY item_id, vendor_id, category_id)

answered Aug 06 '13 at 07:07

Rob

919
7
16

Sorry, I thought about mssql and not mysql. – Rob Aug 06 '13 at 07:40

score 0 · Answer 3 · answered Aug 06 '13 at 07:26

0

try below code...
DELETE FROM myTable
      WHERE table_ID NOT IN (SELECT   MAX (table_ID)
                           FROM myTable
                       GROUP BY table_ID
                         HAVING COUNT (*) > 1)

answered Aug 06 '13 at 07:26

SeeSharp

179
1
4
12

score 0 · Answer 4 · answered Aug 06 '13 at 07:57

Try

DELETE m 
  FROM my_table m JOIN 
(
  SELECT MAX(table_id) table_id
    FROM my_table 
   WHERE vendor_id = 33 
     AND category_id = 4 
   GROUP BY item_id, vendor_id, category_id 
  HAVING COUNT(*) > 1
) q ON m.table_id = q.table_id

After delete you'll have

| TABLE_ID | ITEM_ID | VENDOR_ID | CATEGORY_ID |
------------------------------------------------
|        1 |       1 |        33 |           4 |
|        3 |       1 |        33 |           2 |
|        4 |       2 |        33 |           4 |
|        5 |       2 |        33 |           2 |
|        6 |       3 |        33 |           4 |
|        8 |       1 |        34 |           4 |
|        9 |       1 |        34 |           4 |
|       10 |       3 |        35 |           4 |

Here is SQLFiddle demo

score -1 · Answer 5 · answered Aug 06 '13 at 07:16

From your Question, I guess you need to remove the duplicate rows which has same values for the item_id,vendor_id and category_id like the rows having tabled_id 1 and 2. So it can be done by making the mentioned three columns unique together. So try the following,

alter ignore table table_name add unique index(item_id, vendor_id, category_id);

Note: I didnt test this yet, Will give sqlfiddle in sometime

Deleting duplicate entries with search criteria

5 Answers5

Please always take backup before running any deletion query.

See this SQLFiddle

Edit: (after OP's edit)

See this SQLFiddle