0

I have table like

table_id  item_id  vendor_id  category_id
   1         1       33           4
   2         1       33           4 
   3         1       33           2 
   4         2       33           4
   5         2       33           2
   6         3       33           4
   7         3       33           4 
   8         1       34           4 
   9         1       34           4 
   10        3       35           4 

Here table_id is primary key and table having total 98000 entries including 61 duplicate entries which I found by executing query

 SELECT * FROM my_table 
 WHERE vendor_id = 33 
 AND category_id = 4 
 GROUP BY item_id having count(item_id)>1

In above table table_id 1,2 and 6,7 duplicate. I need to delete 2 and 7 from my table( Total 61 Duplicate Entries). How can I delete duplicate entries from my table using query with where clause vendor_id = 33 AND category_id = 4 ? I don't want delete other duplicate entries such as table_id 8,9

I cannot index the table, since I need to kept some duplicate entries which required. I need to delete duplicate with certain criteria

Himanshu
  • 31,810
  • 31
  • 111
  • 133
Shafeeque
  • 2,039
  • 2
  • 13
  • 28
  • I found this: http://stackoverflow.com/questions/3385661/deleting-duplicate-records/3386958#3386958 – Kevin Aug 06 '13 at 07:01
  • In my table having 98000 entries, So when executing sub queries, its not working – Shafeeque Aug 06 '13 at 07:03
  • http://stackoverflow.com/questions/2867530/how-to-remove-duplicate-entries-from-a-mysql-db this may help you –  Aug 06 '13 at 07:05
  • ironic enough, your post is a duplicate entry... http://stackoverflow.com/questions/3311903/remove-duplicate-rows-in-mysql – NDM Aug 06 '13 at 07:23
  • 1
    I dont think rows having table_id 8 and 9 are duplicates, they have different item_id – Ayyappan Sekar Aug 06 '13 at 07:25
  • @NickyDeMaeyer : I don't want delete all duplicate entries and I cannot index it. Since it already having duplicate entries which required. – Shafeeque Aug 06 '13 at 07:26
  • @Shafeeq - Since rows `table_id` 8 and 9 have different `item_id` how is this duplicate rows? – Himanshu Aug 06 '13 at 07:30
  • @hims056 Sorry, I have edited my question – Shafeeque Aug 06 '13 at 07:32
  • @Shafeeq - You have mixed up your question. First you are saying I want to remove `table_id` 2 and 7. Then you are saying you don't want to remove other duplicate entries such as `table_id` 7,8. Please show your expected result after deletion. – Himanshu Aug 06 '13 at 07:35
  • @hims056 , I was trying solve the issue same time in my system and checking Stackoverflow for any answers. – Shafeeque Aug 06 '13 at 07:39
  • @Shafeeq - Don't worry now see my updated answer. – Himanshu Aug 06 '13 at 07:42

5 Answers5

1

Please always take backup before running any deletion query.

Try using LEFT JOIN like this:

DELETE my_table
  FROM my_table
  LEFT JOIN 
  (SELECT MIN(table_id) AS IDs FROM my_table
   GROUP BY `item_id`, `vendor_id`, `category_id`
  )A
  ON my_table.table_id = A.IDs
  WHERE A.ids IS NULL;

Result after deletion:

| TABLE_ID | ITEM_ID | VENDOR_ID | CATEGORY_ID |
------------------------------------------------
|        1 |       1 |        33 |           4 |
|        3 |       1 |        33 |           2 |
|        4 |       2 |        33 |           4 |
|        5 |       2 |        33 |           2 |
|        6 |       3 |        33 |           4 |

See this SQLFiddle


Edit: (after OP's edit)

If you want to add more conditions, you can add it in outer WHERE condition like this:

DELETE my_table
  FROM my_table
  LEFT JOIN 
  (SELECT MIN(table_id) AS IDs FROM my_table
   GROUP BY `item_id`, `vendor_id`, `category_id`
  )A
  ON my_table.table_id = A.IDs
  WHERE A.ids IS NULL
  AND vendor_id = 33   --< Additional conditions here
  AND category_id = 4  --< Additional conditions here

See this SQLFiddle

Himanshu
  • 31,810
  • 31
  • 111
  • 133
0

What about this:

DELETE FROM my_table
WHERE table_id NOT IN
    (SELECT MIN(table_id)
     FROM my_table
     GROUP BY item_id, vendor_id, category_id)
Rob
  • 919
  • 7
  • 16
0
try below code...
DELETE FROM myTable
      WHERE table_ID NOT IN (SELECT   MAX (table_ID)
                           FROM myTable
                       GROUP BY table_ID
                         HAVING COUNT (*) > 1)
SeeSharp
  • 179
  • 1
  • 4
  • 12
0

Try

DELETE m 
  FROM my_table m JOIN 
(
  SELECT MAX(table_id) table_id
    FROM my_table 
   WHERE vendor_id = 33 
     AND category_id = 4 
   GROUP BY item_id, vendor_id, category_id 
  HAVING COUNT(*) > 1
) q ON m.table_id = q.table_id

After delete you'll have

| TABLE_ID | ITEM_ID | VENDOR_ID | CATEGORY_ID |
------------------------------------------------
|        1 |       1 |        33 |           4 |
|        3 |       1 |        33 |           2 |
|        4 |       2 |        33 |           4 |
|        5 |       2 |        33 |           2 |
|        6 |       3 |        33 |           4 |
|        8 |       1 |        34 |           4 |
|        9 |       1 |        34 |           4 |
|       10 |       3 |        35 |           4 |

Here is SQLFiddle demo

peterm
  • 91,357
  • 15
  • 148
  • 157
-1

From your Question, I guess you need to remove the duplicate rows which has same values for the item_id,vendor_id and category_id like the rows having tabled_id 1 and 2. So it can be done by making the mentioned three columns unique together. So try the following,

alter ignore table table_name add unique index(item_id, vendor_id, category_id);

Note: I didnt test this yet, Will give sqlfiddle in sometime

Ayyappan Sekar
  • 11,007
  • 2
  • 18
  • 22