4

I searched but all of the questions come up with delete duplicate record and keep only one latest in table, but my is a bit difference, I want to delete duplicate record in the same group but keep only latest record of that group. Here is an example:

ID  Device_ID    Time                   Content Text
1   abc          2013-11-30 15-30-01    I love you
2   abc          2013-11-30 15-30-01    I love you
3   def          2013-10-30 12-12-02    I love you
4   def          2013-10-30 12-12-02    I love you
5   ghj          2013-09-30 11-12-02    I love you
6   ghj          2013-09-30 11-12-02    I love you

So there are some devices having same content with other device, I would like find a way to delete duplicate content in each device id and delete duplicate and keep the latest.

If I select group by content text then delete it from there then it would delete all others content from other device as well and I can't do that.

Assume I have 50 million records. So is there a way I can delete duplicate record on each device id the result would be like only having I love you content text on each device id.

I can't make my column content text as unique during inert because there might be some device id using same content text. Or maybe unique time but then each device id might insert same time but difference content text...

I would like res

ID  Device_ID    TimeContent            Text

2   abc          2013-11-30 15-30-01    I love you

4   def          2013-10-30 12-12-02    I love you

6   ghj          2013-09-30 11-12-02    I love you
Brian Tompsett - 汤莱恩
  • 5,753
  • 72
  • 57
  • 129
Jimmy Le
  • 69
  • 1
  • 9
  • Don't give up. This question must have been answered thousands of times - in SO and elsewhere. (Although, if you really can't figure it out then proper DDLs and/or an sqlfiddle TOGETHER WITH THE DESIRED RESULT SET would probably be useful) – Strawberry Dec 12 '13 at 10:11
  • can you explain more ? – Jimmy Le Dec 12 '13 at 10:12
  • This is an answer very similar to your case: http://stackoverflow.com/questions/4075430/how-do-i-remove-duplicates-rows-in-my-mysql-database-keep-the-one-with-lowest The whole trick is to list proper columns in the `GROUP BY` clause and use MAX instead of MIN (for latest records). – krokodilko Dec 12 '13 at 10:27
  • The link you post is bit situation than mine, i would like to delete duplicate message in column text and time but under group of each device id . So only delete content text if the time is duplicate and text is duplicate but also unique in each device_id so each device id can have same time and content with other device_id – Jimmy Le Dec 12 '13 at 10:29

3 Answers3

4

Try:

DELETE FROM tblA 
WHERE id NOT IN (
          SELECT DISTINCT id FROM (
          SELECT  MAX(id) as id FROM tblA 
          GROUP BY Device_ID, Time, ContentText 
          HAVING COUNT(*) > 1 ) A          
    );

See Demo

Edper
  • 9,144
  • 1
  • 27
  • 46
  • This query gives an error: `You can't specify target table 'tbl' for update in FROM clause`, paste your query here: http://www.sqlfiddle.com/#!2/7b705/1 and click a `Build` button. – krokodilko Dec 12 '13 at 10:52
  • right it didn't work, in my table , the only unique row is ID, and primary key is Device ID. So if a user want to see the content , it only show distinct value of the content AND time, so if there is a same content with same time, it won't show duplicate but rather only 1 . So each user attached with a device id. So there are million records mix up together but some device_id have same record content + time and i would like to remove that but only under that device_id and so apply to the rest of table...device_id abc and cdf has notthing to do with it. – Jimmy Le Dec 12 '13 at 10:56
  • @Edper yes, now it's working fine: http://www.sqlfiddle.com/#!2/0cc86/1 `DISTINCT` did the trick - I didn't know that `DISTINCT` can materialize a subquery, thanks. – krokodilko Dec 12 '13 at 11:06
  • @kordirko: Yes. But you know what without the distinct and my original query works in MS SQL Server but of course the OPs database is mySQL. MySQL is playing safe making sure that no duplicate columns in this ID would exist in subquery but the GROUP BY is virtually like DISTINCT as well. ;-) – Edper Dec 12 '13 at 11:08
  • @Edper the query without `DISTINCT` in the subquery works on almost all databases (PostGreSQL, Oracle, SQL Server etc) ... but MySql doesn't allow for queries that selects and updates (update/delete) the same table in one query --> see this link:http://dev.mysql.com/doc/refman/5.0/en/delete.html - `Currently, you cannot delete from a table and select from the same table in a subquery. `. The trick is to materialize a subquery (save it into temp table) using `( SELECT * FROM ( subquery ))`, and - as in your query - with a `DISTINCT` (I didn't know that). – krokodilko Dec 12 '13 at 11:15
  • @Edper thank you :) need to test around – Jimmy Le Dec 12 '13 at 11:29
  • @Edper using ur query...i found some problem , similar to my db http://www.sqlfiddle.com/#!2/0d810/1 it delete record that not even duplicate for example id 14, 15,5, – Jimmy Le Dec 12 '13 at 17:47
3

Try this:

DELETE FROM TableA  
WHERE id NOT IN (SELECT id 
                 FROM(SELECT MAX(id) id FROM TableA t
                      GROUP BY t.Device_ID, t.Time, t.ContentText HAVING COUNT(*) > 1
                     ) AS A       
                );

Check the SQL FIDDLE DEMO

OUTPUT

| ID | DEVICE_ID |                TIME | CONTENTTEXT |
|----|-----------|---------------------|-------------|
|  2 |       abc | 2013-11-30 15-30-01 |  I love you |
|  4 |       def | 2013-10-30 12-12-02 |  I love you |
|  6 |       ghj | 2013-09-30 11-12-02 |  I love you |
| 13 |      ghj1 | 2013-09-30 11-12-02 |  I love you |
Saharsh Shah
  • 28,687
  • 8
  • 48
  • 83
1

NVM i found the problem just switch Count * > 1 to 0 , it will not delete the non duplicate

Jimmy Le
  • 69
  • 1
  • 9