25

I have a table of games, which is described as follows:

+---------------+-------------+------+-----+---------+----------------+
| Field         | Type        | Null | Key | Default | Extra          |
+---------------+-------------+------+-----+---------+----------------+
| id            | int(11)     | NO   | PRI | NULL    | auto_increment |
| date          | date        | NO   |     | NULL    |                |
| time          | time        | NO   |     | NULL    |                |
| hometeam_id   | int(11)     | NO   | MUL | NULL    |                |
| awayteam_id   | int(11)     | NO   | MUL | NULL    |                |
| locationcity  | varchar(30) | NO   |     | NULL    |                |
| locationstate | varchar(20) | NO   |     | NULL    |                |
+---------------+-------------+------+-----+---------+----------------+

But each game has a duplicate entry in the table somewhere, because each game was in the schedules for two teams. Is there a sql statement I can use to look through and delete all the duplicates based on identical date, time, hometeam_id, awayteam_id, locationcity, and locationstate fields?

cfrederich
  • 1,790
  • 6
  • 26
  • 35

9 Answers9

45

You should be able to do a correlated subquery to delete the data. Find all rows that are duplicates and delete all but the one with the smallest id. For MYSQL, an inner join (functional equivalent of EXISTS) needs to be used, like so:

delete games from games inner join 
    (select  min(id) minid, date, time,
             hometeam_id, awayteam_id, locationcity, locationstate
     from games 
     group by date, time, hometeam_id, 
              awayteam_id, locationcity, locationstate
     having count(1) > 1) as duplicates
   on (duplicates.date = games.date
   and duplicates.time = games.time
   and duplicates.hometeam_id = games.hometeam_id
   and duplicates.awayteam_id = games.awayteam_id
   and duplicates.locationcity = games.locationcity
   and duplicates.locationstate = games.locationstate
   and duplicates.minid <> games.id)

To test, replace delete games from games with select * from games. Don't just run a delete on your DB :-)

N West
  • 6,768
  • 25
  • 40
  • I tried the select version of this, and it looked like exactly what I wanted to get rid of, but when I actually ran it with "delete from," it threw an error and told me "Error Code: 1093. You can't specify target table 'game' for update in FROM clause" Any ideas? – cfrederich Jun 24 '11 at 18:15
  • Try the updated answer, I replaced EXISTS with an INNER JOIN delete. I think mysql might be having trouble with a delete and EXISTS clause. – N West Jun 26 '11 at 13:35
13

You can try such query:

DELETE FROM table_name AS t1
WHERE EXISTS (
 SELECT 1 FROM table_name AS t2 
 WHERE t2.date = t1.date 
 AND t2.time = t1.time 
 AND t2.hometeam_id = t1.hometeam_id 
 AND t2.awayteam_id = t1.awayteam_id 
 AND t2.locationcity = t1.locationcity 
 AND t2.id > t1.id )

This will leave in database only one example of each game instance which has the smallest id.

Grigor Gevorgyan
  • 6,753
  • 4
  • 35
  • 64
7

The best thing that worked for me was to recreate the table.

CREATE TABLE newtable SELECT * FROM oldtable GROUP BY field1,field2;

You can then rename.

Ali Hashemi
  • 3,158
  • 3
  • 34
  • 48
  • 2
    This is by far the best and more straight forward solution. You can't go wrong using this. – Codex73 Jan 30 '17 at 21:42
  • 1
    The one downside to this is that you lose the constraints, but you can just `TRUNCATE` oldtable and copy everything back into it from newtable, so it works like a charm – Hissvard Apr 06 '17 at 12:14
  • 1
    Most secure solution, much better than a DELETE statement, IMO. – that-ben Jun 15 '17 at 14:41
5

To get list of duplicate entried matching two fields

select t.ID, t.field1, t.field2
from (
  select field1, field2
  from table_name
  group by field1, field2
  having count(*) > 1) x, table_name t
where x.field1 = t.field1 and x.field2 = t.field2
order by t.field1, t.field2

And to delete all the duplicate only

DELETE x 
FROM table_name x
JOIN table_name y
ON y.field1= x.field1
AND y.field2 = x.field2
AND y.id < x.id;
Rem
  • 81
  • 1
  • 3
  • The above query does expected but it removes the last row from the resultset. So I made a correction in the query as follows: DELETE x FROM table_name x JOIN table_name y ON y.field1= x.field1 AND y.field2 = x.field2 AND y.id > x.id; – Vinayagam Feb 03 '17 at 13:51
4
select orig.id,
       dupl.id
from   games   orig, 
       games   dupl
where  orig.date   =    dupl.date
and    orig.time   =    dupl.time
and    orig.hometeam_id = dupl.hometeam_id
and    orig. awayteam_id = dupl.awayeam_id
and    orig.locationcity = dupl.locationcity
and    orig.locationstate = dupl.locationstate
and    orig.id     <    dupl.id

this should give you the duplicates; you can use it as a subquery to specify IDs to delete.

Neville Kuyt
  • 29,247
  • 1
  • 37
  • 52
3

AS long as you are not getting id (primary key) of the table in your select query and the other data is exact same you can use SELECT DISTINCT to avoid getting duplicate results.

Wicked Coder
  • 1,128
  • 6
  • 8
2
delete from games 
   where id not in 
   (select max(id)  from games 
    group by date, time, hometeam_id, awayteam_id, locationcity, locationstate 
    );

Workaround

select max(id)  id from games 
    group by date, time, hometeam_id, awayteam_id, locationcity, locationstate
into table temp_table;

delete from games where id in (select id from temp);
piotrpo
  • 12,398
  • 7
  • 42
  • 58
  • 1
    This approach would only delete one duplicate row per game, regardless of how many duplicate rows existed for the game. – Fred Sobotka Jun 24 '11 at 18:14
  • This gives me the same error that I got from @Neville K's post. ERROR 1093 (HY000): You can't specify target table 'game' for update in FROM clause – cfrederich Jun 24 '11 at 20:56
  • Sometimes you can not update values if it changes resultset. I've added workaround for that. – piotrpo Jun 25 '11 at 20:23
1
DELETE FROM tbl
 USING tbl, tbl t2
 WHERE tbl.id > t2.id
  AND t2.field = tbl.field;

in your case:

DELETE FROM games
 USING games tbl, games t2
 WHERE tbl.id > t2.id
  AND t2.date = tbl.date
  AND t2.time = tbl.time
  AND t2.hometeam_id = tbl.hometeam_id
  AND t2.awayteam_id = tbl.awayteam_id
  AND t2.locationcity = tbl.locationcity
  AND t2.locationstate = tbl.locationstate;

reference: https://dev.mysql.com/doc/refman/5.7/en/delete.html

iateadonut
  • 1,951
  • 21
  • 32
1
DELETE FROM table
WHERE id = 
    (SELECT t.id
    FROM table as t
    JOIN (table as tj ON (t.date = tj.data
                          AND t.hometeam_id = tj.hometeam_id
                          AND t.awayteam_id = tj.awayteam_id
                          ...))
limscoder
  • 3,037
  • 2
  • 23
  • 37