1

I am looking to delete via a subselect. The answers for this question do not solve my issue: MySQL Error 1093 - Can't specify target table for update in FROM clause.

I found a hackish way to do this and wanted to get feedback on if this was the most optimal solution for doing a DELETE with SUBSELECT --

DELETE FROM main_history WHERE id NOT IN (
   SELECT * from (
     SELECT max(id) FROM main_history h GROUP BY instance_id, xpath
   ) x
)

Basically, I just add in a second (meaningless) subselect to get around the mysql error, which I receive if I just do a normal DELETE with the subselect:

DELETE FROM main_history WHERE id NOT IN (
  SELECT max(id) FROM main_history h GROUP BY instance_id, xpath
);

You can't specify target table 'main_history' for update in FROM clause

Is this a good solution to the above issue?

Update: The subselect query performs much better than the join query. Here were my findings:

# 12m Subselect
DELETE FROM main_history WHERE id NOT IN (
 SELECT * from (
  SELECT max(id) FROM main_history h GROUP BY instance_id, xpath
 ) x
)

# 18m Join
Delete m1 from main_history m1 left join
(
  SELECT max(id) id FROM main_history h GROUP BY instance_id, xpath
) m2 on m1.id = m2.id
Where m2.id is null;
Community
  • 1
  • 1
David542
  • 104,438
  • 178
  • 489
  • 842
  • 1
    What was the error? – Anand Jan 05 '17 at 20:25
  • @Anand see updated question pls. – David542 Jan 05 '17 at 20:27
  • You can also add a new boolean field to the table: "newest_in_group" (with index). It would be true when you add a new entry, and also you set this value to false for all other entries where instance_id, xpath are the same (those could have a composite index). Then you can delete with a logarithmic query. – Crouching Kitten Jan 05 '17 at 20:41
  • @CrouchingKitten yea I was previously storing a checksum to check for duplicate data, etc, but removed it was a bit tricky as we have both new inserts and updates, so it's a lot of overhead to keep track of it all. – David542 Jan 05 '17 at 21:04

2 Answers2

3

Right way is using delete join:

Delete m1 from main_history m1 left join
(
  SELECT max(id) id FROM main_history h GROUP BY instance_id, xpath
) m2 on m1.id = m2.id
Where m2.id is null;
David542
  • 104,438
  • 178
  • 489
  • 842
Gurwinder Singh
  • 38,557
  • 6
  • 51
  • 76
  • want to benchmark the two different approaches using some fake data to see which performs better? – David542 Jan 05 '17 at 20:31
  • what would be the difference in the subselect approach vs. the join approach? – David542 Jan 05 '17 at 20:33
  • I would bet this is the most efficient method. – Anand Jan 05 '17 at 20:33
  • 1
    @Anand it's not -- see my updated question. But at least this answer offers a different approach, so it was nice to be able to test both out to see which one is faster (subselect) and cleaner (join). – David542 Jan 06 '17 at 19:47
0

You can use a derived table like this, then MySQL cant see the tablename from the SELECT:

DELETE FROM main_history WHERE id NOT IN (
   SELECT * FROM (
      SELECT * from (SELECT max(id) FROM main_history h
      GROUP BY instance_id, xpath) AS x
   ) AS delrows
);
Bernd Buffen
  • 14,525
  • 2
  • 24
  • 39
  • What would be the difference between deleting this way and then deleting via a join, as in the other answer? – David542 Jan 05 '17 at 20:31