0

I have an array:

[1, 2, 3, 5]

And I have a table:

+--------+
| number |
+--------+
| 1      |
| 2      |
| 3      |
| 4      |
| 5      |
+--------+

What I want to do is compare the two and delete the difference from the table. So in this case, 4.

What is the best way to do this? (I'm using MySQL(i) and php)

The following methods will work, but they have downsides:

  • Iterating through the array and manually adding each item to the IN part of the WHERE clause:

    DELETE FROM tbl WHERE number NOT IN (1, 2, 3, 5)
    

    Issue is this is a potential risk with unsafe strings and sql queries have size limits.

  • Creating a temporary table with the array and using that to make the comparison:

    DELETE FROM tbl WHERE number NOT IN (SELECT num FROM tmp)
    

    This is very resource intensive (creating and dropping a new table for each connection) and slow.

Jans Rautenbach
  • 394
  • 4
  • 13
  • 3
    What is the size limit on a SQL statement? I've not encountered this, even with fairly large arrays. And - you don't have to iterate the array, you can use `implode` OR - better would be to use a prepared statement, such as [this answer](https://stackoverflow.com/a/29848974/870729) - which would prevent the unsafe string issue you are concerned with. – random_user_name Jan 29 '18 at 21:30
  • @cale_b the answer in that link is sort of genius. Thank you for sharing that. – JNevill Jan 29 '18 at 21:33
  • Adding to this, so long as your temp tables aren't that large or frequently created you can also specify `MEMORY` as the storage engine to speed up their performance/lifecycle. – Sammitch Jan 29 '18 at 21:38
  • The sql parameter limit is [apparently](https://stackoverflow.com/a/1869789/3900981) between 50k and 60k. My tables aren't that big right now, but it's good to future proof. – Jans Rautenbach Jan 29 '18 at 21:56
  • 1
    Define 'best' . – Strawberry Jan 29 '18 at 22:20
  • The limit is set by max_allowed_packet – guigoz Jan 29 '18 at 23:03

2 Answers2

0

A simple way is:

DELETE FROM tbl WHERE NOT FIND_IN_SET(number, ?)

Then bind your array as string '1,2,3,5'

This can't use an index, but neither can NOT IN (1,2,3,5).

If your array is too big to be bound as string, then I would ask where this big amount of data come from.

Paul Spiegel
  • 30,925
  • 5
  • 44
  • 53
0

DELETE FROM tbl WHERE number NOT IN (SELECT num FROM tmp)

is not slow because of create/drop, but because of

  • lack of index
  • NOT IN condition

You can do instead, providing that number is indexed

delete tbl.*
from
  tbl
  left join tmp using (number)
where tmp.number is null

This is not resource intensive nor limited by max_allowed_packet

guigoz
  • 674
  • 4
  • 21
  • Shouldn't it be `tbl left join tmp where tmp.number is null`? – Paul Spiegel Jan 30 '18 at 00:09
  • I still don't like the idea of creating and dropping a new table for each connection. I think what I'm going to do is rather set up a permanent "comparison table" and use a connection ID so that multiple instances of the script can operate on the table without interfering with each other. – Jans Rautenbach Jan 30 '18 at 11:39
  • That will make you do some `delete` before `insert` some new rows with connection id, plus take the risk of table locking if using MyISAM. Might not be better – guigoz Jan 30 '18 at 11:54