
I need to solve the following task: I have a quite large array of IDs in a PHP script, and I need to select from a MySQL DB all rows whose IDs are NOT IN this array.

There are several similar questions (How to find all records which are NOT in this array? (MySql)), and the most upvoted answer is to use a NOT IN () construction with implode(',', $array) inside the brackets.
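That construction looks roughly like this (the table and variable names are illustrative):

```php
<?php
// The implode() approach: build one big NOT IN (...) list.
// Fine for small arrays, but the query text grows with the array.
$ids = [1, 2, 3, 4];
$in  = implode(',', array_map('intval', $ids)); // cast to int to avoid injection
$sql = "SELECT * FROM mytable WHERE id NOT IN ($in)";
// $result = $mysqli->query($sql);
```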

And this worked... until my array grew to 2007 IDs and about 20 kB (in my case), at which point I got a "MySQL server has gone away" error. As far as I understand, this is because of the length of the query.

There are also some solutions to this problem like this:

SET GLOBAL max_allowed_packet=1073741824;

(just taken from this question).

Probably I could do it this way, but now I doubt that the NOT IN (implode) approach is a good one for big arrays (I expect that in my case the array can grow to 8000 IDs and 100 kB).

Is there any better solution for big arrays?

Thanks!

EDIT 1

As a solution it is recommended to insert all IDs from the array into a temporary table and then use a JOIN to solve the initial task. This is clear. However, I have never used temporary tables, and therefore I have an additional question (probably worth asking as a separate question, but I decided to leave it here):

If I need to do this routine several times during one MySQL session, which approach will be better:

  1. Each time I need to SELECT IDs NOT IN the PHP array, I will create a NEW temporary table (all those tables will be deleted when the MySQL connection terminates - in fact, when my script terminates).

  2. I will create one temporary table and drop it after each needed SELECT.

  3. I will create one temporary table and TRUNCATE it between uses.

Which is better? Or have I missed something else?
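For concreteness, option 3 would look something like this in SQL (the table name is just a placeholder):

```sql
CREATE TEMPORARY TABLE ids (id INT PRIMARY KEY);
-- ... INSERT the IDs, run the SELECT against the table ...
TRUNCATE TABLE ids;  -- empty it for the next round instead of re-creating it
```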

Vlada Katlinskaya
  • make another table in the db and select the ids which are not in it – splash58 Jul 14 '15 at 07:36
  • Load the data into a dummy (indexed) table, and then use an OUTER JOIN – Strawberry Jul 14 '15 at 07:38
  • @splash58 thought of that too. Is it OK considering that the array is a dynamic one? So I should create some kind of temporary table (never used this thing so far) – Vlada Katlinskaya Jul 14 '15 at 07:38
  • Regarding EDIT 1: There are several things that come into play here. If your server is not memory constrained, you can leave the temporary table as-is until your session terminates. This way you can re-use the data already in the temporary table if you need to re-execute the query. On the other hand, if you know for sure that you'll not be running any queries against the temporary table, you could drop it. This does have a slight performance impact, as your script will have to wait for the table to be dropped instead of it being dropped _after_ your script has already finished. – vhu Jul 14 '15 at 08:21

2 Answers


In such cases it is usually better to create a temporary table and perform the query against it instead. It'd be something along the lines of:

CREATE TEMPORARY TABLE t1 (a int);

INSERT INTO t1 VALUES (1),(2),(3);

SELECT * FROM yourtable 
 LEFT JOIN t1 on (yourtable.id=t1.a)
 WHERE t1.a IS NULL;

Of course, the INSERT statement should be constructed so that all the values from your array are inserted into the temporary table.

Edit: Inserting all values in a single INSERT statement would most probably lead to the same problem you already faced. Hence I'd suggest using a prepared statement, executed repeatedly to insert the data into the temporary table while you iterate through the PHP array.
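A minimal sketch of that loop, assuming PDO (the connection, table and column names are assumptions):

```php
<?php
// Insert the IDs into the temporary table with a prepared statement,
// re-executed once per ID, so no single packet approaches max_allowed_packet.
$pdo->exec('CREATE TEMPORARY TABLE t1 (a INT PRIMARY KEY)');
$stmt = $pdo->prepare('INSERT INTO t1 (a) VALUES (?)');
foreach ($ids as $id) {
    $stmt->execute([(int)$id]);  // one small packet per row
}
// Now run the SELECT ... LEFT JOIN t1 ... WHERE t1.a IS NULL query from above.
```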

vhu
  • Thanks! Is it correct that if I insert the whole array in one query I will face the same *MySQL server has gone away* problem? – Vlada Katlinskaya Jul 14 '15 at 07:51
  • Yes, you'd probably run into the same problem. For that reason I'd suggest that you use a [prepared statement](http://php.net/manual/en/pdo.prepare.php) and call it repeatedly instead. Whether you use PDO or mysqli doesn't really matter. – vhu Jul 14 '15 at 07:53
  • You can use transactions if there will be many inserts. – Anton Jul 14 '15 at 07:53
  • Is it OK, performance-wise, to insert **one** ID per execution of the prepared statement? – Vlada Katlinskaya Jul 14 '15 at 07:55
  • Probably yes, but I don't know how the rest of your queries and application work, so YMMV. – vhu Jul 14 '15 at 07:58
  • If your data allows it, you should consider whether it is possible to store those IDs permanently in the database. Then you would just join the tables and maintain the ID list as necessary. If the list of IDs is mostly static for each user, consider a permanent table with User/ID pairs. If they are static per session, you might get away with a table that has SessionID and ID. – vhu Jul 14 '15 at 08:01

I once had to tackle this problem, but with an IN (id) WHERE clause with approx 20,000-30,000 identifiers.

The way I got around this with a SELECT query was to reduce the number of identifiers per query and increase the number of queries sent, in order to extract the same data overall.

You could use array_chunk in PHP: splitting 20,000 identifiers into 15 chunks gives 15 separate SQL calls, each filtering records by roughly 1,333 identifiers (you can use more chunks to reduce the count per call further). In your case, splitting 2,007 identifiers into 10 chunks would reduce the number of identifiers you're pushing to the database to about 200 per SQL request. There are other ways to optimize this further, with temporary tables and so forth.

By dividing the number of identifiers you're filtering on, each query will run faster than if you sent every identifier to the database in a single dump.
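The chunking described above can be sketched like this (table and variable names are illustrative; note that with NOT IN, unlike IN, you can't simply union the per-chunk results, so this fits the IN case described in this answer):

```php
<?php
// Split the ID list and issue one query per chunk
// instead of a single huge IN (...) list.
$ids    = range(1, 2007);
$chunks = array_chunk($ids, 200);  // 11 chunks of at most 200 IDs each
foreach ($chunks as $chunk) {
    $in  = implode(',', array_map('intval', $chunk));
    $sql = "SELECT * FROM mytable WHERE id IN ($in)";
    // run $sql and merge the result sets in PHP
}
```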

classicjonesynz