0

I'm wondering whether this kind of logic would improve query performance, say for example rather then checking a user likes a post on each element in an array and firing a query for each.

Instead i could push the primary id's into an array and then perform an IN query on them, this would reduce 15 nth term queries, and batch it into 2 query including the initial one.

I'm using PHP PDO, MYSQL.

Any advice? Am i on the right track people? :D

$items is the result set from the database, in this case they are questions that users are asking, i get a response in about 140ms and i've set a limit on how many items are loaded at once with pagination.

    $questionIds = [];

    foreach ($items as $item) {
        array_push($questionIds, $item->question_id);
    }

    $items = loggedInUserLikesQuestions($questionIds, $items, $user_id);
  • 3
    A `QUERY` especially a simple `IN` query will almost always be less overhead. Do yourself a favor, and create a static array with 100 elements and do the `loop` vs `query` experiment ... Benchmark it yourself and the answer will be plain. – Zak Jan 10 '18 at 17:41
  • `IN` will be best for cases needing to modify data (setting questions as liked by user). However, if your goal is simply to create a list of questions which the user has previously liked, it is very likely that the most efficient option would be a SINGLE query, with joins as necessary. **I.E.:** `SELECT q.id, q.name FROM question q LEFT JOIN user_likes ul ON q.id = ul.quest_id AND ul.user_id = 123` – Tony Chiboucas Jan 10 '18 at 18:35
  • Thanks Tony, we have already done this, it's been in production for a while now, and i'm just optimising the performance and so far so good! I have rewrote SQL the query, and i'm using IN on everything that needs to be computed on top now - i've used PHP to map the computed values onto each object property where a user liked or reported a question, the stored procedure before all of this counts the likes, and user reports. – Kieran Grosvenor Jan 21 '18 at 00:02

1 Answers1

2

Definitely the IN clause is faster on execution of the SQL query. However, you will only see significant actual clock-speed benefits once the number of items in your IN clause (on average) gets high.

The reason there is a speed difference, even though the individual update may be lightning-fast, is the setup, executing, tear-down, and response of each query, send/receive to the server. When you are doing thousands (or millions) of these as fast as you can, I've seen, instead of 500/sec, getting 200,000/sec. This may give you some idea.

However, with the IN-clause method, you need to make sure your IN clause does not become too big, and hitting the max query size (see variable max_allowed_packet)

Here is a simple set of functions that will automatically batch up into IN clauses of 1000 items each:

<?php

$db = new PDO('...');
$__q = [];
$flushQueue = function() use ($db, &$__q) {
    if ( count($__q) > 0 ) {
        $sanitized_ids = [];
        foreach ( $__q as $id ) { $sanitized_ids[] = (int) $id; }
        $db->query("UPDATE question SET linked = 1 WHERE id IN (". join(',',$sanitized_ids) .")");
        $__q = [];
    }
};
$queuedUpdate = function($question_id) use (&$__q, $flushQueue){
    $__q[] = $question_id;
    if ( count( $__q) > 1000 ) { $flushQueue(); }
};


// Then your code...
foreach ($items as $item) {
    $queuedUpdate($item->question_id);
}
$flushQueue();

Obviously, you don't have to use anon functions, if you are in a class. But the above will work anywhere (assuming you are on >= PHP 5.3).

  • Thanks for the answer - very helpful! Yeah i kept my IN size to just 25, it runs two additional IN queries side by side, and then maps the computed values back to the array of object where it matches it's correct conditions, i'm seeing performance improvements locally - on the live server either was fast anyway but i guess the server has had a bit of a health check up at the doctors! – Kieran Grosvenor Jan 21 '18 at 00:05