
I'm really new to PHP and MySQL; I knew nothing about either a month ago, so please forgive my sloppy/poor code :)

I have the following code within my PHP:

$starttime = microtime(true);

// Select every item the current user hasn't purchased yet
$q_un = 'SELECT i.id AS id
            FROM items i
            WHERE i.id NOT IN (SELECT item_id FROM purchased_items WHERE user_id=' . $user_id . ')';
$r_un = mysqli_query($dbc, $q_un);

if (mysqli_num_rows($r_un) > 0) {
    while ($row_un = mysqli_fetch_array($r_un, MYSQLI_ASSOC)) {
        $item_id = $row_un['id'];

        // Compute and upsert a recommendation score for this one item
        $q_rec = 'INSERT INTO compatibility_recommendations (
                    `recommendation`,
                    `user_id`,
                    `item_id`)
                    SELECT
                        ((SUM(a.rating*(a.compat-80)))/(SUM(a.compat-80)))*10 AS rec,
                        a.user_id AS user_id,
                        a.item_id AS item_id
                    FROM
                        (SELECT r.rating AS rating,
                            c.user2_id AS rater,
                            c.user1_id AS user_id,
                            c.compatibility AS compat,
                            r.item_id AS item_id
                        FROM ratings r
                        RIGHT JOIN compatibility_ratings c ON r.user_id=c.user2_id
                        WHERE c.user1_id=' . $user_id . ' AND r.item_id=' . $item_id . ' AND c.compatibility>80) a
                    ON DUPLICATE KEY UPDATE
                        recommendation = VALUES(recommendation)';
        $r_rec = mysqli_query($dbc, $q_rec);
    }
}

$endtime = microtime(true);
$duration = $endtime - $starttime;

The first query selects a list of items that the current user, $user_id, hasn't purchased yet. I then loop over each returned row (item), performing the main query inside the loop.

This next query takes rows from the ratings table where the item_id matches the item currently being processed, and joins them to a precomputed user-compatibility table with a right join.

I then run arithmetic on the ratings and compatibility values to form a recommendation value, and insert the recommendation, item_id and user_id into another table to be read later. There's a two-column unique key on (item_id, user_id), hence the ON DUPLICATE KEY UPDATE at the end.
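To make the arithmetic concrete, here's a made-up example: suppose two compatible raters have rated an item, one with compatibility 90 giving a rating of 8, the other with compatibility 85 giving a 6. The weights are 90-80=10 and 85-80=5, so the recommendation is ((8*10 + 6*5)/(10+5))*10 = (110/15)*10 ≈ 73.3. In other words, it's a weighted average of ratings, weighted by how far each rater's compatibility exceeds 80, then scaled by 10.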

So I wrote this code this morning and was quite happy with myself, as it does exactly what I need it to do.

The problem is that, predictably, it's slow. On my test DB, with 5 test users, 100 test items, and a random assortment of 200 ratings, it takes 2.5 seconds to run through the while loop. I was expecting it to be slow, but not this slow, and it's really going to struggle once more users and items are added. The main problem is the INSERT ... ON DUPLICATE KEY UPDATE part: my disk utilisation goes to 100% and I can tell my laptop's HDD is seeking like crazy. I know I will probably use SSDs in production, but I would still anticipate a major scaling issue with thousands of items and users.

So my main question here is: can anyone give any advice on how to optimise my code, or completely rejig things to improve speed? I'm sure that an INSERT query inside a while loop is a poor way of doing this; I just can't think of any other way to obtain the exact same results.

Thanks in advance, and sorry if I formatted my question incorrectly.

fetef
  • **WARNING:** When using `mysqli` you should be using parameterized queries and [`bind_param`](http://php.net/manual/en/mysqli-stmt.bind-param.php) to add user data to your query. **DO NOT** use string interpolation to accomplish this because you will create severe [SQL injection bugs](http://bobby-tables.com/). – tadman Nov 06 '14 at 21:16
  • @fetef FWIW: I've seen worse code from month-old programmers ;) – webnoob Nov 06 '14 at 21:16
  • If you're new to PHP you should be starting with a [development framework](http://codegeekz.com/best-php-frameworks-for-developers/) like [Laravel](http://laravel.com/) that fits your style and needs. Smashing away with super low-level code isn't very productive. – tadman Nov 06 '14 at 21:17
  • If you use a PHP framework, your queries will be optimized by its built-in database functions most of the time. For beginners, [Codeigniter](http://www.codeigniter.com/user_guide/) is very easy to learn and implement. – Angelo Nov 06 '14 at 21:19
  • @Angelo - I have yet to see a framework that will optimize database queries. You'll usually see a drop in performance from the abstraction library. – Sam Dufel Nov 06 '14 at 21:21
  • Yeah sorry, I have started migrating to bind_param; I did my first learning from some old books that pre-dated mysqli. @tadman, forgive me for my ignorance, but what do these frameworks do? – fetef Nov 06 '14 at 21:30
  • I would recommend reading up on strategies for creating indexes on your database tables, and then using mysql EXPLAIN to evaluate the usage of any indexes you create. Proper table indexing can provide huge performance gains on the mysql side, which is crucial for scaling. – John McMahon Nov 06 '14 at 21:41
  • @fetef The right framework will do all the heavy lifting for you and leave you free to implement your custom business logic. Plus they provide a foundation for adding third-party libraries that add significant functionality at near zero effort. Slapping together your own app without adhering to any particular conventions really limits what you can do. – tadman Nov 06 '14 at 22:02
  • @fetef Don't worry. You're way ahead of the curve! That said, you don't need the loop. One query will suffice! – Strawberry Nov 07 '14 at 12:23
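
Edit, following tadman's warning above: a minimal sketch of my first query rewritten as a prepared statement with bind_param (same names as my code above; untested, and note mysqli_stmt_get_result requires the mysqlnd driver):

$stmt = mysqli_prepare($dbc, 'SELECT i.id AS id
            FROM items i
            WHERE i.id NOT IN (SELECT item_id FROM purchased_items WHERE user_id = ?)');
mysqli_stmt_bind_param($stmt, 'i', $user_id);   // 'i' = bind as integer
mysqli_stmt_execute($stmt);
$r_un = mysqli_stmt_get_result($stmt);          // result set (needs mysqlnd)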

3 Answers


I found the answer that I was looking for here

The second query for each item was taking 0.002 seconds for just the SELECT, but 0.06 seconds with the INSERT, so I profiled the query and found that "query end" was taking 99% of the query time. I've set innodb_flush_log_at_trx_commit = 0, but the comments on that answer frown upon it. I don't use transactions, however, so are there any consequences/alternatives to this approach? It did reduce my while loop time from 2.5 seconds to 0.08 seconds.
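
One alternative I've seen suggested, rather than weakening durability globally: leave innodb_flush_log_at_trx_commit at its default and wrap the whole while loop in a single transaction, so InnoDB flushes the log once per batch instead of once per INSERT. An untested sketch with the variables from my question:

mysqli_autocommit($dbc, false);   // start an implicit transaction
// ... the existing while loop of INSERT ... ON DUPLICATE KEY UPDATE queries ...
mysqli_commit($dbc);              // one log flush for the whole batch
mysqli_autocommit($dbc, true);    // restore normal behaviour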

fetef
  • You'll probably find it's the sub-select that's really eating the time up. Try replacing the subselects with joins in both your insert and your primary select loop and you'll see execution time drop, especially if combined with properly configured indexes on the tables. – Dave Nov 07 '14 at 11:52
  • Any advice on how to rearrange the query/ which columns should be indexed? – fetef Nov 07 '14 at 11:58
  • Index anything used as a join point or a WHERE clause search point: so for starters r.user_id, c.user2_id, c.user1_id, r.item_id etc.; look through your query to get the rest of them. Then try executing the same query again and see if it's any faster. After that, start looking at rewriting the sub-selects to joins etc. – Dave Nov 07 '14 at 12:08
  • For example, your first select could be rewritten to something like this: `SELECT i.id AS id FROM items i INNER JOIN purchased_items ON i.id=purchased_items.item_id WHERE purchased_items.user_id <> $user_id`. An inner join will only select items where there's an entry in both tables, so you may need a different type of join; the syntax will remain pretty similar, just the join type will change. – Dave Nov 07 '14 at 12:11
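
Following Dave's indexing advice, the indexes might look something like this (column names taken from the question; untested, and worth verifying with EXPLAIN afterwards):

ALTER TABLE ratings ADD INDEX idx_user_item (user_id, item_id);
ALTER TABLE compatibility_ratings ADD INDEX idx_user1_user2 (user1_id, user2_id);
ALTER TABLE purchased_items ADD INDEX idx_user_item (user_id, item_id);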
$starttime = microtime(true);

// A single set-based INSERT ... SELECT replaces the whole while loop
$q_rec = "
 INSERT INTO compatibility_recommendations 
 (recommendation
 ,user_id
 ,item_id
 )
 SELECT ((SUM(a.rating*(a.compat-80)))/(SUM(a.compat-80)))*10 rec
      , a.user_id 
      , a.item_id 
   FROM
      ( SELECT r.rating rating
             , c.user2_id rater
             , c.user1_id user_id
             , c.compatibility compat
             , r.item_id 
          FROM compatibility_ratings c
          JOIN ratings r
            ON r.user_id = c.user2_id

          JOIN items i
            ON i.id = r.item_id

          LEFT
          JOIN purchased_items p
            ON p.item_id = i.id
           AND p.user_id = $user_id

         WHERE c.user1_id = $user_id
           AND c.compatibility > 80
           AND p.item_id IS NULL
      ) a
 GROUP BY a.item_id
 ON DUPLICATE KEY UPDATE recommendation = VALUES(recommendation)
";

$r_rec = mysqli_query($dbc, $q_rec);
$endtime = microtime(true);
$duration = $endtime - $starttime;

For any further improvement, we'd really need to see proper DDLs AND the EXPLAIN for the SELECT above.
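
For anyone following along, those can be gathered with standard MySQL statements (table names from the question):

SHOW CREATE TABLE ratings;                        -- DDL for each table involved
SHOW CREATE TABLE compatibility_ratings;
SHOW CREATE TABLE purchased_items;
SHOW CREATE TABLE compatibility_recommendations;
-- For the plan, prefix the SELECT portion of the INSERT ... SELECT with EXPLAIN.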

Strawberry
  • This is the correct answer; it shaved the query time from 2.5 seconds to 0.08 by avoiding the constant seeking of the while loop. I'm sure I can get the time down further with index optimisation, but I think that is a journey I should take myself, to educate myself. Thank you very much to @Strawberry for the query unification; I had stared at this one for 10 hours and couldn't see it. Bravo – fetef Nov 08 '14 at 10:02
  • ;-) That's not really 'shaving' is it? More like the guillotine! – Strawberry Nov 08 '14 at 15:38

See https://stackoverflow.com/a/14456661/2782404

fetch_assoc may be significantly faster than fetch_array, and you should be fetching all at once before you access the values.
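
A minimal illustration of both suggestions, using the result variable from the question (note mysqli_fetch_all requires the mysqlnd driver):

// equivalent to mysqli_fetch_array($r_un, MYSQLI_ASSOC), slightly leaner:
while ($row_un = mysqli_fetch_assoc($r_un)) {
    $item_id = $row_un['id'];
}

// or fetch everything up front (needs mysqlnd):
$rows = mysqli_fetch_all($r_un, MYSQLI_ASSOC);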

kwills
  • He is already passing in MYSQLI_ASSOC for the optional 2nd parameter in mysqli_fetch_array() to return the results as an associative array only. – John McMahon Nov 06 '14 at 21:45