0

below is a query I use to get the latest record per serverID unfortunately this query does take endless to process. According to the stackoverflow question below it should be a very fast solution. Is there any way to speed up this query or do I have to split it up? (first get all serverIDs than get the last record for each server) Retrieving the last record in each group

SELECT s1.performance, s1.playersOnline, s1.serverID, s.name, m.modpack, m.color
    FROM stats_server s1 
    LEFT JOIN stats_server s2
        ON (s1.serverID = s2.serverID AND s1.id < s2.id)
    INNER JOIN server s
        ON s1.serverID=s.id
    INNER JOIN modpack m
        ON s.modpack=m.id
    WHERE s2.id IS NULL
    ORDER BY m.id
15 rows in set (34.73 sec)

Explain:

+------+-------------+-------+------+---------------+------+---------+------+------+----------+------------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra            |
+------+-------------+-------+------+---------------+------+---------+------+------+----------+------------------+
|    1 | SIMPLE      | NULL  | NULL | NULL          | NULL | NULL    | NULL | NULL |     NULL | Impossible WHERE |
+------+-------------+-------+------+---------------+------+---------+------+------+----------+------------------+
1 row in set, 1 warning (0.00 sec)

Sample Output:

+-------------+---------------+----------+---------------+-------------------------+--------+
| performance | playersOnline | serverID | name          | modpack                 | color  |
+-------------+---------------+----------+---------------+-------------------------+--------+
|          99 |            18 |       15 | hub           | Lobby                   | AAAAAA |
|          98 |            12 |       10 | horizons      | Horizons                | AA00AA |
|          97 |             6 |       11 | m_lobby       | Monster                 | AA0000 |
|          99 |             1 |       12 | m_north       | Monster                 | AA0000 |
|          86 |            10 |       13 | m_south       | Monster                 | AA0000 |
|          87 |            17 |       14 | m_east        | Monster                 | AA0000 |
|          98 |            10 |       16 | m_west        | Monster                 | AA0000 |
|          84 |             7 |        5 | tppi          | Test Pack Please Ignore | 55FFFF |
|          95 |            15 |        6 | agrarian_plus | Agrarian Skies          | 00AA00 |
|          98 |            23 |        7 | agrarian2     | Agrarian Skies          | 00AA00 |
|          74 |            18 |        9 | agrarian      | Agrarian Skies          | 00AA00 |
|          97 |            37 |       17 | agrarian3     | Agrarian Skies          | 00AA00 |
|          99 |            17 |        3 | bteam_pvp     | Attack of the B-Team    | FFAA00 |
|          73 |            44 |        8 | bteam_pve     | Attack of the B-Team    | FFAA00 |
|          93 |            11 |        4 | crackpack     | Crackpack               | EFEFEF |
+-------------+---------------+----------+---------------+-------------------------+--------+
15 rows in set (38.49 sec)

Sample Data:

http://www.mediafire.com/download/n0blj1io0c503ig/mym_bridge.sql.bz2

Community
  • 1
  • 1
user2693017
  • 1,750
  • 6
  • 24
  • 50
  • 3
    Post explain plan of your query `EXPLAIN SELECT s1.performance, s1.playersOnline, ....` – M Khalid Junaid Jul 23 '14 at 19:33
  • @MKhalidJunaid kay, added – user2693017 Jul 23 '14 at 19:35
  • 1
    I don't really see why you need to join the stats_server table to itself. You aren't referencing any columns, and your less than qualifier should be unnecessary when you have a WHERE IS NULL. Try removing the s2 join, making WHERE s1.id is NULL and see if that gives the same data back – WebChemist Jul 23 '14 at 19:43
  • I'm not sure about mysql, but in other databases, "WHERE s2.id IS NULL" would change your left join to an inner join. This is another reason to use Gordon's approach. – Dan Bracuk Jul 23 '14 at 19:45
  • @DanBracuk, @WebChemist, take a look at the link referenced in the question ([Retrieving the last record in each group](http://stackoverflow.com/questions/1313120/retrieving-the-last-record-in-each-group)) to see why there is a self-join and `WHERE ... IS NULL`. It's called an anti-join. – Marcus Adams Jul 23 '14 at 19:47
  • @WebChemist no results, did you look at the link I posted? – user2693017 Jul 23 '14 at 19:48
  • 2
    @user2693017 where you have added explain plan ? – M Khalid Junaid Jul 23 '14 at 19:50
  • it would be helpful if you could show some sample rows from your tables and the expected output. Or make a SQL fiddle – WebChemist Jul 23 '14 at 19:50
  • @WebChemist sample output and data added. As this is about performance and requires a lot of data I think it is better to test it in a local mysql setup. – user2693017 Jul 23 '14 at 19:57
  • How can I create a explain plan from command line, I couldn't find anything about it. – user2693017 Jul 23 '14 at 20:06
  • @MarcusAdams If I run one query to get all servers and than more to get the latest result for each server it is done in milliseconds, why doesn't this work with just one query? – user2693017 Jul 23 '14 at 20:09
  • @user2693017, see my answer. :) The key is to start with the `server` table. – Marcus Adams Jul 23 '14 at 20:21
  • Just a guess, but have you updated statistics recently on the involved tables? You can use [`OPTIMIZE TABLE`](http://dev.mysql.com/doc/refman/5.5/en/optimize-table.html) for this – PinnyM Jul 23 '14 at 20:36

4 Answers4

2

Edit

Ok I solved it. Here is expanded rows showing your original slow query:

enter image description here And here is a fast query using MAX() with GROUP BY that gives the identical results. Please try it for yourself.

SELECT       s1.id 
            ,s1.performance
            ,s1.playersOnline
            ,s1.serverID 
            ,s.name 
            ,m.modpack
            ,m.color
FROM        stats_server s1
JOIN        (
    SELECT      MAX(id) as 'id'
    FROM        stats_server 
    GROUP BY    serverID
            ) AS s2
ON          s1.id = s2.id
JOIN        server s
ON          s1.serverID = s.id
JOIN        modpack m 
ON          s.modpack = m.id
ORDER BY    m.id

enter image description here

WebChemist
  • 4,393
  • 6
  • 28
  • 37
  • where is the result different? It the is the exact same result just with older data – user2693017 Jul 23 '14 at 20:57
  • id and timestamp is both all right, so I assume id is faster – user2693017 Jul 23 '14 at 20:58
  • your query does miss the point why I use this complicated why of joining the same table. Take a look at the results, we get the oldest not the latest. – user2693017 Jul 23 '14 at 21:00
  • @WebChemist thanks it works perfect and only takes 2-8 milliseconds. Is there a reason you use JOINS instead of inner joins. I know it doesn't matter in this case, but are they faster? – user2693017 Jul 24 '14 at 09:23
  • 1
    @user2693017 - plain `JOIN` is synonymous with `INNER JOIN` in MySQL – PinnyM Jul 24 '14 at 14:41
  • Glad to help get that worked out, I knew there had to be a better way... `JOIN` is the same as `INNER JOIN` when used with an `ON` condition, and the same as a `CROSS JOIN` without an `ON` condition. http://dev.mysql.com/doc/refman/5.0/en/join.html – WebChemist Jul 24 '14 at 19:36
  • Another tip, if you name your related columns in both tables the same name, you can use `USING(col_name)` instead of `ON a.col_name = b.col_name`. I find it makes queries a lot easier to read. – WebChemist Jul 24 '14 at 19:40
1

I would phrase this query using not exists:

SELECT ss.performance, ss.playersOnline, ss.serverID, s.name, m.modpack, m.color
FROM stats_server ss INNER JOIN
     server s
     ON ss.serverID = s.id INNER JOIN
     modpack m
     ON s.modpack = m.id
WHERE NOT EXISTS (select 1
                  from stats_server ss2
                  where ss2.serverID = ss.serverID AND ss2.id > ss.id
                 )

Apart from the primary key indexes on server and modpack (which I assume are there), you also want an index on stats_server(ServerId, id). This index should also help your version of the query.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • they are already indexed. I have a hard time following your query I never used `WHERE NOT EXISTS` and it doesn't give back any results. Should I export some test data? – user2693017 Jul 23 '14 at 19:46
  • I agree with the index (still depends on MySQL execution plan), but I don't see how the `WHERE NOT EXISTS` will improve this one since it's a dependent subquery. – Marcus Adams Jul 23 '14 at 19:55
  • @MarcusAdams not every time exists is much faster than left join is null but sometime it takes over the join is much optimized [**`NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: MySQL`**](http://explainextended.com/2009/09/18/not-in-vs-not-exists-vs-left-join-is-null-mysql/) – M Khalid Junaid Jul 23 '14 at 20:00
  • @user2693017 . . . When you say "already indexed", do you mean that there is a composite index on `stats_server(ServerId, id)`, with the two columns in that order? The issue with not returning results was due to a typo that I just fixed (`s.id` versus `ss.id`). – Gordon Linoff Jul 23 '14 at 20:09
  • 15 rows in set (52.70 sec) – user2693017 Jul 23 '14 at 20:14
  • if you mean `INDEX stats_server(ServerId,id)` than yes – user2693017 Jul 23 '14 at 20:15
  • @user2693017 . . . And `server(id)` and `modpack(id)` are either primary keys or indexed, and the `id` columns match the types of the corresponding foreign keys? – Gordon Linoff Jul 23 '14 at 21:14
1

Am I missing something? Why wouldn't a standard uncorrelated subquery work?

SELECT x.id, x.performance, x.playersOnline, s.name, m.modpack, m.color, x.timestamp 
  FROM stats_server x 
  JOIN
     ( SELECT serverid, MAX(id) maxid FROM stats_server GROUP BY serverid ) y
    ON y.serverid = x.serverid AND y.maxid = x.id
  JOIN server s
    ON x.serverID=s.id
  JOIN modpack m
    ON s.modpack=m.id
Strawberry
  • 33,750
  • 13
  • 40
  • 57
  • 1
    thanks, btw. is it faster to use only Join or Inner Join, I mean it doesn't matter in this case. – user2693017 Jul 24 '14 at 09:15
  • 2
    I don't understand the question. A JOIN (with an ON clause) IS an INNER JOIN ! The INNER keyword is optional, and only serves to distinguish INNER JOINs from CROSS JOINs, when read by humans. There's probably something about compatability with the SQL standard, but I think the SQL standard is a myth. Oh, and woohoo - 8k (not that I'm counting) – Strawberry Jul 24 '14 at 09:40
0

I'm guessing that you really want this (notice the order of the joins and the join criteria), and this matches the indexes that you've created:

SELECT s1.performance, s1.playersOnline, s1.serverID, s.name, m.modpack, m.color
  FROM server s
  INNER JOIN stats_server s1
    ON s1.serverID = s.id
  LEFT JOIN stats_server s2
    ON s2.serverID = s.id AND s2.id > s1.id
  INNER JOIN modpack m
    ON m.id = s.modpack
  WHERE s2.id IS NULL
  ORDER BY m.id

MySQL doesn't always inner join the tables in the order that you write them in the query since the order doesn't really matter for the result set (though it can affect index use).

With no usable index specified in the WHERE clause, MySQL might want to start with the table with the least number of rows (maybe stats_server in this case). With the ORDER BY clause, MySQL might want to start with modpack so it doesn't have to order the results later.

MySQL picks the execution plan then sees if it has the proper index for joining rather than seeing what indexes it has to join on then picking the execution plan. MySQL doesn't just automatically pick the plan that matches your indexes.

STRAIGHT_JOIN tells MySQL in what order to join the tables so that it uses the indexes that you expect it to use:

SELECT s1.performance, s1.playersOnline, s1.serverID, s.name, m.modpack, m.color
  FROM server s
  STRAIGHT_JOIN stats_server s1
    ON s1.serverID = s.id
  LEFT JOIN stats_server s2
    ON s2.serverID = s.id AND s2.id > s1.id
  STRAIGHT_JOIN modpack m
    ON m.id = s.modpack
  WHERE s2.id IS NULL
  ORDER BY m.id

I don't know what indexes you've defined since you've not provided an EXPLAIN result or shown your indexes, but this should give you some idea on how to improve the situation.

JodyT
  • 4,324
  • 2
  • 19
  • 31
Marcus Adams
  • 53,009
  • 9
  • 91
  • 143
  • thanks for the work putting into the explanation but unfortunately it has the same result: `15 rows in set (33.35 sec)` Looks like I'm forced to use one query per server. :( – user2693017 Jul 23 '14 at 20:29
  • Without specifying your schema, indexes, and explain plan, we're still just making educated guesses as to your problem. The strategy is sound, if you can find the issue. – Marcus Adams Jul 23 '14 at 20:33
  • mhh I exported the data and uploaded it. I could not find out how to create an explain plan via command line, if you know a tutorial let me know. Isn't there a way where we don't have to go through all records. I mean can't we do the stuff I do with two queries within one with a sub-query? It is only milliseconds to get the server list and than to get the row with the highest id for each.. – user2693017 Jul 23 '14 at 20:37
  • @user2693017 - you simply add the word `EXPLAIN` before your query when you execute it. See http://dev.mysql.com/doc/refman/5.5/en/explain.html – PinnyM Jul 23 '14 at 20:39
  • Make sure you're using the latest query in my answer. I did modify it later. Yes, you can use a subquery. There are subquery answers in the link that you posted also. – Marcus Adams Jul 23 '14 at 20:40
  • Also, after you have an EXPLAIN plan, please update statistics (see my comment above) and run EXPLAIN again. You might be pleasantly surprised (fingers crossed...) – PinnyM Jul 23 '14 at 20:42
  • @PinnyM I alread did that some time back and it is within the question. – user2693017 Jul 23 '14 at 20:47
  • Which did you already do: EXPLAIN, or statistics update? And did you run EXPLAIN using this solution? – PinnyM Jul 23 '14 at 20:49
  • Explain and optimize right now: 15 rows in set (34.44 sec) (Marcus latest query) – user2693017 Jul 23 '14 at 20:51
  • The 'Impossible Where' is particularly troubling since this should not be impossible... especially since rows are being returned. Until this is resolved, the optimizer can't use any indexes. If you remove the left join and the where - how long does that query take? – PinnyM Jul 23 '14 at 21:06