
I have a table with ~300,000 records in it and I would like to select the latest record for each player and order it by rating. The following query gives me the desired results but it takes ~100s to run on the table with 300,000 records and the table will grow to many millions of records.

SELECT *
FROM players a
JOIN (
    SELECT name, server, Max(timestamp) AS MaxTimeStamp
    FROM players
    GROUP BY name, server
) b
ON a.name = b.name
AND a.server = b.server
AND a.timestamp = b.MaxTimeStamp
ORDER BY score desc

Execution plan looks like this:

id  select_Type table           type    possible_keys  key      key_length  ref                             rows    extra
1   PRIMARY     <derived2>      ALL                                                                         268683  Using temporary; Using filesort
1   PRIMARY     a               eq_ref  PRIMARY        PRIMARY  147         b.MaxTimeStamp,b.server,b.name  1   
2   DERIVED     players_temp    index                  PRIMARY  147                                         264813  Using index; Using temporary; Using filesort

The primary key is (`timestamp`, `server`, `name`), and `score` is indexed. Let me know if there is any more information that could help. Thanks very much!
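For reference, the kind of composite index discussed in the comments below, leading with the grouped columns so the subquery can read MAX(timestamp) straight from the index, would look something like this (the index name is illustrative):

```sql
-- Column order matches the subquery's GROUP BY (name, server),
-- with `timestamp` last so MAX(timestamp) sits at the end of each
-- (name, server) group and can be read from the index alone.
ALTER TABLE players
    ADD INDEX idx_name_server_ts (`name`, `server`, `timestamp`);
```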

EDIT

The CREATE TABLE statement for the table as it currently stands (I've changed it a number of times!) is as follows:

    CREATE TABLE `players` (
      `timestamp` datetime NOT NULL,
      `server` varchar(25) NOT NULL,
      `name` varchar(20) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
      `normName` varchar(20) DEFAULT NULL,
      `position` varchar(20) DEFAULT NULL,
      `team` varchar(10) DEFAULT NULL,
      `won` smallint(6) DEFAULT NULL,
      `lost` smallint(6) DEFAULT NULL,
      `score` smallint(6) DEFAULT NULL,
      `picture` varchar(100) DEFAULT NULL,
      PRIMARY KEY (`timestamp`,`server`,`name`),
      KEY `normName` (`normName`) USING BTREE,
      KEY `score` (`score`) USING BTREE,
      KEY `playerRecord` (`timestamp`,`server`,`name`) USING BTREE
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

innodb_buffer_pool_size has been set to 1GB, still seeing poor performance. The subquery alone takes 20 seconds to run. I did an explain on it:

EXPLAIN 
SELECT name, server, Max(timestamp) AS MaxTimeStamp
FROM players_temp
GROUP BY name, server

result:

1   SIMPLE  players_temp    index       rating2v2   3       265910  Using index; Using temporary; Using filesort

EDIT 2

I got a bit further by altering the GROUP BY in my subquery. I changed it from `GROUP BY name, server` to `GROUP BY timestamp, server, name` to match the order of the primary key columns, and now it's using the index instead of a filesort on a temporary table. But there is still a problem, and it seems to be the ORDER BY on score.

If I run this query:

SELECT *
FROM players_temp a
ORDER BY a.score

it takes 33 seconds to run, and EXPLAIN indicates it's using a filesort. There is a BTREE index on `score`, as shown in the CREATE statement above. Any ideas? This is the last hurdle, I believe, to getting the query to run properly. Thanks again
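For comparison, a covering variant that selects only columns available in the `score` secondary index should be able to walk that index in order and skip the filesort. InnoDB secondary indexes implicitly append the primary key columns, so (timestamp, server, name) come along with `score`; this is a sketch, untested against this table:

```sql
-- All of these columns live in the `score` secondary index
-- (score itself, plus the implicit PK: timestamp, server, name),
-- so the query can be satisfied from the index alone.
SELECT score, timestamp, server, name
FROM players_temp
ORDER BY score;
```

If this version avoids the filesort, it confirms the slowdown is the cost of fetching full rows for SELECT *, not the sort itself.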

Stu
  • if (name, server, timestamp) are my primary key cluster, do I also need to add an index for them? One for each of them separately, or one index with all 3? Yes, I only need the columns from `a`. `b` is just used to find the max timestamp. – Stu Jan 31 '14 at 21:42
  • I did try that earlier and didn't see any decrease, I've since removed it (since I've messed around with so many different key/index combinations and rebuilt the table a few times). I'll add it back now! – Stu Jan 31 '14 at 21:45
  • Ok, that's interesting because I added a clustered index (timestamp, server, name) and the query time went down to 50s from 80s, even though those columns were already pk. I'll name the columns I want returned and see how that works. – Stu Jan 31 '14 at 21:50
  • Specifying the columns didn't make much difference. 49s vs 50s – Stu Jan 31 '14 at 21:52
  • I think we need to see your CREATE table statement. – Strawberry Jan 31 '14 at 22:14
  • I'm thinking that if you are going to scale this to millions of records, you will have to rely on pre-calculating some data (at the application level, or maybe with a trigger on update/insert), or run the query on some kind of scheduler and cache the results (at the application level). – Diego Jan 31 '14 at 22:51
  • yes, a friend of mine also suggested this as well (periodically run a script to write the sorted data into a new table). I will edit the original post with create table statement shortly. Thanks very much for the responses! – Stu Feb 01 '14 at 01:00
  • Sorry it took so long, I didn't get a chance to do it last night but I have now added the create table statement to original post. Thanks again! – Stu Feb 01 '14 at 13:41

2 Answers


There is nothing wrong with a little selective denormalization of your data. If you've gone as far as you can with indexes and subquery logic, you could add a column that indicates which record is the latest for each player, then add an index that includes that new column.

If your app knows which record is newest at the time of writing the record (i.e. the one being written is by definition the latest?) this is easy to do.

And if for some reason that can't be done, you could have a separate process examine the data from time to time and flag the latest records.
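As a rough sketch of the flag approach (the column and index names are illustrative, not from the question):

```sql
-- Hypothetical denormalization: mark the newest row per player.
ALTER TABLE players ADD COLUMN is_latest TINYINT(1) NOT NULL DEFAULT 0;
ALTER TABLE players ADD INDEX idx_latest_score (is_latest, score);

-- When writing a new row for a player, clear the old flag first...
UPDATE players SET is_latest = 0
WHERE name = ? AND server = ? AND is_latest = 1;
-- ...then insert the new row with is_latest = 1.

-- The leaderboard query becomes a simple index range scan:
SELECT * FROM players
WHERE is_latest = 1
ORDER BY score DESC;
```

The two writes per insert are the price; the payoff is that the read no longer touches the historical rows at all.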

Complete normalization is great in terms of data integrity, but data eventually gets large enough that the trade off isn't worth it. All the big online services (Facebook, Instagram etc) are heavily denormalized.

Chris Johnson
  • I'll mark this accepted answer as I believe I've got as far as I can with the query. Thanks for everyone's help – Stu Feb 02 '14 at 17:01

There is not much else I can think of doing to get that result more efficiently. The subquery seems well suited because it shrinks the result set and isn't dependent on columns of the outer query.

Remember to have an index on the columns name, server and timestamp.

If this does not work with acceptable performance, you can try to run the subquery alone with EXPLAIN.

If it ends up using filesort at some point, the keys probably don't fit in the available memory. You can try enlarging key_buffer_size if you are using the MyISAM engine, or innodb_buffer_pool_size if the table uses InnoDB, in the MySQL configuration file. Please note that if your table keeps growing, at some point you will hit the same problem again and will have to look for another solution (or buy more memory, hehe).
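To check what the server is currently using (the setting itself goes in my.cnf; innodb_buffer_pool_size is not dynamic before MySQL 5.7, so a restart is needed after changing it):

```sql
-- Current buffer pool size, in bytes:
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- For MyISAM tables, the equivalent index cache:
SHOW VARIABLES LIKE 'key_buffer_size';
```

A common starting point is to size the buffer pool to hold the table's hot data and indexes, within what the machine's RAM allows.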

Diego
  • I will try this, but I need to grab the entire record, and I'm afraid that MySQL will take the max(timestamp) but the other columns won't necessarily belong to the row with the max(timestamp). I'm going by the answers to a similar question here: [link](http://stackoverflow.com/questions/14770671/mysql-order-by-before-group-by) – Stu Jan 31 '14 at 21:14
  • Ok, yeah, the behavior of that is not specified in the manual. Let me update the answer... – Diego Jan 31 '14 at 22:23
  • There I edited the post. Also check the details on performance of that query. – Diego Jan 31 '14 at 22:35
  • How is this an improvement? – Strawberry Jan 31 '14 at 22:37
  • I just saw that I ended up writing almost the same query as the one in the original question. Anyway, the performance comments still hold up. Let me update the answer again... I also upvoted your comment asking to see the CREATE statement. – Diego Jan 31 '14 at 22:41
  • innodb_buffer_pool_size is 1073741824 (1GB). Should that be enough? I'm not really sure – Stu Feb 01 '14 at 15:38