1

I've been trying to structure a massive query, and I have succeeded and been able to actually finish the query. However I went from my dev environment (small database) to testing on the live environment (big database), and I've ran into performance problems.

I think the answer can be found here: https://dba.stackexchange.com/a/16376

But is there really no other way around? The reason I am even putting the subqueries in a VIEW is because they have more complex constructs.

Example of the VIEWS / queries:

pjl view:

    (SELECT `pj`.`id` AS `id`,`pj`.`globalId` AS `globalId`,`pj`.`date` AS `date`,`pj`.`serverId` AS `serverId`,`pj`.`playerId` AS `playerId`,'playerjoins' AS `origin`
    FROM `playerjoins` `pj`) 
    UNION ALL 
    (SELECT `pl`.`id` AS `id`,`pl`.`globalId` AS `globalId`,`pl`.`date` AS `date`,`pl`.`serverId` AS `serverId`,`pl`.`playerId` AS `playerId`,'playerleaves' AS `origin`
    FROM `playerleaves` `pl`)

ll_below view:

    SELECT `ll`.`id` AS `id`,`ll`.`globalId` AS `globalId`,`ll`.`date` AS `date`,`ll`.`serverId` AS `serverId`,`ll`.`gamemodeId` AS `gamemodeId`,`ll`.`mapId` AS `mapId`,`pjl`.`origin` AS `origin`,`pjl`.`date` AS `pjldate`,`pjl`.`playerId` AS `playerId`
    FROM `pjl`
    JOIN `levelsloaded` `ll` 
    ON `pjl`.`date` <= `ll`.`date`

the, now simple, query:

    SELECT * FROM
    (
            (SELECT * FROM ll_below WHERE playerId = 976) llbelow
            INNER JOIN
            (SELECT id, MAX(pjldate) AS maxdate FROM ll_below WHERE playerId = 976 GROUP BY id) llbelow_inner
            ON llbelow.id = llbelow_inner.id AND llbelow.pjldate = llbelow_inner.maxdate
    )
    WHERE origin = 'playerjoins'
    ORDER BY date DESC

I could put everything in one big query, but in my eyes it gets a big mess then.

I also know why the performance is being hit so hard, because MySQL cannot use the MERGE algorithm for the pjl view as there is an UNION ALL in it. If I put the WHERE playerId = 976 clauses in the correct places, then the performance hit is gone, but I'd also have a query consisting of 50 lines or something.

Can someone please suggest me what to do if I want performance ánd a query that is still concise?

Community
  • 1
  • 1
skiwi
  • 66,971
  • 31
  • 131
  • 216
  • Can you also provide some `show create table ` outputs and `explain` outputs? and maybe some example data on http://www.sqlfiddle.com? – Raymond Nijland Nov 29 '13 at 23:50

2 Answers2

1

This clause:

WHERE origin = 'playerjoins'

Means that you didn't need to do a UNION at all, since you're not using any of the rows from pl by the end of the query.

You're right that the view is likely forcing a temporary table instead of using the merge algorithm.

UNION ALL also creates its own temporary table. This case is optimized in MySQL 5.7.3 (still pre-alpha as of this writing), according to Bug #50674 Do not create temporary tables for UNION ALL.

Also, the GROUP BY is probably creating a third level of temporary table.

I see you're also doing a greatest-n-per-group operation, to match the rows with the max date per id. There are different solutions for this type of operation, which don't use a subquery. See my answers for example:

Depending on the number of rows and other conditions, I've seen both solutions for greatest-n-per-group queries give better performance. So you should test both solutions and see which is better given the state and size of your data.

I think you should unravel the views and unions and subqueries. See if you can apply the various WHERE conditions (like playerId=976) directly against the base tables before doing joins and aggregates. That should greatly reduce the number of examined rows, and avoid the multiple layers of temp tables caused by the view and union and group by.


Re your comment:

The query you seem to want is the most recent join per level for one specific player.

Something like this:

SELECT ll.id, 
  ll.globalId,
  ll.date AS leveldate,
  ll.serverId,
  ll.gamemodeId,
  ll.mapId,
  pj.date AS joindate,
  pj.playerId
FROM levelsloaded AS ll
INNER JOIN playerjoins AS pj
  ON pj.date <= ll.date
LEFT OUTER JOIN playerjoins AS pj2
  ON pj.playerId = pj2.playerId AND pj2.date <= ll.date AND pj.date < pj2.date 
WHERE pj.playerId = 976
  AND pj2.playerID IS NULL
ORDER BY joindate DESC

(I have not tested this query, but it should get you started.)

Community
  • 1
  • 1
Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • I appreciate your explanation, but as I've mentioned, wouldn't my query then be like 50 lines large? I have tried the approach without views before, and then the queries got so unreadable that I couldn't figure out the answer anymore. The most bothersome part is that I had like 20 lines of simply duplicated code, because there was no other way. – skiwi Nov 29 '13 at 21:51
  • One other thing I noticed that didn't see in his query... He has no criteria listed joining the levelsLoaded to a specific player... its just on date unless its by the "GlobalID" that appears in both, but still unclear. – DRapp Nov 29 '13 at 22:39
  • @DRapp, I assume there is no direct reference to the player's join/leave action. It's not an equijoin. It joins the player's action to *all* levels that were available on the respective date. – Bill Karwin Nov 29 '13 at 22:48
  • @BillKarwin, I agree to that, but if you have 10 people playing and each are at different levels, then which level is associated with player 976. – DRapp Nov 29 '13 at 23:00
  • @DRapp, we don't know what "level" means in the OP's game. It could be that a "level" is not an attribute of a player, but an area of the game world they are allowed to have access to. That would be consistent with the `levelsloaded` table having its own `date` column, which may be the date that each respective level was first introduced to the game world. – Bill Karwin Nov 30 '13 at 00:07
0

Bill is absolutely correct... your views don't even really provide any benefit. I've tried to build something for you, but my interpretation may not be exactly correct. Start by asking yourself IN SIMPLE WORDS what am I trying to get. Here is what I came up with.

I'm looking for a single player (hence your player ID = 976). I'm also only considering the PLAYERJOINS instance (not the player leaving which knocks out the union part). For this player, I want the most recent date they joined a game. From that date as the baseline, I want all Levels Loaded that were created at or after the maximum date joined.

So, the first query is nothing but the maximum date for player 976 from the playerJoined table. Who cares about anything else, or any other user. The ID Here is the same as it would be in the LevelsLoaded table via the join, so getting that player ID and the same levelsLoaded ID for the same person is, IMO, pointless. Then, get the rest of the details from the Levels Loaded on/after the max date for the same person, order by whatever..

If my interpretation of your query is incorrect, offer obvious clarification for adjustments.

SELECT 
      ll.id,
      ll.globalId,
      ll.`date`,
      ll.serverId,
      ll.gamemodeId,
      ll.mapId,
      'playerjoins' as origin,
      playerMax.MaxDate AS pjldate
   FROM 
      ( SELECT MAX( pj.`date` ) as MaxDate
           FROM playerjoins pj 
           where pj.id = 976 ) playerMax
         JOIN levelsloaded ll 
            ON ll.id = 976
           AND playerMax.MaxDate <= ll.`date`
DRapp
  • 47,638
  • 12
  • 72
  • 142
  • Keep in mind the `GROUP BY ll.id` in the OP's final query, which means he wants the max join date per level. Also, I don't think he ever wants ll.id = 976. Finally, this is trivia, but `date` is not a [MySQL reserved word](http://dev.mysql.com/doc/refman/5.6/en/reserved-words.html), so there's no need to delimit it with back-ticks. – Bill Karwin Nov 29 '13 at 22:24
  • 1
    @BillKarwin, thanks, and you are right, but as confusing as his own views were based, it might still help him to think out-loud what in the world does he want :) – DRapp Nov 29 '13 at 22:30