6

Suppose you have a query like this...

SELECT T.TaskID, T.TaskName, TAU.AssignedUsers
FROM `tasks` T
    LEFT OUTER JOIN (
        SELECT TaskID, GROUP_CONCAT(U.FirstName, ' ',
            U.LastName SEPARATOR ', ') AS AssignedUsers
        FROM `tasks_assigned_users` TAU
            INNER JOIN `users` U ON (TAU.UserID=U.UserID)
        GROUP BY TaskID
    ) TAU ON (T.TaskID=TAU.TaskID)

Multiple people can be assigned to a given task. The purpose of this query is to show one row per task, but with the people assigned to the task in a single column

Now... suppose you have the proper indexes setup on tasks, users, and tasks_assigned_users. The MySQL Optimizer will still not use the TaskID index when joining tasks to the derived table. WTF?!?!?

So, my question is... how can you make this query use the index on tasks_assigned_users.TaskID? Temporary tables are lame, so if that's the only solution... the MySQL Optimizer is stupid.

Indexes used:

  • tasks
    • PRIMARY - TaskID
  • users
    • PRIMARY - UserID
  • tasks_assigned_users
    • PRIMARY - (TaskID,UserID)
    • Additional index UNIQUE - (UserID,TaskID)

EDIT: Also, this page says that derived tables are executed/materialized before joins occur. Why not re-use the keys to perform the join?

EDIT 2: MySQL Optimizer won't let you put index hints on derived tables (presumably because there are no indexes on derived tables)

EDIT 3: Here is a really nice blog post about this: http://venublog.com/2010/03/06/how-to-improve-subqueries-derived-tables-performance/ Notice that Case #2 is the solution I'm looking for, but it appears that MySQL does not support this at this time. :(

EDIT 4: Just found this: "As of MySQL 5.6.3, the optimizer more efficiently handles subqueries in the FROM clause (that is, derived tables):... During query execution, the optimizer may add an index to a derived table to speed up row retrieval from it." Seems promising...

BMiner
  • 16,669
  • 12
  • 53
  • 53
  • Can you also add the indexes you are using? I assume you have a PK on tasks and a non-unique index on tasks_assigned_users. – Luis Jan 18 '12 at 19:38
  • @Luis - edited the question for you :) – BMiner Jan 18 '12 at 19:42
  • You have GROUP BY Task ID, which implies multiple people could be working on a given task, which too implies some aggregation. Do you want all people assigned to a given task listed in a single return column associated with the task? Or, do you really want to see everyone assigned to a task, and those tasks unassigned, have it blank. Maybe even push any UNASSIGNED tasks to the top (or bottom) of the list... – DRapp Jan 18 '12 at 19:54
  • Multiple people can be assigned to a given task. The purpose of this query is to show one row per task, but with the people assigned to the task in a single column – BMiner Jan 18 '12 at 19:57
  • Similar question: http://stackoverflow.com/questions/1180714/how-can-i-further-optimize-a-derived-table-query-which-performs-better-than-the This person suggests using a temporary table, indexing it, and then running the query. This is lame. – BMiner Jan 18 '12 at 20:01

3 Answers3

4

There is a solution to this in MySQL Server 5.6 - the preview release (at the time of this writing).

http://dev.mysql.com/doc/refman/5.6/en/from-clause-subquery-optimization.html

Although, I'm not sure if the MySQL Optimizer will re-use indexes that already exist when it "adds indexes to the derived table"

Consider the following query:

SELECT * FROM t1 JOIN (SELECT * FROM t2) AS derived_t2 ON t1.f1=derived_t2.f1;

The documentation says: "The optimizer constructs an index over column f1 from derived_t2 if doing so would permit the use of ref access for the lowest cost execution plan."

OK, that's great, but does the optimizer re-use indexes from t2? In other words, what if an index existed for t2.f1? Does this index get re-used, or does the optimizer recreate this index for the derived table? Who knows?

EDIT: The best solution until MySQL 5.6 is to create a temporary table, create an index on that table, and then run the SELECT query on the temp table.

BMiner
  • 16,669
  • 12
  • 53
  • 53
  • 1
    the same stupid situation on MariaDB 10 (years after): even though group by in derived does speed up the query (ridiculous select v1 from t group by v1) because of explicit sorting, the best result is if you create all derived tables before the main query and explicitely add needed indexes. optimizer sux – Tertium Dec 07 '15 at 23:41
2

The problem I see is that by doing a subquery there is no underlying indexed table. If you are having a performance I'd do the grouping at the end, something like this:

SELECT T.TaskID, T.TaskName, GROUP_CONCAT(U.FirstName, ' ', U.LastName SEPARATOR ', ') AS AssignedUsers
FROM `tasks` T
    LEFT OUTER JOIN  `tasks_assigned_users` TAU ON (T.TaskID=TAU.TaskID)
    INNER JOIN `users` U ON (TAU.UserID=U.UserID)
GROUP BY T.TaskID, T.TaskName
Luis
  • 1,294
  • 7
  • 9
  • This works... but since it's the exact same resultset, I don't know why MySQL can't do this optimization for me. Plus, my *actual* query has about 20 columns; I'd have to tell MySQL to group all of them? I *really* only want to group the TaskID, not TaskID and TaskName... MySQL does extra work when you group by both columns. Ya know what I mean? – BMiner Jan 18 '12 at 19:59
  • Try telling and not telling; if they are needed, it is an error not to declare them so you will realize easily (I don't know if they are needed). I don't know why it can't do the optimization; my guess is that subqueries are a black box to the external query, but I don't know. Normally I avoid subqueries because of performance problems like this. – Luis Jan 18 '12 at 20:05
  • Is MySQL smart enough to know that GROUP BY T.TaskID, T.TaskName is the same as GROUP BY T.TaskID because T.TaskID is the PRIMARY key? – BMiner Jan 18 '12 at 20:10
  • No, it isn't. You can try for yourself if you `SET SESSION sql_mode = CONCAT("ONLY_FULL_GROUP_BY,",@@sql_mode)`. MySQL will complain about columns not being in the GROUP BY clause, even though they are functional dependent as defined by SQL. It will even complain when you just give a column another name using `AS`. – AndreKR Jan 18 '12 at 20:16
  • I didn't really want to re-write a ton of queries... and I'm not 100% sure on the performance of using GROUP BY on nearly every column, so I just used CREATE TEMPORARY TABLE `foo` **derived query goes here** followed by an ALTER TABLE ADD PRIMARY KEY ... etc. Thanks for your answer, @Luis! – BMiner Jan 19 '12 at 16:08
1

I'm afraid, it's not possible. You have to create a temporary table or a view to use an index.

AndreKR
  • 32,613
  • 18
  • 106
  • 168
  • That post is from 2006. There have been no changes since then? – BMiner Jan 18 '12 at 19:35
  • I don't think so, for two reasons. 1) It's MySQL. 2) There are other posts about this problem from 2010 for example (http://planet.mysql.com/entry/?id=23769). BTW, could you use a view? – AndreKR Jan 18 '12 at 19:43
  • I dunno... I've read that views have similar problems, but I'll give it a try right now... – BMiner Jan 18 '12 at 19:46
  • 1
    Ironically here's a post for MSSQL(!) suggesting quite the opposite (replacing temporary tables with derived tables) for performance reasons: http://www.sql-server-performance.com/2002/derived-temp-tables/ ;) – AndreKR Jan 18 '12 at 19:46
  • 1
    Unless you can reuse the index from another table, it may not be efficient to create an index. Creating an index is an O(n log n). If you invert the search and can use the Tasks PK, then you just have an O(n) iteration with an O(1) search for each. – Luis Jan 18 '12 at 19:49