0

I'm trying to find exam results for individual people between multiple periods using this query:

SELECT * FROM RESULTS AS R, Define_Times AS T 
WHERE R.PERSONID = T.PERSONID AND ( 
(R.DATE BETWEEN T.Previous_Month_Start AND T.Previous_Month_End) OR 
(R.DATE BETWEEN T.Next_Month_Start AND T.Next_Month_End) OR 
(R.DATE BETWEEN T.Six_Month_Start AND T.Six_Month_End) OR 
(R.DATE BETWEEN T.One_Year_Start AND T.One_Year_End) OR 
(R.DATE BETWEEN T.Two_Year_Start AND T.Two_Year_End) OR 
(R.DATE BETWEEN T.Three_Year_Start AND T.Three_Year_End) OR 
(R.DATE BETWEEN T.Four_Year_Start AND T.Four_Year_End) )

Previous/Next/One_Year etc. is different for each person.

Explain gives:

| id | select_type | table | type | possible_keys | key  | key_len | ref             | rows  | Extra       |
|  1 | SIMPLE      | T     | ALL  | PEOPLE        | NULL | NULL    | NULL            | 75775 |             |
|  1 | SIMPLE      | R     | ref  | IDX3,IDX2     | IDX3 | 5       | T.PERSONID      |  3550 | Using where |

The Results table has about 300 million rows. Define_Times has 75,000.

It's taking AGES.

I see that the 1st type is ALL, which is bad. But if it's so bad, why is it not using the index on PERSONID (called PEOPLE) it identified as a possible? What can I do to improve this?

I also can't see it using an index for date - there's one on R.DATE. (It's the first in the sequence of 5 on the index called IDX2.)

Sorry for any typos - my keyboard is broken, and thanks in advance.

2 Answers2

2

The problem is all the conditions you have ORed together.

If possible, restructure your database so that Define_Time has only four columns:

 CREATE TABLE Define_Times (
    PersonID INTEGER,
    PeriodType SomeType,
    StartDate DATE,
    EndDate DATE )

Then, each person gets 7 records (or more, if there are more periods you're not searching for in your example) in which PeriodType indicates what period the dates specify (you might use text values like PM, NM, SM, 1Y, 2Y, 3Y, 4Y or you might use integer values pointing to a description in another table).

Then, rewrite your query like this:

SELECT * FROM RESULTS AS R, Define_Times AS T 
WHERE R.PERSONID = T.PERSONID 
   AND R.DATE BETWEEN T.StartDate AND T.EndDate
   AND T.PeriodType IN (PM,NM,SM,1Y,2Y,3Y,4Y)

This query is at least optimizable.

This query will produce one record per matched period for each person. If your periods do not overlap, that's fine (there will only ever be one matching record). If your periods do overlap and you only want one record per result set you'll need to do some additional work with DISTINCT or GROUP BY by aggregate the records in the result set.

Also, note that if you don't have any additional periods in the Define_Times table then you can remove the AND T.PeriodType part of the WHERE clause.

Larry Lustig
  • 49,320
  • 14
  • 110
  • 160
  • Cool - I'll try it and let you know. Thanks! Would I need a specific index to take advantage of this new query? – Dazed and Confused Sep 25 '13 at 16:17
  • 1
    An index starting with (PersonID, Date) on the table Results would be the best way to optimize. I'm not a MySQL expert and so don't know how well it will optimize this query, but at least it has the possibility of doing an optimization. – Larry Lustig Sep 25 '13 at 16:25
0

As a comparison, can you run this equivalent query

SELECT * FROM Define_Times AS T 
INNER JOIN RESULTS AS R on
(R.PERSONID = T.PERSONID and 
  ( 
  (R.DATE BETWEEN T.Previous_Month_Start AND T.Previous_Month_End) OR 
  (R.DATE BETWEEN T.Next_Month_Start AND T.Next_Month_End) OR 
  (R.DATE BETWEEN T.Six_Month_Start AND T.Six_Month_End) OR 
  (R.DATE BETWEEN T.One_Year_Start AND T.One_Year_End) OR 
  (R.DATE BETWEEN T.Two_Year_Start AND T.Two_Year_End) OR 
  (R.DATE BETWEEN T.Three_Year_Start AND T.Three_Year_End) OR 
  (R.DATE BETWEEN T.Four_Year_Start AND T.Four_Year_End) 
  ) 
)

I've seen the optimizer work much better at times in this form.

Also, since you OR all of the date between expressions, it pretty much has no way to use a date index, since any of the date ranges can satisfy the where clause.

EDIT -- ADDED

If you don't want to run the query, at least try comparing the estimated execution plans

Gary Walker
  • 8,831
  • 3
  • 19
  • 41
  • So basically change the WHERE to INNER JOIN? (Just to confirm that I haven't missed anything.) I can try it - but if it doesn't improve we won't know for a week or so! ;-) – Dazed and Confused Sep 25 '13 at 15:45
  • Just saw your edit - by execution plan, do you mean EXPLAIN? If so - exactly the same. – Dazed and Confused Sep 25 '13 at 15:46
  • Estimated plan means that the query is not actually run, but it provides a "best guess" of the actual execution plan. You might want to do this to query alternatives that are long running that you would prefer not to run against the database. – Gary Walker Sep 25 '13 at 16:18
  • My apologies, I read this as MSSQL, not MySql. Don't run enough MySql to recall if this is an option -- at work, no MySql avail to double check. – Gary Walker Sep 25 '13 at 16:25