Optimizing similar MySQL subqueries

Question

This is a subquery I have in a larger SQL script. It's performing the same action within multiple different CASE statements, so I was hoping I could somehow combine the action so it doesn't have to do the same thing over and over. However, I can't get the right results if I move the ORDER BY command outside of the CASE statements.

I'm joining 2 tables, met_data and flexgridlayers_table, on JDAY. Flexgridlayers_table has fields for JDAY and Segment, and met_data has fields JDAY, TAIR, and TDEW (in this simple example, but actually more fields). I'm running this through Matlab, so variable1 and variable2 are values set by a nested loop. I need to use CASE statements to account for the situation where variable1 is not equal to 1, then I want to output 0. Otherwise, I want to find values corresponding to a JDAY join, but the values may not be exact matches in F.JDAY and M.JDAY. I want to match on the closest <= value, so I use the ORDER BY M.JDAY DESC LIMIT 1 statement in each subquery.

The output is a table with fields JDAY (from F.JDAY), TAIR, and TDEW. Whenever I try moving the ORDER BY part outside of the CASE statements to get rid of the repeating subqueries, I get only a single row of results representing the largest JDAY. This query gives me the correct result - is there a way to optimize this?

SELECT F.JDAY, 
    CASE
        WHEN *variable1*<>1 THEN 0 
        ELSE 
            (SELECT M.TAIR 
            FROM met_data AS M
            WHERE M.Year=2000 AND M.JDAY<=F.JDAY
            ORDER BY M.JDAY DESC LIMIT 1)
    END AS TAIR,
    CASE
        WHEN *variable1*<>1 THEN 0 
        ELSE                
            (SELECT M.TDEW 
            FROM met_data AS M
            WHERE M.Year=2000 AND M.JDAY<=F.JDAY
            ORDER BY M.JDAY DESC LIMIT 1)
    END AS TDEW
FROM FlexGridLayers_table AS F
WHERE F.SEGMENT=*variable2*

Further explanation:

This query pulls all JDAY values from flexgridlayers_table, and then searches within the table met_data to find values corresponding to the closest <= JDAY values in that table. For example, consider the following flexgridlayers_table and met_data tables:

flexgridlayers_table:
Segment  JDAY
2        1.5
2        2.5
2        3.5
3        1.5
3        2.5
3        3.5

met_data:
JDAY  Year  TAIR  TDEW
1.0   2000  7     8
1.1   2000  9     10
1.6   2000  11    12
2.5   2000  13    14
2.6   2000  15    16
3.4   2000  17    18
4.0   2000  19    20

What I want (and what the query above returns) would be this, for variable1=1 and variable2=2:

JDAY  TAIR  TDEW
1.5   9     10
2.5   13    14
3.5   17    18

I'm just wondering if there is a more efficient way of writing this query, so I'm not performing the ORDER BY command on the same list of JDAY values over and over for each TAIR, TDEW, etc. field.

http://stackoverflow.com/questions/12925140/how-to-estimate-sql-query-timing/12925406#12925406 — felipsmartins, Nov 14 '12 at 00:31
This query takes 32 seconds with 5 subqueries and *variable1*=1. I'm not sure what else to take from your link... — Amy, Nov 14 '12 at 00:43
Actually, can you post the entire query and show a little more where the context of var1 and var2 are based... loop, just incrementing from one another, what... — DRapp, Nov 14 '12 at 02:11
The entire query is about 400 lines long...this is just a small piece. I'm looping through a grid (think X and Y coordinates)...variable1 can be thought of as Y and variable2 as X. Nested loops for both of these cycle through consecutive integers, hitting every grid point, with start and end points set by the user depending on the geometry of their system. — Amy, Nov 14 '12 at 02:15

score 2 · Accepted Answer · answered Nov 14 '12 at 02:57

Then I would write as follows... It looks like you are looking for one "TAIR" and "TDEW" per JDAY. If that is the case, apply a LEFT JOIN to your met_data table once on the year condition and F vs M JDay values. Now normally, this would return multiple rows per "JDay"

SELECT 
      PQ.JDay,
      PQ.MaxJDayPerFDay,
      CASE WHEN *var1* <> 1 THEN 0 ELSE M2.TAIR END TAIR,
      CASE WEHN *var1* <> 1 THEN 0 ELSE M2.TDEW END TDEW
   from 
      ( SELECT 
              F.JDay,
              MAX( M.JDAY ) as MaxJDayPerFDay
           from 
              FlexGridLayers_Table F
                 JOIN met_Data M
                    ON M.Year = 2000
                    AND F.JDay >= M.JDay
           where
              F.Segment = *var2* 
           group by
              F.JDay ) PQ 
         JOIN Met_Data M2 
            on M2.Year = 2000 
            AND PQ.MaxJDayPerFDay = M2.JDay

Now this does a pre-query by applying a MAX() JDay in the met_data ONCE and group by JDay so it will always return one record per F.JDay. So, now you have one query pre-qualified for your F.Segment = variable 2. If you had other columns you wanted from the "F" table, put them into this "PreQuery" (PQ alias) as needed.

Then, this result can immediately be joined back to the met_data table since the one day value is now explicitly known from the prequery. So, you can now get both TAIR and TDEW values at once rather than in two separate queries being applied for every record.

Hope this make sense, if not, let me know.

Works great! Compute time down from 32 seconds to 2 seconds...not bad! I didn't even think of using group by... — Amy, Nov 14 '12 at 03:15

Optimizing similar MySQL subqueries

1 Answers1