0

So I've got a basic query here selecting between a date range. It seems to be taking a (relatively long time to process) I was wondering what I can do to improve the processing time. Here is the query.

SELECT h.id as hid,h.created_on,u.id as uid,u.fname,u.lname,u.email  
FROM hud h 
LEFT JOIN user_hud uh on h.id = uh.hid 
LEFT JOIN users u on u.id = uh.uid 
WHERE 
    h.created_on 
    BETWEEN DATE_SUB( CURDATE( ) ,INTERVAL 12 MONTH ) 
        AND DATE_SUB( CURDATE( ) ,INTERVAL 7 DAY)
GolezTrol
  • 114,394
  • 18
  • 182
  • 210
user1620152
  • 134
  • 2
  • 14
  • Breaks in queries are allowed. – GolezTrol Apr 06 '13 at 18:05
  • 1
    Do you have indixes on ```hud.id```, ```user_hud.hid```, ```users.id```, ```user_hud.uid``` and ```hud.created_on``` columns? Post the ```EXPLAIN``` of the query (http://dev.mysql.com/doc/refman/5.5/en/explain.html). How many rows are returned and what is the processing time? – dtech Apr 06 '13 at 18:07
  • @NathanBouscal How? The where has nothing to do with the joins. – GolezTrol Apr 06 '13 at 18:09
  • @NathanBouscal Not true, almost always the query optimizer is smart enough to figure this out itself. Without an EXPLAIN this is only speculation. – dtech Apr 06 '13 at 18:10
  • @user1620152 How long is 'relatively long'? And how much data is in those tables. What indexes do you have, what is the load on the server... – GolezTrol Apr 06 '13 at 18:10
  • Well there is only about 50 records in the database and it was taking approximately 35 seconds to process. It's running on localhost, I have an i7 processor with 16 gb memory. It was a lack of indexing across all the tables. I added the indexes and it dramatically decreased the time it took to process the query. – user1620152 Apr 06 '13 at 18:17
  • @NathanBouscal In this case it won't matter, as the selection ("WHERE clause") is only relevant to the ```hud``` table. Thus a good query optimizer will first do the selection before joining. See slide "Phases of Logical Query Optimization" of this: http://www.mpi-inf.mpg.de/departments/d5/teaching/ws07_08/queryoptimization/Lecture2.pdf – dtech Apr 06 '13 at 18:27
  • @dtech Thank you for the information, I'll do some reading :) –  Apr 06 '13 at 18:35
  • You have to make index. The processing time is independent of your system. – Hydroid Apr 06 '13 at 18:07

2 Answers2

1

Make indices on the hud.id, user_hud.hid, users.id, user_hud.uid and hud.created_on columns.

Without indices the whole table needs to be checked for the join. With indices you can use much more efficient methods (at the cost of update/delete performance loss and more disk space)

dtech
  • 13,741
  • 11
  • 48
  • 73
0

First, you need at least the following indices: (created_on) in hud, (hid) in user_hud. Then you can look at your query and realize that your condition is on the first table in your join, so you can move that condition into a subquery:

SELECT hids.*, u.id as uid, u.fname,
             u.lname, u.email
FROM (
    SELECT id as hid, created_on
    FROM hud,
       (SELECT DATE_SUB( CURDATE( ) ,INTERVAL 12 MONTH ) as year_ago,
                    DATE_SUB( CURDATE( ) ,INTERVAL 7 DAY) as week_ago) d
    WHERE created_on BETWEEN year_ago AND week_ago
    ) as hids
LEFT JOIN user_hud uh USING(hid)
LEFT JOIN users u on u.id = uh.uid; 
newtover
  • 31,286
  • 11
  • 84
  • 89
  • the condition is on ```hud```, so the optimizer should just get the records from hud which match the condition, so I think this is unnecessary. But an EXPLAIN is the only way to know for certain. Also: aren't there now 2 tables in the ```FROM```? I think you meant to select only from the subquery. – dtech Apr 06 '13 at 18:49
  • @dtech, unfortunatelly the optimizer is not samrt enough, but you are right, explain might shed the light. There are two tables in the subquery, but there should be no problem with this, the second one is executed only once. – newtover Apr 06 '13 at 19:03