0

This is the first query I was working with. The query took way too long to process. This will be processing millions of rows in the table.

select date(datetime), count(*) from LOG l1
where ((datetime >= str_to_date('2013-01-01 00:00:00','%Y-%m-%d %H:%i:%s'))  
and (datetime < str_to_date('2013-01-30 00:00:00','%Y-%m-%d %H:%i:%s'))
and code_subcode in ('8008118', '8008218', '8008318', '8008418'))
and (
select count(*) from log l2
where l2.session_id = l1.session_id and l2.remote_ip = l1.remote_ip and code_subcode
in('8008119', '8008219', '8008319', '8008419')
and TIMEDIFF(l2.datetime, l1.datetime) < 1)
group by date(datetime) order by date(datetime);

After some reading I found that joins are supposed to be faster than nested-selects so I converted it to this...

SELECT date(l1.datetime), count(*) from LOG l1
INNER JOIN LOG l2 ON l2.SESSION_ID=l1.SESSION_ID
where ((l1.datetime >= str_to_date('2013-01-01 00:00:00','%Y-%m-%d %H:%i:%s'))
and (l1.datetime < str_to_date('2013-01-30 00:00:00','%Y-%m-%d %H:%i:%s'))
and l1.code_subcode in ('8008118', '8008218', '8008318', '8008418')
and l2.code_subcode in('8008119', '8008219', '8008319', '8008419')
and date(l1.datetime) = date(l2.datetime)
and TIMEDIFF(l2.datetime, l1.datetime) < 1)
GROUP BY date(l1.datetime)
ORDER BY DATE(l1.datetime);

Please help, thanks.


Thank you all for your help. It turns out that the reason it is taking so long to execute is because the program is silently failing on the server. When executing the query on my local environment it processes 800,000 lines in 48 seconds. If I have time I'll look into farther optimization, for now I have too much crap to get done.

Once again, thanks!

  • You're using a co-related subquery, causing the subquery to be executed for EVERY row in the parent query. for a million row table, you'll end up running two million queries... As well, the whole str_to_date business is pointless. mysql will happily accept your plain-text date-strings as is and treat them as real datetimes. – Marc B Oct 01 '13 at 19:30
  • It might be useful to give an overview of what this query is supposed to **do**. – Brendan Long Oct 01 '13 at 19:33
  • I believe you would also get a performance boost if you changed the COUNT(*) to only count unique IDs, assuming that the LOG table has a unique ID for each row. – Otis Oct 01 '13 at 19:35
  • @MarcB just a nitpick, but it's 'correlated' not 'co-related' – Hart CO Oct 01 '13 at 19:44

3 Answers3

1

The problem is you're grouping by a computed value (DATE(l1.datetime)), meaning that SQL will have to perform a table scan of the log table and also create temporary working structures to hold the data while it is being aggregated. Presumably this table is huge so that is really going to hurt you.

If you frequently need to run query of this sort, consider adding a computed column for DATE(datetime) in the Log table, then create an index on it.

For information on computed column indexes, look here

Conrad Frix
  • 51,984
  • 12
  • 96
  • 155
John Wu
  • 50,556
  • 8
  • 44
  • 80
1

All of the suggestions you've received so far have merit ...

But don't "guess". Execute an "EXPLAIN" to see exactly what mySQL thinks it should be doing ... so you can figure out how to optimize your query and do it better.

Here are some references:

paulsm4
  • 114,292
  • 17
  • 138
  • 190
0

Please try to run this query and tell how much time it took. Also, I haven't tested the query & just wrote it based on your join query. Please correct syntax errors,if any.

SELECT date(a.datetime), count(*) from 
(
SELECT datetime,SESSION_ID from LOG where 
(datetime >= str_to_date('2013-01-01 00:00:00','%Y-%m-%d %H:%i:%s')) and 
(datetime < str_to_date('2013-01-30 00:00:00','%Y-%m-%d %H:%i:%s') and
(code_subcode in ('8008118', '8008218', '8008318', '8008418'))
) as a, 
(
SELECT datetime,SESSION_ID from LOG where (code_subcode in('8008119', '8008219', '8008319', '8008419'))
) as b where b.SESSION_ID=a.SESSION_ID and 
date(a.datetime) = date(b.datetime) and TIMEDIFF(b.datetime, a.datetime) < 1)
GROUP BY DATE(a.datetime)
ORDER BY DATE(a.datetime);

Also please use indexes & try to store date & time in separate columns to minimize translation cost.

Use following code to profile

set profiling=1;
//run your query
show profile;

Tell us the output. Thank you

Amit Kumar
  • 2,685
  • 2
  • 37
  • 72