3

I have a query that goes through each hour of my book_records table and gets the number of books that are taken (value = 1) vs the number of books that are free (value = 0):

SELECT sr.time AS h,
       COUNT(CASE WHEN sr.value = 1 THEN 1 END) AS taken,
       COUNT(CASE WHEN sr.value = 0 THEN 1 END) AS free,
       sr.libID AS libID
FROM `book_records` AS sr
WHERE sr.time BETWEEN '2017-01-01 00:00:00' AND '2017-01-01 23:00:00' 
AND MINUTE(sr.time) = 0
AND libID = 0 
GROUP BY h, libID
ORDER BY h, libID

The main line to look at here is this:

WHERE sr.time BETWEEN '2017-01-01 00:00:00' AND '2017-01-01 23:00:00' 

This takes about 0.03s to query.

If I change it to this (5 days):

WHERE sr.time BETWEEN '2017-01-01 00:00:00' AND '2017-01-05 23:00:00' 

It takes about 0.15s.

10 days:

WHERE sr.time BETWEEN '2017-01-01 00:00:00' AND '2017-01-10 23:00:00'

It takes about 0.28s

But at 15 days:

WHERE sr.time BETWEEN '2017-01-01 00:00:00' AND '2017-01-15 23:00:00'

At the time of writing this, the query is still going. SHOW FULL PROCESSLIST tells me the query's state is set to 'Sleep' with the time going up.

So what gives? Is there something in my MySQL 5.7 configuration that might be causing this? The query speed is fine at 1, 5, and 10 days, but 5 days more (15) and the query gets put in a Sleep state? Why?

EDIT: SHOW FULL PROCESSLIST output:

| 2234 | phpmyadmin | localhost | NULL    | Sleep   |     4 |              | NULL
| 2235 | root       | localhost | library | Query   |     4 | Sending data | 
SELECT SQL_CALC_FOUND_ROWS sr.time AS h, 
COUNT(CASE WHEN sr.value = 1 THEN 1 END) AS taken, 
COUNT(CASE WHEN sr.value = 0 THEN 1 END) AS free, sr.libID AS libID FROM `book_records` AS sr
WHERE sr.time BETWEEN '2017-01-01 00:00:00' AND '2017-01-15 23:00:00' 
AND MINUTE(sr.time) = 0
AND libID = 0 
GROUP BY h, libID
ORDER BY h, libID LIMIT 1 |
Rick James
  • 135,179
  • 13
  • 127
  • 222
Aran Bins
  • 449
  • 10
  • 20
  • can you please show the output of `SHOW FULL PROCESSLIST` – Ankur Patel Jul 14 '18 at 11:10
  • Added to original post. If I kill 2235 it also kills 2234. Looks like the state is set to 'Sending data', my mistake. Still not sure how to fix this though. – Aran Bins Jul 14 '18 at 11:28
  • is 2235 the connection from which you are executing your query? – Ankur Patel Jul 14 '18 at 11:47
  • I executed the query from phpmyadmin. However the same result happens if I execute it directly from terminal in the mysql console. 2235 I believe is the process ID for the query. – Aran Bins Jul 14 '18 at 12:22
  • You want 23 1-minute time spans (00:00-00:01, 01:00-01:01, etc.) plus a 1-second span (just the one second of 23:00:00)?? – Rick James Jul 14 '18 at 21:59

1 Answers1

2

Let's take a look at your query.

SELECT sr.time AS h, 
       COUNT(CASE WHEN sr.value = 1 THEN 1 END) AS taken, 
       COUNT(CASE WHEN sr.value = 0 THEN 1 END) AS free, 
       sr.libID AS libID
  FROM `book_records` AS sr
WHERE sr.time BETWEEN '2017-01-01 00:00:00' AND '2017-01-01 23:00:00' 
  AND MINUTE(sr.time) = 0
  AND libID = 0 
GROUP BY h, libID
ORDER BY h, libID

First, you need a compound index on book_records (libID, time) for your query to be sargable. Unsargable queries need full table scans, which are slow.

Next, your selection criterion sr.time BETWEEN '2017-01-01 00:00:00' AND '2017-01-01 23:00:00' is a little strange. To get all the records for the day 1-July-2017. you need to use these criteria

       sr.time >= '2017-01-01 00:00:00'
   AND sr.time <  '2017-01-02 00:00:00'

Why? You want all records with times starting at midnight on 1-July, up until but not including midnight on 2-July.

Next, you have this selection criterion. MINUTE(sr.time) = 0. It removes many records from your result set, and keeps others. For example it removes a record with the time 10:03:00 and keeps a record with the time 10:00:59. That is probably not what you want. Eliminate that criterion; it makes the query non-sargable as well as selecting a strange set of rows.

Next, it looks like you are trying to present your results summarized by hour of the day. To do that you need GROUP BY HOUR(sr.time). That will give you the number of records by hour of the day, even when you include multiple days in your filter range.

Fourth, and less importantly, you can make this query even faster by creating an index on three columns (libID, time, value) rather than just two. That's called a covering index because it contains all the rows needed by your query. The query can be satisfied entirely from the index, so MySQL doesn't need to read both the index and the table. Create this index using

   CREATE INDEX book_records_id_time_val ON book_records (libID, time, value);

At the end of this your query looks like

SELECT HOUR(sr.time) AS h, 
       COUNT(CASE WHEN sr.value = 1 THEN 1 END) AS taken, 
       COUNT(CASE WHEN sr.value = 0 THEN 1 END) AS free, 
       sr.libID AS libID
  FROM `book_records` AS sr
 WHERE sr.time >= '2017-01-01 00:00:00'
   AND sr.time <  '2017-01-02 00:00:00'
   AND libID = 0 
 GROUP BY HOUR(sr.time), libID
 ORDER BY HOUR(sr.time), libID

Those changes should make your query more accurate and much much faster.

By the way to see a running query you must issue SHOW FULL PROCESSLIST from a different MySQL connection (another client program or another query window) than the one running the query. When a connection shows sleep it's not doing anything.

O. Jones
  • 103,626
  • 17
  • 118
  • 172
  • Thank you! However your new query isn't grouping by day as well, so it's adding up the values for every hour of every day. I guess I didn't specify I need it separate so I added a grouping of DAY(sr.time) as well and added the index, it works! I also still had to put MINUTE(sr.time) = 0 for it to only do right at the hour (00:00, 01:00, 02:00, etc.) Can you tell me why though, in my original query, using 15 days completely halts the query? It seems it's stuck on 'Sending Data' but if I use 10 days, it's fine? It's a difference of 5 days so I'm not sure what was wrong honestly. – Aran Bins Jul 14 '18 at 21:17
  • It's hard to know exactly why your query didn't finish running. Your `GROUP BY time` probably generated a very large intermediate temporary result set. Possibly it got too large for RAM with the longer time interval, and MySQL, in trying to handle on temporary tables on disk, just slowed way down. – O. Jones Jul 14 '18 at 21:22
  • @AranBins - The _may_ make a big difference in performance. What percentage of the records are `libID=0`? – Rick James Jul 14 '18 at 22:01
  • Currently all of them have libID = 0. I'm wondering if O. Jones is onto something about the RAM though. – Aran Bins Jul 14 '18 at 22:44