Count the number of rows between unix time stamps for each ID

Question

I'm trying to populate some data for a table. The query is being run on a table that contains ~50 million records. The query I'm currently using is below. It counts the number of rows that match the template id and are BETWEEN two unix timestamps:

SELECT COUNT(*) as count FROM `s_log` 
WHERE `time_sent` BETWEEN '1346904000' AND '1346993271' 
AND `template` = '1'

While the query above does work, performance is rather slow while looping through each template which at times can be in the hundreds. The time stamps are stored as int and are properly indexed. Just to test thing out, I tried running the query below, omitting the time_sent restriction:

SELECT COUNT(*) as count FROM `s_log` 
AND `template` = '1'

As expected, it runs very fast, but is obviously not restricting count results inside the correct time frame. How can I obtain a count for a specific template AND restrict that count BETWEEN two unix timestamps?

EXPLAIN:

SHOW CREATE TABLE s_log:

CREATE TABLE `s_log` (
 `id` int(255) NOT NULL AUTO_INCREMENT,
 `email` varchar(255) NOT NULL,
 `time_sent` int(25) NOT NULL,
 `template` int(55) NOT NULL,
 `key` varchar(255) NOT NULL,
 `node_id` int(55) NOT NULL,
 `status` varchar(55) NOT NULL,
 PRIMARY KEY (`id`),
 KEY `email` (`email`),
 KEY `time_sent` (`time_sent`),
 KEY `template` (`template`),
 KEY `node_id` (`node_id`),
 KEY `key` (`key`),
 KEY `status` (`status`),
 KEY `timestamp` (`timestamp`)
) ENGINE=MyISAM AUTO_INCREMENT=2078966 DEFAULT CHARSET=latin1

Is `template` indexed as well? Also have you considered a composite index on both columns? — Fluffeh, Sep 07 '12 at 05:21
Why do write the query with the integer values under apostrophe (submitting as strings)? — Christian Ammer, Sep 07 '12 at 07:08

zerkms · Accepted Answer · 2012-09-07T06:26:55.273

1

The best index you may have in this case is composite one template + time_sent

CREATE INDEX template_time_sent ON s_log (template, time_sent)

PS: Also as long as all your columns in the query are integer DON'T enclose their values in quotes (in some cases it could lead to issues, at least with older mysql versions)

edited Sep 07 '12 at 06:26

answered Sep 07 '12 at 05:22

zerkms

249,484
69
436
539

I also thought the same thing, but there was no performance increase at all. – john Sep 07 '12 at 05:35
1

@john: as I said - you need to have single composite index. Not 2 separated. Delete `template` index, and create 1 index with 2 columns `template + time_sent` in this particular order. PS: added an example on how to create that index PPS: **IMPORTANT** after you create the index I proposed - delete `template` index – zerkms Sep 07 '12 at 06:23
Thanks for your example it got me where I needed to be. I didn't know I needed to delete the `template` index for the `template + time_sent` index to take effect. – john Sep 07 '12 at 07:28
@john: it's not actually "to take" effect, it's about redundancy. `template + any_other_column` already covers `template` column as a left most part, so any query that uses just `template` will be optimized using it as well. So single `template` index is just redundant and will just waste CPU, RAM and storage space. – zerkms Sep 07 '12 at 07:30

score 0 · Answer 2 · answered Sep 07 '12 at 06:23

0

First, you have to create an index that has both of your columns together (not seperately). Also check your table type, i think it would work great if your table is innoDB.

And lastly, use your WHERE clause in this fashion:

`WHEREtemplate= '1' ANDtime_sent` BETWEEN '1346904000' AND '1346993271'

What this does is first check if template is 1, if it is then it would check for the second condition else skip. This will definitely give you performance-edge

answered Sep 07 '12 at 06:23

Falcon

259
3
8

Order of predicates in `WHERE` doesn't matter. So your last paragraph is a bit makes no sense. – zerkms Sep 07 '12 at 06:30
MySQL is smart enough to fix this. The order in your WHERE does not speed up your query – Nin Sep 07 '12 at 06:30

score 0 · Answer 3 · edited May 23 '17 at 11:56

If you have to call the query for each template maybe it would be faster to get all the information with one query call by using GROUP BY:

SELECT template, COUNT(*) as count FROM `s_log` 
WHERE `time_sent` BETWEEN 1346904000 AND 1346993271;
GROUP BY template

It's just a guess that this would be faster and you also would have to redesign your code a bit.

You could also try to use InnoDB instead of MyISAM. InnoDB uses a clustered index which maybe performs better on large tables. From the MySQL site:

Accessing a row through the clustered index is fast because the row data is on the same page where the index search leads. If a table is large, the clustered index architecture often saves a disk I/O operation when compared to storage organizations that store row data using a different page from the index record. (For example, MyISAM uses one file for data rows and another for index records.)

There are some questions on Stackoverflow which discuss the performance between InnoDB and MyISAM:

@zerkms: Good question, I always used `HAVING` with aggregate Function `COUNT(*)` like: `GROUP BY id HAVING COUNT(*) > 1`. But in this case it's not necessary. Thanks! — Christian Ammer, Sep 07 '12 at 06:49

Count the number of rows between unix time stamps for each ID

3 Answers3