SQL Performance of grouping by DATE(TIMESTAMP) vs separate columns for DATE and TIME

Question

I'm facing a problem of displaying data from MySQL database. I have a table with all user requestes in format:

| TIMESTAMP Time / +INDEX | Some other params |

I want to show this data on my website as a table with number of requests in each day.

The query is quite simple:

SELECT DATE(Time) as D, COUNT(*) as S FROM Stats GROUP BY D ORDER BY D DESC

But when looking into EXPLAIN this drives me mad:

Using index; **Using temporary; Using filesort**

From MySQL docs it says that it creates temporary table for this query on hard drive.

How fast it would be with 1.000.000 records? And how fast with 100.000.000?

Is there any way to put INDEX on result of function? Maybe I should create separate columns for DATE and TIME and than group by DATE column? What are other good ways of dealing with such problem? Caching? Another DB engine?

It depends. And, first of all, "using filesort" does not mean "disk" operation. It may be sort in memory. Indeed - you can create two separate columns (date and time or date and datetime, so keep redundant data to avoid datetime reconstruction) - but real improvement will strictly depend from resulted index cardinality. Also, how often would be that checks? Are there any other queries on that table? — Alma Do, Jun 24 '14 at 13:43
How often? My first idea wa to do this 'on demand' so every time user visits 'Stats' page. There are only 2 queries: INSERT (on each request) and SELECT to create this data. — peku33, Jun 24 '14 at 13:49

O. Jones · Accepted Answer · 2014-06-24T14:30:46.543

If you have an index on your Time column this operation is going to perform tolerably well. I'm guessing you do have that index, because your EXPLAIN output says it's using an index.

Why does this work well? Because MySQL can access this index in order -- it can scan the index -- to satisfy your query.

Don't be confused by Using temporary; Using filesort. This simply means MySQL needs to create and return a virtual table with a row for each day. That's pretty small and almost surely fits in memory. filesort doesn't necessarily mean the file has spilled to a temp file on disk; it just means MySQL has to sort the virtual table. It has to sort it to get the last day first.

By the way, if you can restrict the date range of the query you'll get predictable performance on this query even when your application has been in use for years. Try something this:

SELECT DATE(Time) as D, COUNT(*) as S 
  FROM Stats
 WHERE Time >= CURDATE() - INTERVAL 30 DAY 
  GROUP BY D ORDER BY D DESC

Yes, I do have index on Time. Of course I am going to limit results to the last 7 days, but with huge load it could anyway have lots of row to process. — peku33, Jun 24 '14 at 14:46
Well, with seven rows, one for each day, in filesort, that won't take long. MySQL still needs to scan the index to get the results, but that is decently fast too. — O. Jones, Jun 24 '14 at 17:18

score 0 · Answer 2 · edited May 23 '17 at 12:05

First: a GROUP BY means sorting and it is an expensive operation. The data in the index is sorted but even in this case the ddbb needs to groups dates. So I feel that indexing by DATE may help as it will improve the speed of the query at the cost of refreshing another index at every insert. Please test it, i am not 100% sure.

Other alternatives are:

Using a partitioned table by month.
Using a materialized views
Updating a counter with every visit.
Precalculating and storing yesterday's data. Just refresh your daily visits with a WHERE DAY(timestamp) = TODAY. This way the serer will have to sort a smaller amount of data.

Dependes on how often do user visit your page and when you do need this data. Do not optimize prematuraly if you do not need it.

This is MySQL. No materialized views. – O. Jones Jun 24 '14 at 14:17 — O. Jones, Jun 24 '14 at 14:17

SQL Performance of grouping by DATE(TIMESTAMP) vs separate columns for DATE and TIME

2 Answers2