4

I have a timeseries of datetime, double columns stored in mySQL and would like to sample the timeseries every minute (i.e. pull out the last value at one minute intervals). Is there an efficient way of doing this in one select statement?

The brute force way would involve either selecting the whole series and doing the sampling on the client side or sending one select for each point (e.g. select * from data where timestamp<xxxxxxxxx order by timestamp desc limit 1).

Andriy M
  • 76,112
  • 17
  • 94
  • 154
DD.
  • 21,498
  • 52
  • 157
  • 246

2 Answers2

8

You could convert your timestamps to UNIX timestamps, group by unix_timestamp DIV 60 and pull the maximum timestamps from each group. Afterwards join the obtained list back to the original table to pull the data for the obtained timestamps.

Basically it might look something like this:

SELECT
  t.*  /* you might want to be more specific here */
FROM atable t
  INNER JOIN (
    SELECT
      MAX(timestamp) AS timestamp
    FROM atable
    GROUP BY UNIX_TIMESTAMP(timestamp) DIV 60
  ) m ON t.timestamp = m.timestamp
Andriy M
  • 76,112
  • 17
  • 94
  • 154
  • `GROUP BY UNIX_TIMESTAMP(timestamp) DIV 60`... just awesome! I wonder, if this can make use of an index on timestamp? Or would you have any other recommendations for optimizations, so this will work on extremely large datasets, other than aggregating statistics in a separate table? – Domi Mar 29 '15 at 19:46
  • 1
    @Domi: Thanks. I doubt very much this would use an index on `timestamp`. As I'm not really a MySQL expert, I'm not sure what to suggest as a good alternative. This is basically a "greatest N per group" problem, and on large datasets I personally might try an established, even though very MySQL-specific, method of solving this kind that involves variables, something along the lines of [this answer](http://stackoverflow.com/a/12114021) (something, too, that I suspect I probably didn't know at the time of answering this question, if I'm perfectly honest). – Andriy M Mar 29 '15 at 20:17
4

You can use DATE_FORMAT to get just the parts of the datetime that you want. You want to get the datetime down to the minute, and then for each group with that "rounded-off" time, get the row with the maximum time.