3

I've been trying to work this one out for a while now, maybe my problem is coming up with the correct search query. I'm not sure.

Anyway, the problem I'm having is that I have a table of data that has a new row added every second (imagine the structure {id, timestamp(datetime), value}). I would like to do a single query for MySQL to go through the table and output only the first value of each minute.

I thought about doing this with multiple queries with LIMIT and datetime >= (beginning of minute) but with the volume of data I'm collecting that is a lot of queries so it would be nicer to produce the data in a single query.

Sample data:

id  datetime             value
1   2015-01-01 00:00:00  128
2   2015-01-01 00:00:01  127
3   2015-01-01 00:00:04  129
4   2015-01-01 00:00:05  127
...
67  2015-01-01 00:00:59  112
68  2015-01-01 00:01:12  108
69  2015-01-01 00:01:13  109

Where I would want the result to select the rows:

1   2015-01-01 00:00:00  128
68  2015-01-01 00:01:12  108

Any ideas?

Thanks!

EDIT: Forgot to add, the data, whilst every second, is not reliably on the first second of every minute - it may be :30 or :01 rather than :00 seconds past the minute

EDIT 2: A nice-to-have (definitely not required for answer) would be a query that is flexible to also take an arbitrary number of minutes (rather than one row each minute)

CallumA
  • 92
  • 1
  • 3
  • 7
  • 1
    Add your representative sample data and the expected result. Also post your best shot at solving the problem. – PM 77-1 Feb 13 '15 at 21:43
  • You say, that you can have less than 60 rows for a given minute. If a certain minute doesn't have any values at all, what would you like to see in the result set for this minute? – Vladimir Baranov Feb 14 '15 at 05:25

4 Answers4

2
SELECT t2.* FROM
( SELECT MIN(`datetime`) AS dt
  FROM tbl
 GROUP BY DATE_FORMAT(`datetime`,'%Y-%m-%d %H:%i')
) t1
JOIN tbl t2 ON t1.dt = t2.`datetime`

SQLFiddle

Or

SELECT * 
FROM tbl 
WHERE dt IN ( SELECT MIN(dt) AS dt
              FROM tbl
              GROUP BY DATE_FORMAT(dt,'%Y-%m-%d %H:%i'))

SQLFiddle

SELECT t1.* 
FROM tbl t1
LEFT JOIN (
  SELECT MIN(dt) AS dt 
  FROM tbl
  GROUP BY DATE_FORMAT(dt,'%Y-%m-%d %H:%i')
) t2 ON t1.dt = t2.dt
WHERE t2.dt IS NOT NULL

SQLFiddle

potashin
  • 44,205
  • 11
  • 83
  • 107
  • `DATE_FORMAT` seems like a pretty expensive operation for this purpose. – Jordan Running Feb 13 '15 at 21:54
  • On my (admittedly not particularly powerful) development MySQL server these queries both take in the region of 30 seconds to complete across 2,000,000 rows which is quite a long time really – CallumA Feb 14 '15 at 00:30
  • @CallumA : as @Jordan pointed out, `DATE_FORMAT` is expensive, but I don't know any other solution.You may try to use `LEFT JOIN`, as it considers to be the fastest anti-join solution in MySQL for the non-null columns. – potashin Feb 14 '15 at 01:07
  • @CallumA : Well, actually, it's impressive. but I don't know how to go any further with other possible improvements.) – potashin Feb 14 '15 at 02:18
  • @notulysses I just deleted my last comment as I was making so many test errors! Maybe I should get some sleep. Anyway, all three of your suggestions have similar execution times. I made a small improvement by switching the `DATE_FORMAT(...)` for `FLOOR(UNIX_TIMESTAMP(datetime) / 60)`. This seems to have improved the execution time by about 20%. Not sure what/if other improvements can be made, I'll keep looking. – CallumA Feb 14 '15 at 02:20
  • 1
    @notulysses `FLOOR(CAST(datetime as UNSIGNED) / 100)` is again even quicker, better than 50% reduction compared to `DATE_FORMAT(...)`! The only problem with it is that I use fractional seconds in my datetime (DATETIME(3)) and MySQL insists on rounding to nearest in the CAST(). This can be rectified by using `SUBTIME(datetime, '0.5')` before casting it but that increases the time by 30% after the 50% reduction -_- (still better than `UNIX_TIMESTAMP` though) with worse results (not accurate first of minute). This is enough for one night, will look again in morning. – CallumA Feb 14 '15 at 03:12
0

In MS SQL Server I would use CROSS APPLY, but as far as I know MySQL doesn't have it, so we can emulate it.

Make sure that you have an index on your datetime column.

Create a table of numbers, or in your case a table of minutes. If you have a table of numbers starting from 1 it is trivial to turn it into minutes in the necessary range.

SELECT
  tbl.ID
  ,tbl.`dt`
  ,tbl.value
FROM
  (
    SELECT 
      MinuteValue
      , (
        SELECT tbl.id
        FROM tbl
        WHERE tbl.`dt` >= Minutes.MinuteValue
        ORDER BY tbl.`dt`
        LIMIT 1
        ) AS ID
    FROM Minutes
  ) AS IDs
  INNER JOIN tbl ON tbl.ID = IDs.ID

For each minute find one row that has timestamp greater than the minute. I don't know how to return the full row, rather than one column in MySQL in the nested SELECT, so at first I'm making a temp table with two columns: Minute and id from the original table and then explicitly look up rows from original table knowing their IDs.

SQL Fiddle

I've created a table of Minutes in the SQL Fiddle with the necessary values to make example simple. In real life you would have a more generic table.

Here is SQL Fiddle that uses a table of numbers, just for illustration.

In any case, you do need to know in advance somehow the range of dates/numbers you are interested in.

It is trivial to make it work for any interval of minutes. If you need results every 5 minutes, just generate a table of minutes that has values not every 1 minute, but every 5 minutes. The main query would remain the same.

It may be more efficient, because here you don't join the big table to itself and you don't make calculations on the datetime column, so the server should be able to use the index on it.

The example that I made assumes that for each minute there is at least one row in the big table. If it is possible that there are some minutes that don't have any data at all you'd need to add extra check in the WHERE clause to make sure that the found row is still within that minute.

Community
  • 1
  • 1
Vladimir Baranov
  • 31,799
  • 5
  • 53
  • 90
  • What did you eventually choose to do? How did you solve the problem? Was any of the answers helpful enough for you to accept an answer? – Vladimir Baranov Mar 12 '15 at 22:56
-1

select * from table where timestamp LIKE "%-%-% %:%:00" could work.

This is similar to this question: Stack Overflow Date SQL Query Question

Edit: This probably would work better:

`select , date_format(timestamp, '%Y-%m-%d %H:%i') as the_minute, count() from table group by the_minute order by the_minute

Similar to this question here: mysql select date format

Community
  • 1
  • 1
culthero
  • 21
  • 4
-1

i'm not really sure, but you could try this:

SELECT MIN(timestamp) FROM table WHERE YEAR(timestamp)=2015 GROUP BY DATE(timestamp), HOUR(timestamp), MINUTE(timestamp)
mech
  • 617
  • 6
  • 16