1

I have a database of time series data (temperature/weather readings) in an SQLite database. Sampling one data point per minute will give me half a million points per year, and when plotting over long timespans, it would be better to subsample the points, to spread them evenly.

If I want to query my database and receive max 1000 points, how can I do that? The relevant columns are (id INTEGER PRIMARY KEY, value REAL, timestamp INTEGER).

If there are less than 1000 points, I want to get all. If there are more, I want to get 1000 rows that have (approximately) even durations between the two.

This question does something similar, but does not seem to work in SQLite, only PostgreSQL.

Supernormal
  • 962
  • 8
  • 15

1 Answers1

1

First check how many rows you have:

SELECT COUNT(*) FROM data

if there is less than 1000 do a simple SELECT command. If not do:

SELECT * FROM data WHERE (ROWID-1) % (SELECT COUNT(*)/1000 FROM data) = 0

it will return 1000 rows so that they are evenly spaced. If you want the last row to be included too add OR ROWID = (SELECT COUNT(*) FROM data) to the end of above command.

Felix.leg
  • 622
  • 1
  • 4
  • 13
  • That should solve the problem under the assumption that the sampling is done at regular intervals. Even better would be something that looks at the time stamps and makes sure the time intervals are evenly spread. But maybe that would be a very expensive operation. – Supernormal May 28 '20 at 11:45
  • @Supernormal yes, at the beginning of your post you say, that you weather station takes samples about each minute, so I wrote my answer under assumption that each row has got (almost) equal distance in time to the each other. If I had to do calculations of how much they differ in time, then it would require to make more than one query to the database, which may hit performance down... – Felix.leg May 28 '20 at 12:13