64

In multiple projects we have to store, aggregate, evaluate simple measurement values. One row typcially consists of a time stamp, a value and some attributes to the value. In some applications we would like to store 1000 values per second and more. These values must not only be inserted but also deleted at the same rate, since the lifetime of a value is restricted to a year or so (in different aggregation steps, we do not store 1000/s for the whole year).

Until now, we have developped different solutions. One based on Firebird, one on Oracle and one on some self-made storage mechanism. But none of these are very satisfying solutions.

Both RDBMS solutions cannot handle the desired data flow. Besides that, the applications that deliver the values (e.g. device drivers) cannot be easily attached to databases, the insert statements are cumbersome. And finally, while having an SQL interface to the data is strongly desired, typical evaluations are hard to formulate in SQL and slow in the execution. E.g. find the maximum value with time stamp per 15 minutes for all measurements during the last month.

The self-made solution can handle the insertion rate and has a client-friendly API to do it, but it has nothing like a query language and cannot be used by other applications via some standard interface e.g. for reporting.

The best solution in my dreams would be a database system that:

  • has an API for very fast insertion
  • is able to remove/truncate the values in the same speed
  • provides a standard SQL interface with specific support for typical time series data

Do you know some database that comes near those requirements or would you approach the problem in a different way?

tshepang
  • 12,111
  • 21
  • 91
  • 136
Kit Fisto
  • 4,385
  • 5
  • 26
  • 43
  • For fast and convenient (in-memory) processing you could try [`pandas`](http://pandas.sourceforge.net/), for fast IO -- [`PyTables`](http://www.pytables.org/moin). – jfs Jan 15 '12 at 11:24
  • Also you could try Starcounter. It can handle a few million SQL requests per second on a single Windows Server 2008. It is still in closed beta, but it has been running in production on very large installations for some time now. An email should suffice to get a copy. – Jack Wester Jan 20 '12 at 03:22
  • i have a similar, if not identical question that was closed as off-topic as well, so i moved it here: http://softwarerecs.stackexchange.com/questions/18302/database-for-large-volume-of-metrics-key-value-timestamp – aep Mar 28 '15 at 19:42

6 Answers6

43

Most other answers seem to mention SQL based databases. NoSQL based databases are far superior at this kind of thing.

Some Open source time-series databases:

Cloud-based:

Gaurav Shah
  • 5,223
  • 7
  • 43
  • 71
Joakim
  • 11,468
  • 9
  • 44
  • 50
  • 1
    Have you looked at column-oriented databases? – hellomichibye Sep 26 '14 at 12:48
  • 1
    Wikipedia offers an overview: https://en.wikipedia.org/wiki/Time_series_database#Example_TSDB_Systems – koppor Apr 16 '15 at 15:13
  • 3
    Erol offers a complete list: http://www.erol.si/2015/01/the-complete-list-of-all-timeseries-databases-for-your-iot-project/ – koppor Apr 17 '15 at 06:52
  • Since OpenTSDB is on the list, I would mention Axibase Time-Series Database as it supports opentsdb protocol and tcollector itself as of May 2015: http://axibase.com/products/axibase-time-series-database/writing-data/tcollector/ – Sergei Rodionov Jun 01 '15 at 17:39
  • That's a great list. Throwing Blueflood (cassandra base) into the mix: https://github.com/rackerlabs/blueflood – VinnyQ77 Apr 14 '16 at 17:15
  • ElasticSearch is scalable, open source, and can be used as a TSDB. CERN did an analysis and decided it to be the winner for them ( http://cds.cern.ch/record/2011172/files/LHCb-TALK-2015-060.pdf ) – Quinton Pike Sep 12 '16 at 12:48
  • Take a look also at [VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics) time series database - it provides high performance at low resource usage. – valyala Sep 07 '21 at 21:09
8

influxdb :: An open-source distributed time series database with no external dependencies.

A.N.
  • 97
  • 1
  • 2
  • 6
    It's not distributed unless you pay $$$ ( https://docs.influxdata.com/influxdb/v1.0/high_availability/clusters/ ) – Quinton Pike Sep 12 '16 at 12:39
2

Consider IBM Informix Dynamic Server with the TimeSeries DataBlade.

That is, however, an extreme data rate that you are working with. (Not quite up to sub-atomic physics at CERN, but headed in that general direction.)


Fair disclosure: I work for IBM on the Informix DBMS, though not on the TimeSeries DataBlade per se.

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
1

You can try HDF5 for time series data. It is extremely fast for such applications.

shoonya
  • 292
  • 1
  • 10
  • 1
    HDF5 seems to be a storage format. Could you elaborate a bit more on the usage in the context of time series data? – koppor Jun 16 '15 at 17:33
  • A python library based on HDF5 is availble at https://github.com/afiedler/tstables – koppor Jul 19 '15 at 01:10
1

SQL Server StreamInsight

Microsoft StreamInsight BOL

Mitch Wheat
  • 295,962
  • 43
  • 465
  • 541
  • Thank you for the hint. StreamInsight sounds interesting and have to investigate it. Best of all, after what I understand, it is included in SQL Server? My concern: Storage is still in SQL Server. Will it be able to handle the requested data flow? StreamInsight docs say it can handle up to 100000 events/s "in peak time" but that must refer to some kind of buffering. What is the steady state rate? – Kit Fisto Jan 11 '12 at 09:06
  • storage speeds depend on your hardware; spend enough and you can get 17.2GBytes per second!! 1.2GB a second is attainable with a good 12 - 18 drive RAID array – Mitch Wheat Jan 11 '12 at 09:16
0

As Jonathan Leffler said, you should try Informix Timeseries feature. It is included in all editions of Informix at no additional charge. You can take a look at the TimeSeries functions it supports:

IBM Informix Time series SQL routines

You can access the data through sql functions or virtual view interfaces, you can even insert into the view.

GeorgeStone
  • 331
  • 3
  • 11