
I am currently facing the problem of storing time series data.

This data comes from an industrial machine: for each job (about 3 per hour, around the clock), software records:

  • oil pressure;
  • oil temperature;
  • some vibrational data.

Vibrational data is sampled at a very high frequency (> 10 kHz), which leads to very large storage requirements. This has led my company to evaluate options for storing the data efficiently.
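To make the scale concrete, here is a back-of-envelope sketch. Only the 10 kHz lower bound and the rate of about 3 jobs per hour come from the description above; the 8-byte sample width and the 60 seconds of recording per job are assumptions chosen purely for illustration, and the function name `bytes_per_day` is hypothetical.

```python
def bytes_per_day(sample_rate_hz, bytes_per_sample, seconds_per_job,
                  jobs_per_hour, hours_per_day=24):
    """Raw (uncompressed) bytes produced per day by one vibration channel."""
    samples_per_job = sample_rate_hz * seconds_per_job
    jobs_per_day = jobs_per_hour * hours_per_day
    return samples_per_job * bytes_per_sample * jobs_per_day


# From the question: at least 10 kHz, about 3 jobs per hour, 24 hours a day.
# Assumed (not stated in the question): 8-byte samples, 60 s recorded per job.
daily = bytes_per_day(sample_rate_hz=10_000, bytes_per_sample=8,
                      seconds_per_job=60, jobs_per_hour=3)
print(f"{daily / 1e6:.1f} MB per day per vibration channel")
```

The result scales linearly with each parameter, so longer recordings, higher sampling rates, or several channels multiply the figure accordingly.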

Inserts will not be very frequent (maybe once or twice per day, when the machine is not operating). Reads will potentially be very frequent (another piece of software will retrieve the data for plotting and analysis).

For now, a single node will be used for storing the data, so I don't want to consider partitioning or parallelization yet.

Which solution should I prefer? A relational DBMS (such as MySQL or PostgreSQL), or a general-purpose NoSQL DB: either a column-oriented one like Cassandra (note that all time series will be univariate), or a document-oriented one like MongoDB?

Beyond my particular use case, when should an RDBMS generally be preferred over NoSQL for storing time series? And when should NoSQL be preferred over an RDBMS?

LucaF
  • Possible duplicate of [Difference between time-series database and relational database](https://stackoverflow.com/questions/35428606/difference-between-time-series-database-and-relational-database) – philipxy Jul 06 '19 at 21:47

1 Answer


tl;dr:

Typically, for time series I would use a time-series database like InfluxDB. Some NoSQL products, like MongoDB, actually combine these features.


Traditionally, NoSQL was used for high volumes of unstructured data, such as log output and website search data. However, as professionals have become more comfortable with document storage and as relational capabilities have improved over time, NoSQL is now often favoured over traditional relational databases.

When to use a Relational database?

Since modelling for NoSQL is conceptually somewhat different from modelling a relational database, a relational database is typically preferred in application and data migration scenarios where a large relational database is already present.

Relational databases use normalisation to optimise storage. This is an artefact of times when storage was very expensive. When modelling for NoSQL, it is common to optimise for read and write speed and for business clarity instead.
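As a rough illustration of that modelling difference, the sketch below stores the same univariate series in two ways: normalised into two relational tables (using Python's built-in `sqlite3` as a stand-in for any RDBMS) and as a single self-contained document (plain JSON as a stand-in for a document store). The table names, field names and sample values are all hypothetical.

```python
import json
import sqlite3

# Normalised relational model: job metadata is stored once,
# and each sample refers to its job by key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE job (
        job_id     INTEGER PRIMARY KEY,
        started_at TEXT NOT NULL
    );
    CREATE TABLE sample (
        job_id INTEGER NOT NULL REFERENCES job(job_id),
        t_us   INTEGER NOT NULL,   -- microseconds since the job started
        value  REAL    NOT NULL,
        PRIMARY KEY (job_id, t_us)
    );
""")
conn.execute("INSERT INTO job VALUES (1, '2018-10-29T14:00:00')")
samples = [(1, i * 100, 0.5 * i) for i in range(5)]  # made-up readings
conn.executemany("INSERT INTO sample VALUES (?, ?, ?)", samples)

# Reading one job's series back means joining the two tables.
rows = conn.execute(
    "SELECT s.t_us, s.value FROM sample s "
    "JOIN job j ON j.job_id = s.job_id "
    "WHERE j.job_id = ? ORDER BY s.t_us",
    (1,),
).fetchall()

# Document model: one read-optimised document per job,
# with the metadata and the whole series kept together.
document = {
    "job_id": 1,
    "started_at": "2018-10-29T14:00:00",
    "t_us": [t for _, t, _ in samples],
    "value": [v for _, _, v in samples],
}
loaded = json.loads(json.dumps(document))

# Both representations hold the same series.
assert list(zip(loaded["t_us"], loaded["value"])) == rows
```

The relational version avoids repeating job metadata and lets you query across jobs, while the document version returns everything needed to plot one job in a single lookup.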

If you are not dealing with a legacy system, NoSQL-like solutions are a modern substitute for an RDBMS.

Stefan
    Thanks for your complete answer. To answer your question: my final aim is to apply some predictive maintenance/anomaly detection techniques, so data retention will probably be very short (a couple of months at most, maybe, for testing purposes). For now, I am not asked to handle data aging (e.g. aggregating old data to reduce its frequency). I may be expected to store anomalous data in a separate DB, to keep track of it (and apply some reinforcement learning). – LucaF Oct 29 '18 at 14:25