When organising an InfluxDB database, which of these two approaches would be most preferred?

Question

I am trying to decide how measurements should be organised in an InfluxDB the database (which I believe they call schema design and data layout) but I think this may be a more general database type question.

Let's say as a simple example that I am measuring two quantites, temperature and humidity (imaginative, I know!), in two locations, living room and outside.

InfluxDB has the syntax for inserting data points:

measurement, tag_key=tag_value field_key=field_value

and so there are two obvious (at least to me) options. Briefly, the first option would insert a data point like this:

INSERT temperature,location=outside value=15
INSERT humidity,location=outside value=50

whereas the second option would do it this way:

INSERT sensor_measurements,location=outside temperature=15,humidity=50

My questions are more high level:

Is there a preferred/accepted way to go about this?
Will I run into problems with either of these if I try to scale it up to more quantities/locations/data types?
Does either of the methods offer an advantage if I later on try to graph these things in Grafana, for example, or if I try to implement later some of the many InfluxQL functions?
Does anyone have any general advice about this to offer?

My own thoughts:

Option 1 seems to me to be more like what is implied by the InfluxDB description "measurement". Both temperature and humidity are separate quantities. But it seems a little clunky to just call it "value".

Option 2 appears to have the advantage that both the humidity and the temperature share exactly the same timestamp. This would come in useful, for example, if I wanted to import the data into some other software and do a correlation between the two quantites, and would mean I wouldn't have to do any interpolation or binning to get them to match up.

I am not sure if it is a bad idea with Option 2 to just have a general measurement called sensor_measurements, and will be hard to maintain later.

In detail:

Option 1

Have a separate "measurement" for each of temperature and humidity, use the location as a "tag", and just name the "field" as value:

At time t1, insert the data:

INSERT humidity,location=outside value=50
INSERT temperature,location=outside value=15
INSERT humidity,location=living_room value=65
INSERT temperature,location=living_room value=28

At time t2, insert some different data:

INSERT humidity,location=outside value=50
INSERT temperature,location=outside value=15
INSERT humidity,location=living_room value=65
INSERT temperature,location=living_room value=28

I can then get access to the living room temperature by querying the following:

> SELECT value FROM temperature WHERE location='living_room'

name: temperature
time                value
----                -----
1590416682017481091 28
1590416723963187592 29

I can also use the group by function to do something like this:

SELECT value FROM temperature GROUP BY "location"

Option 2

Have a combined "measurement" called sensor_measurements, for example, use a "tag" for location, and then have separate "fields" for each of temperature and humidity:

At time t1, insert the data:

INSERT sensor_measurements,location=outside temperature=15,humidity=50
INSERT sensor_measurements,location=living_room temperature=28,humidity=65

At time t2, insert some different data:

INSERT sensor_measurements,location=outside temperature=14,humidity=56
INSERT sensor_measurements,location=living_room temperature=29,humidity=63

I can now get access to the living room temperature by querying the following:

> SELECT temperature FROM sensor_measurements WHERE location='living_room'

name: sensor_measurements
time                temperature
----                -----------
1590416731530452068 28
1590416757055629103 29

I can now use the group by function to do something like this:

SELECT temperature FROM sensor_measurements GROUP BY "location"

score 5 · Accepted Answer · answered May 30 '20 at 17:30

5

I would use option 2 from offered options, because less records = less resources = better query response time (in theory). Generally, both approaches look good.

But I will use more generic 3rd option in real world. Single generic metrics measurement with tags metric,location and field value:

INSERT metrics,metric=temperature,location=outside value=15
INSERT metrics,metric=humidity,location=living_room value=50
INSERT metrics,metric=temperature,location=living_room value=28
INSERT metrics,metric=humidity,location=living_room value=65

That gives me opportunity to create single generic Grafana dashboard, where user will have option to select visualized metric/location via dashboard variable (generated directly from the InfluxDB, e.g. SHOW TAG VALUES WITH KEY = "metric"). Any new inserted metrics (e.g. `illuminance , pressure, wind-speed, wind-direction, ...) or location can be immediately visualized in this generic dashboard. Eventually, some metrics may have also additional tags. That is good and I will be able to use ad-hoc Grafana variable, so users will be able specify any number of key/value filters on the fly. Grafana doc.

answered May 30 '20 at 17:30

Jan Garaj

25,598
3
38
59

Thank you, this is interesting - I would not have considered doing it this way. The Grafana templating is something I will read a little more about - it looks like a very useful feature. I suppose the only thing I am hesitant about is that most of the examples in the InfluxDB user guides do it similar to Option 1, and they seem to recommend the individual metric names are used for the "measurement" names. I didn't want to be not following an existing standard if one existed. Thanks for your answer. – teeeeee May 30 '20 at 18:17
@teeeeee IMHO doc uses option 1 because telegraf (another project from InfluxData for data collection). It writes all metrics to the measurement, which is based on the plugin name by default, so cpu metrics will be in the cpu measurement, etc. – Jan Garaj May 31 '20 at 14:52
I see. So in that example, with Telegraf using a measurement name "cpu", would you leave it as that ? Or would you change it into the format given in your answer, i.e ```measurement = metrics, metric = cpu, etc``` ? If the answer is "leave it in the original Telegraf format", does that give rise to the same problem in terms of limiting your Grafana templating choices? – teeeeee May 31 '20 at 21:42
1

@teeeeee That is opinion based question. I would use telegraf default option in this case only because there is already many Grafana ready dashboards prepared for default telegraf behavior. – Jan Garaj Jun 01 '20 at 07:34
Sorry, I was waiting until the Bounty ended to see if there were anymore contributions. Thank you for your answer - I have now accepted it. – teeeeee Jun 05 '20 at 19:48
Hi @Jan Garaj , I have tested you recommendation, and I think I now understand a little better. You are saying that because you suggest to have a tag called "metric" with 2 possible values "metric=tempearture" and "metric=humidity" then we can have a variable for metric at the top of the dashboard, and the user can select temperature or humidity, right? Are you saying that the advantage is that this wouldn't be posisble with my Option 2, because temperature is a field then, and fields cannot be variables in Grafana? – teeeeee Jun 11 '20 at 09:11
@teeeeee Maybe you can do that. Try to play with `SHOW FIELD KEYS ...` query to see if you will be able to create dashboard variable. I have never tried that. – Jan Garaj Jun 11 '20 at 09:28
Okay. But can you think of a situation in which it is useful to have temperature and humidity be switchable as a variable? Typically if I make a graph panel and change whether temperature or humidity is diplayed through a variable, then all the axis labels, axis limits, and units (degrees vs %) would also need to change as well? – teeeeee Jun 11 '20 at 09:33

When organising an InfluxDB database, which of these two approaches would be most preferred?

Option 1

Option 2

1 Answers1