
I have some Grafana dashboards with graphs that sometimes show "No data points". I know the data exists: at other times the same graphs render, and other graphs on the same page display results from the same measurements. I can also query the data directly in InfluxDB.

Anecdotally, longer time ranges appear more likely to fail than shorter ones (e.g., 30 days sometimes fails, 1 day rarely fails). The data is sampled every few seconds, like system stats.
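
To put rough numbers on why the time range might matter, here is a back-of-the-envelope sketch. The 10-second sample interval is an assumption standing in for "every few seconds"; the 1h grouping matches the direct queries further down. The function names are only for illustration.

```python
# Rough estimate of how many raw points one host/field query has to scan,
# and how many grouped buckets it returns.
# ASSUMPTION: one sample every 10 seconds (the post only says "every few seconds").
SAMPLE_INTERVAL_S = 10
GROUP_INTERVAL_S = 3600  # GROUP BY time(1h)
SECONDS_PER_DAY = 86400


def raw_points(days, sample_interval_s=SAMPLE_INTERVAL_S):
    """Raw points stored for one series over the given number of days."""
    return days * SECONDS_PER_DAY // sample_interval_s


def buckets(days, group_interval_s=GROUP_INTERVAL_S):
    """Number of GROUP BY time() buckets returned for the same range."""
    return days * SECONDS_PER_DAY // group_interval_s


if __name__ == "__main__":
    for days in (1, 30):
        print(f"{days:>2} day(s): {raw_points(days)} raw points, {buckets(days)} buckets")
```

Under that assumption a 30-day panel scans 30 times as many raw points per series as a 1-day panel, and percentile() has to sort the values in each bucket, so the cost per panel grows with the range even though the result set stays small.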

I suspect (with inadequate data) that influxdb is sometimes taking too long to respond and grafana times out, or else that influxdb outright fails the query due to too much data relative to resources available. OTOH, directly querying influxdb works fine (see below), though I'm throwing only one query at a time at it. If I query while the dashboard updates, the query takes much longer, as if I'm waiting for a worker thread to handle my query.

But before I just start growing hardware, I'd like to have more than just a hunch. I don't have that much data. Yet the influx and grafana logs aren't showing me anything terribly interesting (such as OOM, timeouts, or query failures).

Any suggestions?

BTW, a sample query in grafana is this:

SELECT percentile("usage_system", 95) FROM "cpu"
WHERE "host" =~ /^$host$/ AND $timeFilter
GROUP BY time($__interval), "host"

If I query directly against influxdb, the query results are returned almost immediately, whereas in grafana I wait a good long bit with a spinner displaying. (If I query at the same time that I update a dashboard, the query takes a bit, consistent with waiting for a worker thread to handle my query.)

select percentile(usage_system, 95) from cpu
WHERE host = 'seine3'
AND time >= 1519216559000000000 AND time <= 1521808559000000000
GROUP BY time(1h), host

or

select percentile(usage_system, 95) from cpu
WHERE host = 'seine3'
AND time >= '2018-02-23T00:00:00Z' AND time <= '2018-03-23T00:30:00Z'
GROUP BY time(1h), host
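
One way to turn the hunch into a measurement would be to time the same query at different concurrency levels, roughly imitating a dashboard that fires many panels at once. The sketch below is only that, a sketch: the URL, the database name `telegraf`, and all function names are assumptions, and it talks to the InfluxDB 1.x `/query` HTTP endpoint with no authentication.

```python
import json
import time
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# ASSUMPTIONS: adjust to the real server and database.
INFLUX_URL = "http://localhost:8086/query"
DATABASE = "telegraf"


def build_query(host, start, end, interval="1h"):
    """Build the same percentile query as above for one host and time range."""
    return (
        'SELECT percentile("usage_system", 95) FROM "cpu" '
        f"WHERE \"host\" = '{host}' "
        f"AND time >= '{start}' AND time <= '{end}' "
        f'GROUP BY time({interval}), "host"'
    )


def run_influx_query(query):
    """Send one query to the InfluxDB 1.x HTTP API and return the parsed JSON."""
    params = urllib.parse.urlencode({"db": DATABASE, "q": query})
    with urllib.request.urlopen(f"{INFLUX_URL}?{params}", timeout=120) as resp:
        return json.load(resp)


def timed(run, query):
    """Run one query; return (elapsed seconds, error text or None)."""
    started = time.perf_counter()
    error = None
    try:
        run(query)
    except Exception as exc:  # record timeouts and HTTP errors instead of aborting
        error = repr(exc)
    return time.perf_counter() - started, error


def measure(run, queries, workers):
    """Run all queries with the given number of concurrent workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda q: timed(run, q), queries))


if __name__ == "__main__":
    query = build_query("seine3", "2018-02-23T00:00:00Z", "2018-03-23T00:30:00Z")
    for workers in (1, 4, 16):
        results = measure(run_influx_query, [query] * workers, workers)
        slowest = max(elapsed for elapsed, _ in results)
        failures = [err for _, err in results if err]
        print(f"{workers:>2} concurrent: slowest {slowest:.2f}s, {len(failures)} failed")
```

If the slowest time climbs sharply with the number of concurrent queries, or errors start to appear, that would support the contention theory; if the times stay flat, the delay is more likely on the Grafana side.
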
jma
