50

I am using the consul exporter to ingest the health and status of my services into Prometheus. I'd like to fire alerts when the status of services and nodes in Consul is critical and then use tags extracted from Consul when routing those alerts.

I understand from this discussion that service tags are likely to be exported as a separate metric, but I'm not sure how to join one series with another so I can leverage the tags with the health status.

For example, the following query:

max(consul_health_service_status{status="critical"}) by (service_name, status,node) == 1

could return:

{node="app-server-02",service_name="app-server",status="critical"} 1

but I'd also like 'env' from this series:

consul_service_tags{node="app-server-02",service_name="app-server",env="prod"} 1

to get joined along node and service_name to pass the following to the Alertmanager as a single series:

{node="app-server-02",service_name="app-server",status="critical",env="prod"} 1

I could then match 'env' in my routing.

Is there any way to do this? It doesn't look to me like any operations or functions give me the ability to group or join like this. As far as I can see, the tags would already need to be labels on the consul_health_service_status metric.

Rob Best

3 Answers

66

You can use the argument list of group_left to include extra labels from the right operand (parentheses and indents for clarity):

(
  max(consul_health_service_status{status="critical"})
  by (service_name,status,node) == 1
)
  + on(service_name,node) group_left(env)
(
  0 * consul_service_tags
)

The important part here is the operation + on(service_name,node) group_left(env):

  • the + is "abused" as a join operator (fine since 0 * consul_service_tags always has the value 0)
  • group_left(env) is the modifier that includes the extra label env from the right (consul_service_tags)
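
One caveat: group_left(env) expects each (service_name, node) pair to match at most one series on the right-hand side. If consul_service_tags carried extra labels that produce several series per pair (an assumption here; this depends on how the metric is exported and scraped), the join would fail with a matching error. A minimal sketch of one way to guard against that, assuming env is the only tag label of interest, is to collapse the right side before joining:

```promql
(
  max by (service_name, status, node) (consul_health_service_status{status="critical"}) == 1
)
  + on(service_name, node) group_left(env)
(
  # Keep only the join labels plus env so that other labels cannot create duplicates.
  0 * max by (service_name, node, env) (consul_service_tags)
)
```

This does not help if a single service/node pair has more than one distinct env value, since the join is then genuinely ambiguous.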
user2361830
  • 1
    One improvement could be to force the metric you are joining not to affect yours, by doing something like this (adding a metric forced to be 0): `+ on(service_name,node) group_left(env) (0 * consul_service_tags)` – Elad Amit Feb 27 '19 at 22:22
  • @EladAmit - yes, perfect! Thanks. I've changed the post to reflect your improvement. – user2361830 Mar 08 '19 at 09:33
  • 2
    `consul_service_tags` is always 1, so instead of the `* 0` and `+`, a simpler way of doing this is more like `(max(consul_health_service_status{status="critical"}) by (service_name,status,node) == 1) * on(service_name,node) group_left(env) consul_service_tags` – gwk Aug 21 '19 at 12:05
  • 4
    [Good PromQL primer](https://www.section.io/blog/prometheus-querying/) if this is as impenetrable to you as it was to me a few hours ago :joy: – mgalgs Jan 18 '20 at 05:18
2

The answer above is accurate. I also want to share an explanation of combining two metrics that share the same labels (this might not directly answer the question). In this example, both metrics carry the following label:

  • name (eg: aaa, bbb, ccc)

I have a metric named metric_a, and if it returns no data for some label values, I want to fetch the data from metric_b instead. That is:

  • metric_a has values for {name="aaa"} and {name="bbb"}
  • metric_b has values for {name="ccc"}

I want the output to cover all three name labels. The solution is to use the or operator in PromQL.

sum by (name) (increase(metric_a[1w]))
or
sum by (name) (increase(metric_b[1w]))

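The result will have values for {name="aaa"}, {name="bbb"} and {name="ccc"}. If a name appears on both sides, or keeps the value from the left-hand side (metric_a) and ignores the right-hand one.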

Dharman
Keet Sugathadasa
2

It is good practice in the Prometheus ecosystem to expose additional labels, which can be joined to multiple metrics, via a separate info-like metric, as explained in this article. For example, the consul_service_tags metric exposes a set of tags, which can be joined to other metrics via the (service_name, node) labels.

The join is usually performed with the on() and group_left() modifiers applied to the * operation. The * doesn't modify the values of time series on the left side, because info-like metrics usually have a constant value of 1. The on() modifier limits the labels used for finding matching time series on the left and right sides of *. The group_left() modifier adds the listed labels from time series on the right side of * to the result. See these docs for details.

For example, the following PromQL query adds the env label from the consul_service_tags metric to the consul_health_service_status metric with the same set of (service_name, node) labels:

consul_health_service_status
  * on(service_name, node) group_left(env)
consul_service_tags

Additional label filters can be added to consul_health_service_status if needed. For example, the following query returns only time series with the status="critical" label:

consul_health_service_status{status="critical"}
  * on(service_name, node) group_left(env)
consul_service_tags
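
Because the join only keeps left-hand series that find a match on the right, a filter on the info-like metric restricts the result as well. A short sketch, assuming the env="prod" value from the question:

```promql
consul_health_service_status{status="critical"}
  * on(service_name, node) group_left(env)
consul_service_tags{env="prod"}
```

This should return only the critical series whose (service_name, node) pair has a consul_service_tags entry with env="prod". Series with no matching entry are dropped rather than returned without the label.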
valyala