What is the difference between the count and the gauge metric types in DataDog? Or rather, when should I prefer one over the other? The definitions from their website don't help me much:

Count:

The COUNT metric submission type represents the total number of event occurrences in one time interval. A COUNT can be used to track the total number of connections made to a database or the total number of requests to an endpoint. This number of events can accumulate or decrease over time—it is not monotonically increasing.

Gauge:

The GAUGE metric submission type represents a snapshot of events in one time interval. This representative snapshot value is the last value submitted to the Agent during a time interval. A GAUGE can be used to take a measure of something reporting continuously—like the available disk space or memory used.

The count type seems to be somewhat related to the rate type, but for me it is unclear why or when I should use count instead of gauge. I mean in principle a measurement of "something" could always be presented as a gauge, couldn't it?

Bastian Venthur

1 Answer


The Datadog documentation gives clear examples of the difference:

Count

Suppose you are submitting a COUNT metric, activeusers.basket_size, from a single host running the Datadog Agent. This host emits the following values in a flush time interval: [1,1,1,2,2,2,3,3].

The Agent adds all the values received in one time interval and submits the total number, in this case 15, as the COUNT metric’s value.

Gauge

Suppose you are submitting a GAUGE metric, temperature, from a single host running the Datadog Agent. This host emits the following values in a flush time interval: [71,71,71,71,71,71,71.5].

The Agent submits the last reported number, in this case 71.5, as the GAUGE metric’s value.

Essentially, within one flush interval (usually 10s), a Count accumulates all submitted values and reports their sum, while a Gauge keeps only the latest value because it is a snapshot. A Gauge also consumes fewer resources.
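To make the two aggregation rules concrete, here is a minimal Python sketch that mimics what happens at flush time, using the sample values from the documentation examples above. The function names `flush_count` and `flush_gauge` are hypothetical and only illustrate the idea; this is not the Agent's actual implementation.

```python
def flush_count(values):
    """COUNT: add up every value submitted during the flush interval."""
    return sum(values)


def flush_gauge(values):
    """GAUGE: keep only the last value submitted during the flush interval."""
    return values[-1]


# Sample values taken from the documentation examples quoted above.
basket_sizes = [1, 1, 1, 2, 2, 2, 3, 3]
temperatures = [71, 71, 71, 71, 71, 71, 71.5]

count_value = flush_count(basket_sizes)
gauge_value = flush_gauge(temperatures)

print(count_value)  # 15
print(gauge_value)  # 71.5
```

Applying `flush_gauge` to the basket sizes would give 3, which discards everything submitted before the last value; that is the information a Count preserves.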

A good example for Count is tracking how many 404 responses occur in a given time period; in this scenario, the sum matters. A good example for Gauge is checking the memory usage of a server: one snapshot per flush interval (10s by default) is sufficient.
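As a rough sketch of how these two cases could be submitted, the snippet below builds plain DogStatsD datagrams (`name:value|c` for a count, `name:value|g` for a gauge). The metric names `web.responses.not_found` and `system.memory.used_mb` and the helper functions are made up for illustration, and `send` assumes an Agent with DogStatsD listening on the default UDP port 8125 on localhost.

```python
import socket


def format_datagram(name, value, metric_type):
    """Build a DogStatsD datagram: 'c' marks a count, 'g' marks a gauge."""
    return f"{name}:{value}|{metric_type}"


def send(datagram, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send; assumes DogStatsD listens on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))


# Count: emit 1 for every 404; the values are summed per flush interval.
not_found = format_datagram("web.responses.not_found", 1, "c")

# Gauge: emit the current reading; only the last one per interval is kept.
memory = format_datagram("system.memory.used_mb", 2048, "g")

print(not_found)  # web.responses.not_found:1|c
print(memory)     # system.memory.used_mb:2048|g
```

Calling `send(not_found)` once per 404 response and `send(memory)` on a timer would match the two usage patterns described above.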

springuper