I'm running Ganglia in EC2 and reporting works well. I'm running gmetad to monitor a database cluster from an admin reporting instance that serves as a centralized dashboard for all our systems. I don't want this instance to be included in the monitoring. In the admin gmond.conf, I've set:

mute = yes

But this only makes the web front-end show the host as dead. In the admin gmetad.conf, I have the data_source set to:

data_source "cluster" ec2-X-X-X-X.compute-1.amazonaws.com

(with ec2-X-X-X-X.compute-1.amazonaws.com being the ec2 hostname of the admin instance)

I thought setting mute=yes would remove it from the reports and gstat, but both still show the admin host (localhost) as dead.

Is there a way to do this?

Dave Stern

1 Answer

You need to set the host_dmax attribute to a value other than 0 in your gmond.conf file.

For me, the muted Ganglia host shows up initially (after a restart) but then disappears once the time I've set for host_dmax has elapsed. It's unclear why the host shows up at all after a restart even though mute is set to yes.

The cleanup_threshold attribute may also affect the time it takes for the host to disappear.

This is covered in the Ganglia wiki:

The host_dmax value is an integer with units in seconds. When set to zero (0), gmond will never delete a host from its list even when a remote host has stopped reporting. If host_dmax is set to a positive number then gmond will flush a host after it has not heard from it for host_dmax seconds. By the way, dmax means "delete max".

The cleanup_threshold is the minimum amount of time before gmond will cleanup any hosts or metrics where tn > dmax a.k.a. expired data.
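
As a rough sketch of how these settings might sit together in the `globals` section of gmond.conf: `mute = yes` comes from the question, while the `host_dmax` and `cleanup_threshold` values below are illustrative assumptions, not recommendations, and should be tuned to your environment.

```
/* Sketch only: the numeric values are assumed for illustration. */
globals {
  mute = yes                /* this gmond does not send its own metrics */
  host_dmax = 3600          /* secs; drop a host after an hour of silence (0 = never delete) */
  cleanup_threshold = 300   /* secs; minimum time before expired hosts and metrics are cleaned up */
}
```

Restart gmond after changing these values. As noted above, a muted host may still be listed until host_dmax seconds have passed.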

mmajis
  • Thanks very much. What would you recommend as settings for these values? I assume the `host_dmax` should be fairly high so that I can be aware of legitimate failures. – Dave Stern Feb 07 '13 at 20:17
  • @DaveStern That depends on your scenario. Set up an alerting facility (Nagios is a common choice) so that you become aware of failures quickly. If a host disappears from the web interface, its historical data is still stored by gmetad in its RRD database. You can substitute the `h` parameter in ganglia-web graph URLs with the hostname of the desired node to graph its data and see what happened. – mmajis Feb 08 '13 at 15:31