
I'm trying to monitor local disk usage (as a percentage) on Dataproc 2.0 using cloud metrics. This would be useful for catching situations where Spark temporary files fill up the disk.

By default, Dataproc seems to send only local disk performance metrics, CPU and similar metrics, and cluster-level HDFS metrics, but not local disk usage.

A Stackdriver agent seems to be installed on the Dataproc image, but it is not running, so Dataproc apparently collects metrics some other way. I checked that the df plugin is enabled in /etc/stackdriver/collectd.conf. However, starting the agent fails:

Jul 16 03:01:57 metrics-test-m systemd[1]: Starting LSB: start and stop Stackdriver Agent...
Jul 16 03:01:57 metrics-test-m stackdriver-agent[3829]: Starting Stackdriver metrics collection agent: stackdriver-agentThe instance has neither the application default credentials file nor the correct monitoring scopes; Exiting. ... failed!
Jul 16 03:01:57 metrics-test-m stackdriver-agent[3829]: not starting, configuration/credentials error. ... failed!
Jul 16 03:01:57 metrics-test-m stackdriver-agent[3829]:  (warning).
Jul 16 03:01:57 metrics-test-m systemd[1]: Started LSB: start and stop Stackdriver Agent.

Is it possible to somehow monitor local disk usage in Dataproc and push the metrics to Google Cloud Metrics?

Dagang
ollik1

1 Answer


The Google Cloud Monitoring agent is installed on Dataproc cluster VMs, but it is disabled by default.

You can enable it by adding --properties dataproc:dataproc.monitoring.stackdriver.enable=true when creating the cluster. The agent collects guest OS metrics, including memory and disk usage, so you can then view them in Cloud Monitoring. See the property in this doc.
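As a rough sketch of how that flag fits into a cluster creation command: the cluster name, region, and image version below are placeholders I made up, not values from the question, and the function only prints the command so it can be reviewed before being run.

```shell
#!/bin/sh
# Hypothetical placeholder values; substitute your own.
CLUSTER_NAME="metrics-test"
REGION="us-central1"

# Property from the answer above: enables the monitoring agent on cluster VMs.
AGENT_PROPERTY="dataproc:dataproc.monitoring.stackdriver.enable=true"

# Print the create command instead of executing it.
# Remove the leading `echo` to actually create the cluster.
build_create_command() {
  echo gcloud dataproc clusters create "$CLUSTER_NAME" \
    --region="$REGION" \
    --image-version=2.0 \
    --properties="$AGENT_PROPERTY"
}

build_create_command
```

The property is applied at creation time, so an existing cluster would need to be recreated with it.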

BTW, CPU usage is collected by GCE from the VM host, without the agent. The VM host has no knowledge of memory or local disk usage, though, so those have to be collected from inside the guest OS, which is why they depend on the agent. Once you enable the agent, there will be two CPU usage metrics of different types: one (compute) from the VM host's perspective and one (agent) from the guest OS's perspective.
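To illustrate the guest OS perspective, here is a minimal sketch of reading the used-space percentage for a mount point from inside a VM with df. The function name is my own, and this is only a local spot check, not how the agent itself gathers the metric.

```shell
#!/bin/sh
# Hypothetical helper: print the used-space percentage for a mount point,
# as seen from inside the guest OS. `df -P` forces one output line per
# filesystem, so the value is always in column 5 of the second line.
disk_percent_used() {
  df -P "$1" | awk 'NR==2 { print $5 }'
}

disk_percent_used /
```

Running this on a cluster node gives a quick value to compare against what the agent reports once it is enabled.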

Pricing: these metrics are not free of charge; see Cloud Monitoring pricing for details.

Dagang