I'm trying to monitor local disk usage (as a percentage) on Dataproc 2.0 using Cloud Monitoring. This would be useful for catching situations where Spark temporary files fill up the local disks.
By default, Dataproc seems to send only local disk performance metrics, CPU and similar instance metrics, and cluster-level HDFS metrics, but not local disk usage.
There seems to be a Stackdriver agent installed on the Dataproc image, but it is not running, so apparently Dataproc collects its metrics some other way. I checked that the df plugin is enabled in /etc/stackdriver/collectd.conf.
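For context, enabling the df plugin in collectd generally looks something like the stanza below. This is an illustrative sketch rather than the exact contents of the file on the image; the MountPoint and ValuesPercentage settings here are my assumptions:

```
LoadPlugin df
<Plugin df>
  # Report usage as a percentage rather than absolute bytes.
  ValuesPercentage true
  # Watch the root filesystem; Spark scratch dirs may live on other mounts.
  MountPoint "/"
  IgnoreSelected false
</Plugin>
```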
However, starting the agent fails:
```
Jul 16 03:01:57 metrics-test-m systemd[1]: Starting LSB: start and stop Stackdriver Agent...
Jul 16 03:01:57 metrics-test-m stackdriver-agent[3829]: Starting Stackdriver metrics collection agent: stackdriver-agentThe instance has neither the application default credentials file nor the correct monitoring scopes; Exiting. ... failed!
Jul 16 03:01:57 metrics-test-m stackdriver-agent[3829]: not starting, configuration/credentials error. ... failed!
Jul 16 03:01:57 metrics-test-m stackdriver-agent[3829]: (warning).
Jul 16 03:01:57 metrics-test-m systemd[1]: Started LSB: start and stop Stackdriver Agent.
```
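If I read the error correctly, the VM's access token lacks the monitoring scope, which would presumably mean recreating the cluster with that scope added. I assume (untested) that would look roughly like this, with the region as a placeholder:

```
gcloud dataproc clusters create metrics-test \
    --region=us-central1 \
    --scopes=https://www.googleapis.com/auth/monitoring.write
```

I'm not sure, though, whether starting the agent by hand is even a supported path on Dataproc.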
Is it possible to monitor local disk usage on Dataproc nodes and push those metrics to Google Cloud Monitoring?
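If there is no built-in way, my fallback idea is a cron job on each node that reads local disk usage and writes it as a custom metric through the Cloud Monitoring API. Below is a minimal, untested sketch assuming the google-cloud-monitoring client library; the metric type name, project ID, and use of the global resource type are my own placeholders:

```python
import shutil
import socket
import time

from google.cloud import monitoring_v3

PROJECT_ID = "my-project"  # placeholder: real project ID goes here


def report_disk_usage(path="/"):
    # Compute the used-disk percentage for the given mount point.
    usage = shutil.disk_usage(path)
    percent = 100.0 * usage.used / usage.total

    client = monitoring_v3.MetricServiceClient()
    series = monitoring_v3.TimeSeries()
    # Placeholder custom metric type; any custom.googleapis.com/... name works.
    series.metric.type = "custom.googleapis.com/dataproc/disk_used_percent"
    series.metric.labels["hostname"] = socket.gethostname()
    series.resource.type = "global"
    series.resource.labels["project_id"] = PROJECT_ID

    now = time.time()
    interval = monitoring_v3.TimeInterval(
        {"end_time": {"seconds": int(now), "nanos": int((now % 1) * 1e9)}}
    )
    point = monitoring_v3.Point(
        {"interval": interval, "value": {"double_value": percent}}
    )
    series.points = [point]
    client.create_time_series(
        name=f"projects/{PROJECT_ID}", time_series=[series]
    )


if __name__ == "__main__":
    report_disk_usage()
```

A per-node cron job like this feels like a workaround, though, so I'd prefer a supported agent-based approach if one exists.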