In Dataproc 1.5 or newer image versions, dataproc:yarn.log-aggregation.enabled is set to true by default. Under the hood, the yarn.log-aggregation-enable property in /etc/hadoop/conf/yarn-site.xml is set to true, and the location of the aggregated container logs is controlled by the yarn.nodemanager.remote-app-log-dir property, which is set to gs://<cluster-tmp-bucket>/<cluster-uuid>/yarn-logs by default. See the Dataproc documentation for more details on the Dataproc temp bucket.
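To browse the aggregated logs directly in the bucket, the default location can be composed from the bucket name and the cluster UUID. This is a minimal sketch, assuming gsutil is installed and you have read access to the bucket; the function names are hypothetical helpers, not part of Dataproc or YARN.

```shell
# Compose the default aggregated-log location described above.
# Arguments: cluster temp bucket name, cluster UUID (both values you supply).
yarn_remote_log_dir() {
  local tmp_bucket="$1" cluster_uuid="$2"
  printf 'gs://%s/%s/yarn-logs\n' "$tmp_bucket" "$cluster_uuid"
}

# Recursively list everything under that location.
# Requires gsutil and read access to the bucket.
list_yarn_remote_logs() {
  gsutil ls -r "$(yarn_remote_log_dir "$1" "$2")"
}
```

For example, `list_yarn_remote_logs <cluster-tmp-bucket> <cluster-uuid>` lists the log files if the property still has its default value; if yarn.nodemanager.remote-app-log-dir was overridden, list that path instead.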
Besides reading the aggregated logs at that location, there are several other ways to view the container logs:
1) YARN CLI: If the cluster has not been deleted, SSH into the master node, then run yarn logs -applicationId <app-id>. If you are not sure about the app ID, run yarn application -list -appStates ALL to list all apps. This method works only when log aggregation is enabled.
2) YARN Application Timeline server: If you enabled Component Gateway and the cluster has not been deleted, open the cluster's "YARN Application Timeline" link in the "WEB INTERFACES" tab of the cluster's web UI, find the application attempt and its containers, then click the "Logs" link. This method works only when log aggregation is enabled.
3) Cloud Logging: YARN container logs are available in Cloud Logging even after the cluster is deleted.
3.1) When dataproc:dataproc.logging.stackdriver.job.yarn.container.enable is false (which is the default), or when the job is submitted through a CLI such as spark-submit instead of the Dataproc jobs API, the logs are under the projects/<project-id>/logs/yarn-userlogs log name of the cluster resource:
resource.type="cloud_dataproc_cluster"
resource.labels.cluster_name=<cluster-name>
resource.labels.cluster_uuid=<cluster-uuid>
log_name="projects/<project-id>/logs/yarn-userlogs"
3.2) When dataproc:dataproc.logging.stackdriver.job.yarn.container.enable is true, the logs are under the projects/<project-id>/logs/dataproc.job.yarn.container log name of the job resource:
resource.type="cloud_dataproc_job"
resource.labels.job_id=<job_id>
resource.labels.job_uuid=<job_uuid>
log_name="projects/<project-id>/logs/dataproc.job.yarn.container"
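The two Cloud Logging filters above can also be assembled on the command line and passed to gcloud. The following is a rough sketch rather than an official recipe: the function names are hypothetical, the arguments are placeholders you supply, and the filter lines from the answer are simply joined with AND.

```shell
# Case 3.1: filter for YARN container logs attached to the cluster resource.
# Arguments: project ID, cluster name, cluster UUID.
cluster_yarn_log_filter() {
  local project_id="$1" cluster_name="$2" cluster_uuid="$3"
  printf '%s AND %s AND %s AND %s\n' \
    'resource.type="cloud_dataproc_cluster"' \
    "resource.labels.cluster_name=\"${cluster_name}\"" \
    "resource.labels.cluster_uuid=\"${cluster_uuid}\"" \
    "log_name=\"projects/${project_id}/logs/yarn-userlogs\""
}

# Case 3.2: filter for YARN container logs attached to the job resource.
# Arguments: project ID, job ID, job UUID.
job_yarn_log_filter() {
  local project_id="$1" job_id="$2" job_uuid="$3"
  printf '%s AND %s AND %s AND %s\n' \
    'resource.type="cloud_dataproc_job"' \
    "resource.labels.job_id=\"${job_id}\"" \
    "resource.labels.job_uuid=\"${job_uuid}\"" \
    "log_name=\"projects/${project_id}/logs/dataproc.job.yarn.container\""
}
```

A possible invocation is `gcloud logging read "$(cluster_yarn_log_filter <project-id> <cluster-name> <cluster-uuid>)" --project=<project-id> --limit=50`. The filter keeps the log_name spelling used above; if gcloud rejects it, the canonical LogEntry field name is logName.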
In Dataproc 1.4 (deprecated) or older versions, the yarn.log-aggregation-enable property in /etc/hadoop/conf/yarn-site.xml is set to false by default, and the container logs are written to the directory given by the yarn.nodemanager.log-dirs property, which is set to /var/log/hadoop-yarn/userlogs by default.
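To check which of the two setups a given cluster actually uses, you can read the relevant properties out of yarn-site.xml on a cluster node. This is a minimal sketch with a hypothetical helper name; it assumes the usual layout in which the <value> element sits on the same line as the <name> element or within the two lines after it, and it is not a general XML parser.

```shell
# Print the value of one property from a Hadoop XML config file.
# Arguments: property name, path to the config file.
# Assumption: <value> is on the <name> line or within the next two lines.
hadoop_property() {
  local name="$1" file="$2"
  grep -F -A2 "<name>${name}</name>" "$file" \
    | sed -n 's:.*<value>\(.*\)</value>.*:\1:p' \
    | head -n 1
}
```

On a cluster node, `hadoop_property yarn.log-aggregation-enable /etc/hadoop/conf/yarn-site.xml` shows whether aggregation is on, and the same call with yarn.nodemanager.remote-app-log-dir or yarn.nodemanager.log-dirs shows where the logs go. The function prints nothing if the property is not set in the file.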