Where are the Dataproc Spark job logs located? I know there are driver logs under the "Logging" section, but what about the executor nodes? Also, where are the detailed steps that Spark is executing logged (I know I can see them in the Application Master)? I am trying to debug a script that seems to hang; Spark appears to freeze.
UPDATE in Q3 2022: This answer is outdated, see Dataproc YARN container logs location for the latest info.
The task logs are stored on each worker node under /tmp.
It is possible to collect them in one place via YARN log aggregation. Set these properties at cluster creation time (via --properties with the yarn: prefix):

yarn.log-aggregation-enable=true
yarn.nodemanager.remote-app-log-dir=gs://${LOG_BUCKET}/logs
yarn.log-aggregation.retain-seconds=-1
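As a sketch, a cluster-creation command with these properties might look like the following. The cluster name (my-cluster) and bucket (my-log-bucket) are placeholders, not from the answer; the command is echoed rather than executed:

```shell
# Placeholder bucket; substitute your own GCS bucket.
LOG_BUCKET=my-log-bucket

# Build the --properties value with the yarn: prefix on each key.
PROPS="yarn:yarn.log-aggregation-enable=true"
PROPS="$PROPS,yarn:yarn.nodemanager.remote-app-log-dir=gs://${LOG_BUCKET}/logs"
PROPS="$PROPS,yarn:yarn.log-aggregation.retain-seconds=-1"

# Echoed as a dry run; remove 'echo' to actually create the cluster.
echo gcloud dataproc clusters create my-cluster --properties "$PROPS"
```

Properties passed this way are written into yarn-site.xml on the cluster nodes at creation time.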
Here's an article that discusses YARN log management:
https://hortonworks.com/blog/simplifying-user-logs-management-and-access-in-yarn/
Which config file are these properties part of? – Alex Nov 20 '17 at 19:52
As the property prefix itself says, yarn-site.xml – Deepak Verma Sep 23 '18 at 11:46