
Where are the Dataproc Spark job logs located? I know there are logs from the driver under the "Logging" section, but what about the executor nodes? Also, where are the detailed steps that Spark executes logged (I know I can see them in the Application Master)? I am trying to debug a script that seems to hang, and Spark appears to freeze.

Alex


UPDATE in Q3 2022: This answer is outdated; see Dataproc YARN container logs location for the latest info.

The task logs are stored on each worker node under /tmp.

You can collect them in one place with YARN log aggregation. Set these properties at cluster creation time (via --properties, adding the yarn: prefix to each key):

  • yarn.log-aggregation-enable=true
  • yarn.nodemanager.remote-app-log-dir=gs://${LOG_BUCKET}/logs
  • yarn.log-aggregation.retain-seconds=-1
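As an illustration, here is a minimal Python sketch that assembles the --properties value for gcloud dataproc clusters create from the three settings above, prefixing each key with yarn:. The cluster name my-cluster and bucket my-log-bucket are hypothetical placeholders, and other flags you will likely need (for example --region) are omitted. The script only builds and prints the command; it does not run it.

```python
import shlex


def yarn_log_aggregation_properties(log_bucket: str) -> dict[str, str]:
    """The three YARN log-aggregation settings from the answer above."""
    return {
        "yarn.log-aggregation-enable": "true",
        # Aggregated container logs are written under this GCS path.
        "yarn.nodemanager.remote-app-log-dir": f"gs://{log_bucket}/logs",
        # -1 keeps aggregated logs indefinitely (no automatic deletion).
        "yarn.log-aggregation.retain-seconds": "-1",
    }


def to_properties_flag(props: dict[str, str], prefix: str = "yarn") -> str:
    """Join properties into one comma-separated --properties value, prefixing each key."""
    return "--properties=" + ",".join(
        f"{prefix}:{key}={value}" for key, value in props.items()
    )


def build_create_command(cluster: str, log_bucket: str) -> list[str]:
    # Hypothetical cluster/bucket names; region and other flags intentionally omitted.
    return [
        "gcloud", "dataproc", "clusters", "create", cluster,
        to_properties_flag(yarn_log_aggregation_properties(log_bucket)),
    ]


if __name__ == "__main__":
    # Print a shell-safe command line you can copy and run yourself.
    print(shlex.join(build_create_command("my-cluster", "my-log-bucket")))
```

A single comma-separated --properties value works here because none of the values contain commas; values that do would need gcloud's alternate-delimiter escaping.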

Here's an article that discusses log management:

https://hortonworks.com/blog/simplifying-user-logs-management-and-access-in-yarn/

Dagang
tix