I am submitting a (Scala/Java) job to a Spark cluster running on YARN, including the following option on the spark-submit command:
--conf spark.yarn.log-aggregation-enable=true
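For context, the full invocation looks roughly like this (the class name, jar path, and deploy mode below are placeholders, not my actual values):

# Sketch of my invocation; com.example.MyApp and my-app.jar are placeholders
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyApp \
  --conf spark.yarn.log-aggregation-enable=true \
  my-app.jar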
I would hope that worker logs would be, and would remain, available from the master with that option turned on, even for a short run like mine. However, when I run yarn logs for my submitted application after it has crashed, I get:
Log aggregation has not completed or is not enabled
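Concretely, the command I run is along these lines (the application ID is a placeholder for the one spark-submit reports):

yarn logs -applicationId application_1234567890123_0001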
My submitted job consistently crashes early on, which is why I wish to view all worker logs in the first place (the output from spark-submit alone does not seem to flush all logged messages before printing the stack trace of the crashing error, hence the motivation).
- Should I do or configure anything beyond specifying that YARN option on the submit command to get log aggregation working as expected? (See the config check sketched after this list.)
- Would there be any alternative way to quickly see the full logs of all workers, other than logging into each and every worker separately and hunting through its log directories?
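Regarding the first question, my understanding (which may be wrong) is that log aggregation is normally a cluster-side YARN setting, yarn.log-aggregation-enable in yarn-site.xml on the cluster nodes, rather than a per-application Spark conf, which may explain the message above. A quick way to check on a node, assuming the standard HADOOP_CONF_DIR environment variable is set:

# Assumption: log aggregation requires yarn.log-aggregation-enable=true in
# yarn-site.xml on the cluster nodes; this just checks whether it is set there.
grep -A 1 'yarn.log-aggregation-enable' "$HADOOP_CONF_DIR/yarn-site.xml"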