I am trying to use the --archives option available in Spark on YARN to upload an archive file. Based on the documentation, and as mentioned in this question, YARN should not only upload the zip file but also automatically unarchive it on the worker nodes.
From the logs, I can see that YARN is uploading the archive to Spark's staging directory, e.g.
17/09/19 01:28:57 INFO Client: Uploading resource file:/home/foo/bar/zoo.zip -> hdfs://abc.foo.bar:8020/user/xyz/.sparkStaging/application_1503584958553_4501/zoo.zip
The issue I am facing is that, although the zip file is copied into the Spark staging directory, it is not automatically unarchived, and I am guessing it is not copied to the worker nodes either.
Assuming YARN does unarchive the zip file, is there a way to find the location of the unarchived files on the worker nodes programmatically?
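To make that question concrete, here is a minimal sketch of what I would expect to be able to run inside a task on an executor. It assumes YARN links the unpacked archive into the container's working directory under the archive's file name (or under the alias given after `#` in --archives), which is exactly the part I have not been able to confirm. The helper names `archive_dir` and `list_archive` are hypothetical, not part of any Spark or YARN API.

```python
import os


def archive_dir(link_name, container_dir=None):
    """Return the path where the unpacked archive is expected to be.

    Assumption: each --archives entry shows up as a directory (or symlink)
    named after the archive file, or after its '#alias', inside the
    container's current working directory.
    """
    base = container_dir if container_dir is not None else os.getcwd()
    return os.path.join(base, link_name)


def list_archive(link_name, container_dir=None):
    """List the entries of the unpacked archive, or return None if it is absent."""
    path = archive_dir(link_name, container_dir)
    if not os.path.isdir(path):
        return None
    return sorted(os.listdir(path))
```

Calling `list_archive("zoo.zip")` from within a task (for example inside a function passed to `mapPartitions`) would then show whether the archive was unpacked on that worker, and `archive_dir("zoo.zip")` would give its expected location.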
I am running Spark 2.2 on EMR 5.8, which uses YARN 2.7.