I need to know the current working directory URI/URL of a Spark executor so I can copy some dependencies there before the job executes. How do I get it in Java? Which API should I call?
- Spark executors are not long-lived processes, and you can't control where they run in the cluster – OneCricketeer Oct 02 '17 at 22:24
- @cricket_007 If YARN knows where to put archives for spark-submit, then it should be doable in code as well, from the main JAR – YuGagarin Oct 02 '17 at 22:30
- Right, that's what `SparkFiles` is for, as answered. But your definition of "in code" probably means the driver process, not the executors – OneCricketeer Oct 02 '17 at 23:40
1 Answer
The working directory is application-specific, so you won't be able to get it before the application starts. It is best to use the standard Spark mechanisms:

- `--jars` / `spark.jars` – for JAR files
- `--py-files` / `spark.submit.pyFiles` – for Python dependencies
- `SparkFiles` / `--files` / `--archives` – for everything else (see the Java sketch after this list)
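
Since the question asks for Java, here is a minimal sketch of the `SparkFiles` / `--files` route; the file name `lookup.txt`, its path, and the class name are placeholders:

```java
import org.apache.spark.SparkFiles;
import org.apache.spark.sql.SparkSession;

public class SparkFilesExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("spark-files-example")
                .master("local[*]") // drop this when submitting to a real cluster
                .getOrCreate();

        // Ship a local file to every executor; this is the programmatic
        // equivalent of passing --files /path/to/lookup.txt to spark-submit.
        spark.sparkContext().addFile("/path/to/lookup.txt");

        // Resolve the local copy. SparkFiles.get works on the driver and
        // inside tasks, returning the absolute path of the file in the
        // per-application directory, wherever the process happens to run.
        String localPath = SparkFiles.get("lookup.txt");
        System.out.println("Dependency available at: " + localPath);

        // The directory itself, if you need it:
        System.out.println("SparkFiles root: " + SparkFiles.getRootDirectory());

        spark.stop();
    }
}
```

Note that `addFile` only distributes the file; if the dependency needs to be on the executors' classpath, use `--jars` / `spark.jars` (or `SparkContext.addJar`) instead.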

- --archives does not always work. At least not on the Azure HDInsight cluster I am using, so I have to resort to a programmatic way until Microsoft fixes it or documents it properly... – YuGagarin Oct 02 '17 at 22:17