
I need to know the current working directory URI/URL of a Spark executor so I can copy some dependencies there before the job executes. How do I get it in Java? What API should I call?

YuGagarin
  • Spark executors are not long-lived processes, and you can't control where they run in the cluster – OneCricketeer Oct 02 '17 at 22:24
  • @cricket_007 If YARN knows where to put archives for spark-submit, then it should be possible to do the same from code in the main jar – YuGagarin Oct 02 '17 at 22:30
  • Right, that's what `SparkFiles` is for, as answered. But your definition of "in code" probably means the driver process, not the executors – OneCricketeer Oct 02 '17 at 23:40
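
To make the driver-vs-executor distinction in this exchange concrete, here is a minimal Java sketch: the driver ships a file with `SparkContext.addFile`, and each executor task resolves its own local copy with `SparkFiles.get`. The path `hdfs:///deps/lookup.txt` and the class name are placeholders, not from the original question.

```java
import java.util.Arrays;

import org.apache.spark.SparkFiles;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;

public class DriverVsExecutor {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("driver-vs-executor")
                .getOrCreate();
        JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

        // Driver side: ship the dependency to every executor.
        // "hdfs:///deps/lookup.txt" is a placeholder path.
        jsc.addFile("hdfs:///deps/lookup.txt");

        // Executor side: each task resolves the file relative to the
        // working directory of whichever executor it happens to run on.
        jsc.parallelize(Arrays.asList(1, 2, 3))
           .map(x -> SparkFiles.get("lookup.txt"))
           .collect()
           .forEach(System.out::println);

        spark.stop();
    }
}
```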

1 Answer


The working directory is application-specific, so you won't be able to get it before the application starts. It is best to use the standard Spark mechanisms (a minimal Java sketch follows the list):

  • --jars / spark.jars - for JAR files.
  • --py-files / spark.submit.pyFiles - for Python dependencies.
  • SparkFiles / --files / --archives - for everything else.
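
For the "what API should I call in Java" part of the question, here is a minimal sketch of the `SparkFiles` route. The file and JAR names are placeholders; it assumes the job was submitted with `--files` so that Spark stages the file into each JVM's working directory.

```java
// Assumed submit command (placeholder paths):
//   spark-submit --class App \
//     --jars /local/dep.jar \
//     --files /local/config.properties \
//     app.jar
import org.apache.spark.SparkFiles;
import org.apache.spark.sql.SparkSession;

public class App {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("spark-files-example")
                .getOrCreate();

        // SparkFiles.get() returns the absolute local path of a file shipped
        // with --files (or SparkContext.addFile), resolved against this JVM's
        // working directory. It works on the driver here, and inside tasks
        // on executors.
        String configPath = SparkFiles.get("config.properties");
        System.out.println("config.properties resolved to: " + configPath);

        spark.stop();
    }
}
```

This is also why you don't need the executor's working directory up front: Spark stages the files there for you, and `SparkFiles.get` abstracts away the actual location.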
OneCricketeer
  • --archives does not always work. At least not on the Azure HDInsight cluster I am using, so I have to resort to a programmatic approach until Microsoft fixes it or documents it properly... – YuGagarin Oct 02 '17 at 22:17