I know there have been some questions about Spark's temporary files, like this one, but I can't find one that answers my question.
I am using Spark 1.6.0 in standalone mode under Windows, so setting SPARK_LOCAL_DIRS
on each worker should tell Spark where its temporary files will be written. Nonetheless, I get strange behavior with snappy:
whatever I try, each executor writes a copy of snappy's DLL into my C:\Windows
directory (which gets really polluted). The piece of code that is supposed to handle temporary directories in Spark is:
def getConfiguredLocalDirs(conf: SparkConf): Array[String] = {
  ...
  } else if (conf.getenv("SPARK_EXECUTOR_DIRS") != null) {
    conf.getenv("SPARK_EXECUTOR_DIRS").split(File.pathSeparator)
  } else if (conf.getenv("SPARK_LOCAL_DIRS") != null) {
    conf.getenv("SPARK_LOCAL_DIRS").split(",")
  } ... // (Mesos-related branches elided)
  } else {
    // In non-Yarn mode (or for the driver in yarn-client mode), we cannot trust the user
    // configuration to point to a secure directory. So create a subdirectory with restricted
    // permissions under each listed directory.
    conf.get("spark.local.dir", System.getProperty("java.io.tmpdir")).split(",")
  }
}
I tried every combination of those, but I still end up with snappy-1.1.2-*-snappyjava.dll
in C:\Windows
(I suspect this is because that is the value of java.io.tmpdir
on the workers); the sketch below is one way to check that.
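A minimal sketch, assuming a running SparkContext named sc, to see what java.io.tmpdir actually resolves to in each executor JVM (the names here are just illustrative):

    // Run a trivial job and report (hostname, java.io.tmpdir) per executor JVM.
    val executorTmpDirs = sc.parallelize(1 to 100, sc.defaultParallelism)
      .map { _ =>
        (java.net.InetAddress.getLocalHost.getHostName,
         System.getProperty("java.io.tmpdir"))
      }
      .distinct()
      .collect()

    // Print one line per (host, tmpdir) pair seen by the executors.
    executorTmpDirs.foreach { case (host, dir) => println(host + " -> " + dir) }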
Does anyone know how to set the temporary directory where the executors write those DLLs? Thanks.
EDIT: It is indeed due to the java.io.tmpdir
property, and I can change it with:
val opt = "-Djava.io.tmpdir=myPath"
conf.set("spark.executor.extraJavaOptions", opt)
Unfortunately, this forces the same path onto every executor on every machine.
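For reference, a minimal sketch of the full wiring. The path D:\spark-tmp and the master URL are purely illustrative (the path would have to exist on every machine), and setting spark.driver.extraJavaOptions from code is shown only for symmetry, since that option normally has to be supplied at launch (spark-submit or spark-defaults.conf) to affect the driver's own java.io.tmpdir:

    import org.apache.spark.{SparkConf, SparkContext}

    // Illustrative path; it must exist on every worker machine.
    val tmpPath = "D:\\spark-tmp"

    val conf = new SparkConf()
      .setAppName("snappy-tmpdir-test")
      .setMaster("spark://master:7077") // placeholder master URL
      // Each executor JVM extracts the snappy DLL into its own java.io.tmpdir.
      .set("spark.executor.extraJavaOptions", "-Djava.io.tmpdir=" + tmpPath)
      // The driver JVM may extract a copy as well; this only takes effect
      // when passed at JVM launch, not after the driver has already started.
      .set("spark.driver.extraJavaOptions", "-Djava.io.tmpdir=" + tmpPath)

    val sc = new SparkContext(conf)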