I have a streaming job that runs on DC/OS on AWS. When I run the job for the first time and point the checkpoint folder at AWS S3, everything goes well.
After I stop it and start it again, I expect streaming to recover from the checkpoint, but instead I get the following error:
ERROR SparkContext: Error initializing SparkContext. java.lang.Exception: spark.executor.extraJavaOptions is not allowed to set Spark options (was '-Dspark.mesos.executor.docker.image=mesosphere/spark:1.0.0-1.6.1-2 '). Set them directly on a SparkConf or in a properties file when using ./bin/spark-submit.
I have set up recoverable streaming using the example from https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/RecoverableNetworkWordCount.scala,
and the connection to S3 for checkpointing from: Spark Streaming checkpoint to amazon s3
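Concretely, the S3 side of the checkpointing is wired the way that answer suggests: the AWS credentials go on the SparkContext's Hadoop configuration and the checkpoint directory is an s3n:// path. A minimal sketch of my context factory, where the function name, bucket, keys, and batch interval are placeholders:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

def createStreamingContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("wattio-pipeline")
  val ssc = new StreamingContext(conf, Seconds(10)) // batch interval is a placeholder
  // S3 credentials are set on the Hadoop configuration, not on the SparkConf (placeholder keys)
  ssc.sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "MY_ACCESS_KEY")
  ssc.sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "MY_SECRET_KEY")
  ssc.checkpoint("s3n://my-bucket/wattio-pipeline/checkpoints") // placeholder bucket
  // ... input DStreams and transformations are defined here ...
  ssc
}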
The problem seems to be that when the Spark context is recreated from the checkpoint file, it tries to set the spark.mesos.executor.docker.image property through spark.executor.extraJavaOptions, even though I never set that property myself.
My Spark configuration is quite simple and looks like this:
val conf = new SparkConf()
.setAppName("wattio-pipeline")
Did anyone encounter a similar issue?
EDITED
I tried setting the Spark conf in all of these ways:
val conf = new SparkConf()
.setAppName("wattio-pipeline")
.setExecutorEnv("SPARK_JAVA_OPTS","")
.remove("spark.executor.extraJavaOptions")
.remove("spark.mesos.executor.docker.image")
//.set("spark.executor.extraJavaOptions","")
//.set("spark.mesos.executor.docker.image","mesosphere/spark:1.0.0-1.6.1-2")
But the same error appears.
EDITED 2
I have tested the same AWS S3 checkpoint configuration on my local development machine (our own installation of the SMACK stack), and streaming recovers correctly. This means the problem lies in the Spark parameters and properties that DC/OS sets.
I have also filed a JIRA issue: https://dcosjira.atlassian.net/browse/DCOS-131