I have a streaming job that runs on DC/OS on AWS. When I run the job for the first time and point the checkpoint folder at AWS S3, everything goes well.
After I stop it and start it again, I expect streaming to recover from the checkpoint, but instead I get the following error:
ERROR SparkContext: Error initializing SparkContext. java.lang.Exception: spark.executor.extraJavaOptions is not allowed to set Spark options (was '-Dspark.mesos.executor.docker.image=mesosphere/spark:1.0.0-1.6.1-2 '). Set them directly on a SparkConf or in a properties file when using ./bin/spark-submit.
I have set up recoverable streaming using the example from https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/RecoverableNetworkWordCount.scala,
and the connection to S3 for checkpointing from: Spark Streaming checkpoint to amazon s3
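Concretely, the S3 side of the checkpointing is wired the way that answer suggests: the AWS credentials go on the SparkContext's Hadoop configuration and the checkpoint directory is an s3n:// path. A minimal sketch of my context factory, where the function name, bucket, keys, and batch interval are placeholders:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

def createStreamingContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("wattio-pipeline")
  val ssc = new StreamingContext(conf, Seconds(10)) // batch interval is a placeholder
  // S3 credentials are set on the Hadoop configuration, not on the SparkConf (placeholder keys)
  ssc.sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "MY_ACCESS_KEY")
  ssc.sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "MY_SECRET_KEY")
  ssc.checkpoint("s3n://my-bucket/wattio-pipeline/checkpoints") // placeholder bucket
  // ... input DStreams and transformations are defined here ...
  ssc
}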
The problem seems to be that when the Spark context is recreated from the checkpoint file, it tries to set the spark.mesos.executor.docker.image property through spark.executor.extraJavaOptions, even though I never set that property myself.
My Spark configuration is quite simple and looks like this:
val conf = new SparkConf()
.setAppName("wattio-pipeline")
Did anyone encounter a similar issue?
EDITED
I tried setting the Spark conf in all of these ways:
val conf = new SparkConf()
.setAppName("wattio-pipeline")
.setExecutorEnv("SPARK_JAVA_OPTS","")
.remove("spark.executor.extraJavaOptions")
.remove("spark.mesos.executor.docker.image")
//.set("spark.executor.extraJavaOptions","")
//.set("spark.mesos.executor.docker.image","mesosphere/spark:1.0.0-1.6.1-2")
But the same error appears.
EDITED 2
I have tested the same AWS S3 checkpoint configuration on my local development machine (our own installation of the SMACK stack), and streaming recovers correctly. This means the problem lies in the Spark parameters and properties that DC/OS sets.
I have also filed a JIRA issue: https://dcosjira.atlassian.net/browse/DCOS-131