
I have a Spark standalone setup, and I am using it to submit a SparkR job to an existing Cloudera CDH cluster.

Apache Spark Version
1.5.0, Hadoop 2.6

Cloudera Spark Version
1.5.0-cdh5.5.1, Hadoop 2.6.0-cdh5.5.1

Code:

# Load SparkR from the standalone Spark download
library(SparkR, lib.loc = "/opt/BIG-DATA/spark-1.5.0-bin-hadoop2.6/R/lib")

# Connect to the standalone master running on the cluster
sc <- sparkR.init(master = "spark://10.103.25.39:7077", appName = "SparkR_demo_RTA", sparkHome = "/opt/BIG-DATA/spark-1.5.0-bin-hadoop2.6", sparkEnvir = list(spark.executor.memory = '512m'))

sqlContext <- sparkRSQL.init(sc)

# Create a DataFrame from R's built-in faithful dataset and show the first rows
df <- createDataFrame(sqlContext, faithful)
head(df)

sparkR.stop()

Next, I submit this sparkR-testcluster.R file as follows:

export HADOOP_CONF_DIR=/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/etc/hadoop/conf.pseudo
export SPARK_DIST_CLASSPATH=/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/etc/hadoop/conf.pseudo
./bin/spark-submit  --master spark://10.103.25.39:7077 /opt/BIG-DATA/SparkR/sparkR-testcluster.R
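
If it helps, my understanding is that the driver side uses the Spark classes from my standalone download, while the executors on the cluster nodes use the CDH build. The assembly jar locations below are my assumptions based on the usual layout of the two distributions; I have not confirmed the exact file names:

# driver side (my standalone download, same path as sparkHome above)
/opt/BIG-DATA/spark-1.5.0-bin-hadoop2.6/lib/spark-assembly-1.5.0-hadoop2.6.0.jar
# executor side (CDH parcel on the cluster nodes; path and name are my guess)
/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/jars/spark-assembly-1.5.0-cdh5.5.1-hadoop2.6.0-cdh5.5.1.jar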

However, I am getting the following error (which, if I understand it correctly, is caused by a version mismatch):

16/12/14 12:51:26 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20161214124958-0014/151 on hostPort 10.103.40.186:7078 with 4 cores, 512.0 MB RAM
16/12/14 12:51:26 INFO client.AppClient$ClientEndpoint: Executor updated: app-20161214124958-0014/151 is now RUNNING
16/12/14 12:51:26 INFO client.AppClient$ClientEndpoint: Executor updated: app-20161214124958-0014/151 is now LOADING
16/12/14 12:51:26 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1, 10.103.40.207): java.io.InvalidClassException: org.apache.spark.sql.types.StructType; local class incompatible: stream classdesc serialVersionUID = -2623502157469710728, local class serialVersionUID = 1299744747852393705
    at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

16/12/14 12:51:26 INFO scheduler.TaskSetManager: Starting task 0.1 in stage 1.0 (TID 2, 10.103.25.39, PROCESS_LOCAL, 13045 bytes)
16/12/14 12:51:26 INFO client.AppClient$ClientEndpoint: Executor updated: app-20161214124958-0014/151 is now EXITED (Command exited with code 1)
16/12/14 12:51:26 INFO cluster.SparkDeploySchedulerBackend: Executor app-20161214124958-0014/151 removed: Command exited with code 1
16/12/14 12:51:26 INFO cluster.SparkDeploySchedulerBackend: Asked to remove non-existent executor 151
16/12/14 12:51:26 INFO client.AppClient$ClientEndpoint: Executor added: app-20161214124958-0014/152 on worker-20161208195437-10.103.40.186-7078 (10.103.40.186:7078) with 4 cores

....................
....................

16/12/14 12:51:26 ERROR r.RBackendHandler: dfToCols on org.apache.spark.sql.api.r.SQLUtils failed
Error in invokeJava(isStatic = TRUE, className, methodName, ...) : 
  org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 4, 10.103.40.207): java.io.InvalidClassException: org.apache.spark.sql.types.StructType; local class incompatible: stream classdesc serialVersionUID = -2623502157469710728, local class serialVersionUID = 1299744747852393705
    at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.
Calls: head ... collect -> collect -> .local -> callJStatic -> invokeJava
Execution halted
16/12/14 12:51:26 INFO client.AppClient$ClientEndpoint: Executor updated: app-20161214124958-0014/154 is now LOADING
16/12/14 12:51:26 INFO client.AppClient$ClientEndpoint: Executor updated: app-20161214124958-0014/154 is now EXITED (Command exited with code 1)
16/12/14 12:51:26 INFO cluster.SparkDeploySchedulerBackend: Executor app-20161214124958-0014/154 removed: Command exited with code 1
16/12/14 12:51:26 INFO cluster.SparkDeploySchedulerBackend: Asked to remove non-existent executor 154
16/12/14 12:51:26 INFO client.AppClient$ClientEndpoint: Executor added: app-20161214124958-0014/155 on worker-20161208195437-10.103.40.186-7078 (10.103.40.186:7078) with 4 cores

I am failing to understand where I am going wrong.

Any help?

Edit:

I want to know which jar I should add when running my job so that I don't face this issue. I understand that there is a mismatch because the wrong classes are being picked up. How can I point the job at the correct classes?
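
For example, is the fix something along these lines, where the job is submitted with the Spark that ships inside the CDH parcel? The parcel paths below are my assumptions based on the standard CDH 5.5.1 layout; I have not verified them:

# assumed location of the parcel's Spark; not verified
export SPARK_HOME=/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark
export HADOOP_CONF_DIR=/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/etc/hadoop/conf.pseudo
$SPARK_HOME/bin/spark-submit --master spark://10.103.25.39:7077 /opt/BIG-DATA/SparkR/sparkR-testcluster.R

Or should the classpath be overridden per job instead, e.g. with --conf spark.driver.extraClassPath=... and --conf spark.executor.extraClassPath=... pointing at the matching spark-assembly jar?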

  • Possible duplicate of [Java serialization - java.io.InvalidClassException local class incompatible](http://stackoverflow.com/questions/8335813/java-serialization-java-io-invalidclassexception-local-class-incompatible) – Binary Nerd Dec 14 '16 at 08:10
  • Binary Nerd, this is not resolving my issue – Hardik Gupta Dec 14 '16 at 08:15
