
I am trying to run a Spark job using spark-submit. When I run it in Eclipse, the job runs without any issue, but when I copy the same jar file to a remote machine and run the job there, I get the issue below:

17/08/09 10:19:15 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, ip-10-50-70-180.ec2.internal): java.io.InvalidClassException: org.apache.spark.executor.TaskMetrics; local class incompatible: stream classdesc serialVersionUID = -2231953621568687904, local class serialVersionUID = -6966587383730940799
at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:616)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1829)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1713)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1986)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2231)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2155)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2013)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:253)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

I saw some other links on SO and tried the following:

  1. I changed the Scala version of the Spark artifacts from 2.10, which I was using before, to 2.11. The dependencies in my pom now look like this:

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.0.2</version>
        <scope>provided</scope>
    </dependency>
    
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>2.0.2</version>
        <scope>provided</scope>
    </dependency>
    
    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-yarn_2.10 -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-yarn_2.11</artifactId>
        <version>2.0.2</version>
        <scope>provided</scope>
    </dependency>
    
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-mllib_2.11</artifactId>
        <version>2.0.2</version>
        <scope>provided</scope>
    </dependency>
    
  2. I also checked that the 2.11-2.0.2 jars exist in the jars folder of the Spark installation, as suggested in a few links (a sketch of this check follows the list).

  3. I also added the provided scope to the Spark dependencies, as suggested in a few links.
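
For completeness, this is roughly how the check in point 2 can be done on the remote machine (the commands below are illustrative, and SPARK_HOME is assumed to point at the Spark installation):

    # print the Spark version used by spark-submit on the remote machine
    spark-submit --version
    # the jar names encode both the Scala and Spark versions, e.g. spark-core_2.11-2.0.2.jar
    ls $SPARK_HOME/jars | grep spark-core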

None of the above helped. Any help would be greatly appreciated, as I am stuck on this issue. Thanks in advance. Cheers

Edit 1: This is the spark-submit command

spark-submit --deploy-mode cluster --class "com.abc.ingestion.GenericDeviceIngestionSpark" /home/hadoop/sathiya/spark_driven_ingestion-0.0.1-SNAPSHOT-jar-with-dependencies.jar "s3n://input-bucket/input-file.csv" "SIT" "accessToken" "UNKNOWN" "bundleId" "[{"idType":"D_ID","idOrder":1,"isPrimary":true},{"idType":"HASH_DEVICE_ID","idOrder":2,"isPrimary":false}]"

Edit 2:

I also tried adding the field serialVersionUID = -2231953621568687904L; to the related class, but that didn't resolve the issue.

2 Answers


I finally resolved the issue. I commented out all the dependencies and uncommented them one at a time. First I uncommented the spark-core dependency, and the issue was resolved. When I uncommented another dependency in my project, the issue came back. On investigation I found that this second dependency had a transitive dependency on a different version (2.10) of spark-core, which was causing the conflict. I added an exclusion to that dependency as below:

    <dependency>
        <groupId>com.data.utils</groupId>
        <artifactId>data-utils</artifactId>
        <version>1.0-SNAPSHOT</version>
        <exclusions>
            <exclusion>
                <groupId>javax.ws.rs</groupId>
                <artifactId>javax.ws.rs-api</artifactId>
            </exclusion>
            <exclusion>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-core_2.10</artifactId>
            </exclusion>
        </exclusions>
    </dependency>

This resolved the issue. Posting it in case someone else gets stuck on this. Thanks @JosePraveen for your valuable comment, which gave me the hint.
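
For anyone who hits the same problem: instead of commenting the dependencies out one at a time, the conflicting transitive dependency can also be located with Maven's dependency tree; the filter below is only an illustration:

    # show every dependency path that pulls in a Scala 2.10 build of spark-core
    mvn dependency:tree -Dincludes=org.apache.spark:spark-core_2.10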


We saw this issue when slightly different jar versions were being used on the Spark master and one or more of the Spark slaves.

I was facing this issue because I had only copied my jar to the master node. Once I copied the jar to all the slave nodes, my application started working just fine.
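
In case it is useful, here is a sketch of the kind of copy step involved (the hostnames and paths below are placeholders, not taken from my setup):

    # push the application jar from the master to every slave node
    for host in slave1 slave2 slave3; do
        scp /path/to/my-application.jar hadoop@"$host":/path/to/
    done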
