
I was trying to build the Spark Scala project below using Maven. The build was successful, but when I ran the resulting jar it gave the error below. Please help me fix it.

Spark Scala code:

package com.spark.DataAnalysis

import org.apache.log4j.Level
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.functions._
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

object TwitterData {
  def main(args: Array[String]) {
    println("Start")
    System.setProperty("hadoop.home.dir","C://Sankha//Study//spark-2.3.4-bin-hadoop2.7//spark-2.3.4-bin-hadoop2//spark-2.3.4-bin-hadoop2.7")
    val conf = new SparkConf().setAppName("Spark Scala WordCount Example").setMaster("local[1]")
        val spark = SparkSession.builder().appName("CsvExample").master("local").getOrCreate()
        val sc = new SparkContext(conf)
        val csvData = sc.textFile("C:\\Sankha\\Study\\data\\twitter-airline-sentiment\\Tweets.csv",3)
        val map_data = csvData.map(x=> x.split(",")).filter(x=> (x.length  < 13)).filter(x=> x(5) == "Virgin America")
        println(map_data.count())

  }
}

Maven build command:

mvn package

I am running the Spark job from the command line as below:

spark-submit --class sparkWCExample.spWCExample.Twitter --master local[2] C:\Sankha\Study\spark_ws\spWCExample\target\spWCExample-0.0.1-SNAPSHOT.jar C:\Sankha\Study\spark_ws\spWCExample\target\out


Exception:

20/03/04 02:45:58 INFO Executor: Adding file:/C:/Users/sankh/AppData/Local/Temp/spark-ae5c0e2c-76f7-42d9-bd2a-6b1f5b191bd8/userFiles-ef86ac49-debf-4d19-b2e9-4f0c1cb83325/spWCExample-0.0.1-SNAPSHOT.jar to class loader
20/03/04 02:45:58 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.io.IOException: unexpected exception type
        at java.io.ObjectStreamClass.throwMiscException(ObjectStreamClass.java:1736)
        at java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1266)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2078)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)

Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at java.lang.invoke.SerializedLambda.readResolve(SerializedLambda.java:230)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1260)
        ... 61 more
Caused by: java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: scala/runtime/LambdaDeserialize
        at sparkWCExample.spWCExample.Twitter$.$deserializeLambda$(Twitter.scala)
        ... 71 more
Caused by: java.lang.NoClassDefFoundError: scala/runtime/LambdaDeserialize
        ... 72 more
Caused by: java.lang.ClassNotFoundException: scala.runtime.LambdaDeserialize
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 72 more

Please advise

The pom.xml is as below:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>sparkWCExample</groupId>
  <artifactId>spWCExample</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>

  <name>spWCExample</name>
  <url>http://maven.apache.org</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

  <dependencies>
        <dependency>
          <groupId>junit</groupId>
          <artifactId>junit</artifactId>
          <version>3.8.1</version>
          <scope>test</scope>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.12</artifactId>
            <version>2.4.5</version>
        </dependency>

            <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.12</artifactId>
            <version>2.4.5</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.scala-lang/scala-library -->
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>2.12.3</version>
        </dependency>
  </dependencies>
  <build>
 <plugins>
     <plugin>
         <groupId>org.apache.maven.plugins</groupId>
         <artifactId>maven-compiler-plugin</artifactId>
         <version>3.3</version>
     </plugin>
 </plugins>
 </build>
</project>

Please check and let me know

1 Answer

There seem to be a few issues with your code and your POM.

Regarding the code: you create a SparkSession and a separate SparkContext, even though creating the SparkSession alone is enough, because it already carries a SparkContext. You also set Spark properties both in the code and in the spark-submit command; I recommend moving them into a separate properties file and passing that to spark-submit (the file and the command are shown below).
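As a minimal sketch (reusing the appName and master from your own code), the session exposes the context directly:

    import org.apache.spark.sql.SparkSession

    // One builder call creates (or reuses) the session; no SparkConf is needed here
    val spark = SparkSession.builder().appName("CsvExample").master("local").getOrCreate()
    // The underlying SparkContext is already available as a field on the session
    val sc = spark.sparkContext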

So, you can write the code as follows:

package com.spark.DataAnalysis

import org.apache.spark.sql.SparkSession

object TwitterData {
  def main(args: Array[String]) {
    println("Start")
    System.setProperty("hadoop.home.dir","C://Sankha//Study//spark-2.3.4-bin-hadoop2.7//spark-2.3.4-bin-hadoop2//spark-2.3.4-bin-hadoop2.7")
    // One SparkSession is enough; it carries the SparkContext
    val spark = SparkSession.builder().appName("CsvExample").master("local").getOrCreate()
    // Read the CSV through the session's SparkContext instead of building a second context
    val csvData = spark.sparkContext.textFile("C:\\Sankha\\Study\\data\\twitter-airline-sentiment\\Tweets.csv", 3)
    val map_data = csvData.map(x => x.split(",")).filter(x => x.length < 13).filter(x => x(5) == "Virgin America")
    println(map_data.count())
    // Close the session when done to free cluster resources
    spark.close()
  }
}

Now, coming to your pom.xml: you haven't added the maven-assembly-plugin and have only used the maven-compiler-plugin. That means your code is compiled against the dependencies, but the dependencies themselves are not packaged into the jar. In your case the Scala runtime (scala-library) was neither packaged in the jar nor found on your system's classpath, which is exactly what `ClassNotFoundException: scala.runtime.LambdaDeserialize` is telling you. This is why it is better to build a jar-with-dependencies using the maven-assembly-plugin.

So, your new pom.xml should look like this:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>

<groupId>sparkWCExample</groupId>
<artifactId>spWCExample</artifactId>
<version>0.0.1-SNAPSHOT</version>
<packaging>jar</packaging>

<name>spWCExample</name>
<url>http://maven.apache.org</url>

<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>

<dependencies>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>3.8.1</version>
        <scope>test</scope>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.12</artifactId>
        <version>2.4.5</version>
    </dependency>

    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.12</artifactId>
        <version>2.4.5</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.scala-lang/scala-library -->
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>2.12.3</version>
    </dependency>
</dependencies>
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.3</version>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-assembly-plugin</artifactId>
            <version>2.4</version>
            <configuration>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <id>assemble-all</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
</project>
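After updating the pom, rebuild the project:

mvn clean package

Assuming the coordinates above, the assembly plugin writes target\spWCExample-0.0.1-SNAPSHOT-jar-with-dependencies.jar next to the thin jar; that jar-with-dependencies artifact is the one to pass to spark-submit, since it bundles scala-library and the other dependencies.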

A sample sparkProperties.properties file is as follows:

spark.master    local[2]
spark.submit.deployMode client
spark.driver.memory     2G
spark.executor.memory   2G
spark.executor.instances        2
spark.executor.cores    2
spark.driver.maxResultSize      1g

With the sparkProperties.properties file in place, your spark-submit command becomes much simpler. Note that spark-submit options such as --properties-file must come before the application jar; anything after the jar is passed to your main method as application arguments:

spark-submit --class sparkWCExample.spWCExample.Twitter --properties-file sparkProperties.properties C:\Sankha\Study\spark_ws\spWCExample\target\spWCExample-0.0.1-SNAPSHOT-jar-with-dependencies.jar C:\Sankha\Study\spark_ws\spWCExample\target\out

I hope I have answered your question thoroughly. Feel free to ask if you have any other doubts.

Siddharth Goel
  • I tried your steps and it worked, thank you. I am now only getting the temp-file delete error below, which is fine since I get the output before it. – user1670805 Mar 05 '20 at 20:14
  • ERROR ShutdownHookManager: Exception while deleting Spark temp dir: C:\Sankha\temp\spark-0463536d-891f-420e-b555-f8e42243e4f9\userFiles-165a61af-5484-4580-bc8e-c009816b076c java.io.IOException: Failed to delete: C:\Sankha\temp\spark-0463536d-891f-420e-b555-f8e42243e4f9\userFiles-165a61af-5484-4580-bc8e-c009816b076c\spWCExample-0.0.1-SNAPSHOT-jar-with-dependencies.jar – user1670805 Mar 05 '20 at 20:14
  • I am new to Spark, so I might contact you again if I run into other issues. I hope you will help me. – user1670805 Mar 05 '20 at 20:15
  • Sure, you can reach out any time you want. As for the error listed above, try closing your SparkSession once you are done with the computations; this ensures cluster resources are freed for other applications. I've made the appropriate change in the code above. Tell me if this helps. – Siddharth Goel Mar 06 '20 at 06:45
  • Can you please help me with the issue below that I am facing: https://stackoverflow.com/questions/60567432/todf-is-not-working-in-spark-scala-ide-but-works-perfectly-in-spark-shell – user1670805 Mar 06 '20 at 15:47
  • @Siddharth Goel can you help and suggest how to handle this https://stackoverflow.com/questions/62036791/while-writing-to-hdfs-path-getting-error-java-io-ioexception-failed-to-rename – BdEngineer May 27 '20 at 06:45