
I have a Maven project where I use the following Spark dependencies:

<dependencies>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.11</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-mllib_2.11</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-graphx_2.11</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-yarn_2.11</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-network-shuffle_2.11</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming-flume_2.11</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>com.databricks</groupId>
      <artifactId>spark-csv_2.11</artifactId>
      <version>1.3.0</version>
    </dependency>
  </dependencies>

The Spark version (`${spark.version}`) is 2.4.4.
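For reference, that property would be defined in a `<properties>` block of the pom; a minimal sketch (assuming the standard layout) would be:

    <properties>
      <spark.version>2.4.4</spark.version>
    </properties>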

Now I run the following code:

    SparkSession spark = SparkSession.builder()
            .master("local[*]")
            .config("spark.sql.warehouse.dir", "/tmp/spark")
            .appName("SurvivalPredictionMLP")
            .getOrCreate();
    //Reads the training set
    Dataset<Row> df = spark.sqlContext()
            .read()
            .format("com.databricks.spark.csv")
            .option("header", true)
            .option("inferSchema", true)
            .load("data/train.csv");
    //Show
    df.show();

But I get the following exception at the `getOrCreate()` line:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/09/22 14:18:06 INFO SparkContext: Running Spark version 2.4.4
19/09/22 14:18:07 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/09/22 14:18:07 INFO SparkContext: Submitted application: SurvivalPredictionMLP
19/09/22 14:18:07 INFO SecurityManager: Changing view acls to: pro
19/09/22 14:18:07 INFO SecurityManager: Changing modify acls to: pro
19/09/22 14:18:07 INFO SecurityManager: Changing view acls groups to: 
19/09/22 14:18:07 INFO SecurityManager: Changing modify acls groups to: 
19/09/22 14:18:07 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(pro); groups with view permissions: Set(); users  with modify permissions: Set(pro); groups with modify permissions: Set()
Exception in thread "main" java.lang.NoClassDefFoundError: io/netty/channel/Channel
    at org.apache.spark.rpc.netty.NettyRpcEnv.<init>(NettyRpcEnv.scala:59)
    at org.apache.spark.rpc.netty.NettyRpcEnvFactory.create(NettyRpcEnv.scala:461)
    at org.apache.spark.rpc.RpcEnv$.create(RpcEnv.scala:57)
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:249)
    at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:175)
    at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:257)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:424)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:935)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:926)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926)
    at com.jdlp.projects.titanic.App.<init>(App.java:18)
    at com.jdlp.projects.titanic.App.main(App.java:33)
Caused by: java.lang.ClassNotFoundException: io.netty.channel.Channel
    at java.net.URLClassLoader$1.run(URLClassLoader.java:371)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 14 more
Caused by: java.util.zip.ZipException: invalid LOC header (bad signature)
    at java.util.zip.ZipFile.read(Native Method)
    at java.util.zip.ZipFile.access$1400(ZipFile.java:60)
    at java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:734)
    at java.util.zip.ZipFile$ZipFileInflaterInputStream.fill(ZipFile.java:434)
    at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
    at java.util.jar.Manifest$FastInputStream.fill(Manifest.java:476)
    at java.util.jar.Manifest$FastInputStream.readLine(Manifest.java:410)
    at java.util.jar.Manifest$FastInputStream.readLine(Manifest.java:444)
    at java.util.jar.Attributes.read(Attributes.java:376)
    at java.util.jar.Manifest.read(Manifest.java:234)
    at java.util.jar.Manifest.<init>(Manifest.java:81)
    at java.util.jar.Manifest.<init>(Manifest.java:73)
    at java.util.jar.JarFile.getManifestFromReference(JarFile.java:199)
    at java.util.jar.JarFile.getManifest(JarFile.java:180)
    at sun.misc.URLClassPath$JarLoader$2.getManifest(URLClassPath.java:992)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:451)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
    ... 20 more

When I google these exceptions, the suggestions involve editing files by hand, but since I use Maven I can't (or shouldn't) change anything manually.

Is there any way to solve this error?

Thanks!

Patrick
  • are you using intellij? – sam Sep 22 '19 at 12:38
  • Issue looks like this one https://stackoverflow.com/q/32090921/2937891 – Yauheni Sep 22 '19 at 13:32
  • @Sam I am using eclipse – Patrick Sep 22 '19 at 20:19
  • @Yauheni But with Maven I can't just delete the repository, and even if I remove and reinstall all dependencies it still doesn't work – Patrick Sep 22 '19 at 20:24
  • Did you create a fat jar? – OneCricketeer Sep 22 '19 at 22:47
  • you might be missing some jar files related to `io.netty.channel.Channel` – sam Sep 23 '19 at 02:53
  • @Patrick Sure, it would be a desperate step to remove all dependencies from the local .m2 repository. What I'm suggesting is to find only the corrupted dependency and remove it, in order to force Maven to download it again from the remote repository. Try executing `mvn package --strict-checksums -X` to debug the execution and find the corrupted dependency, like this answer suggests https://stackoverflow.com/a/46566345/2937891 – Yauheni Sep 23 '19 at 13:24
  • @Sam I thought maven would import them automatically – Patrick Sep 23 '19 at 18:54
  • @Patrick Maven has downloaded them to your machine, but your fat jar does not contain those imported jars. You need to choose the packages to be included in your fat jar. – sam Sep 23 '19 at 19:45
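Following the suggestion in the comments to hunt down the single corrupted artifact: the `ZipException: invalid LOC header` in the trace means one jar in the local repository is damaged. As a rough stdlib-only sketch (the class name `CorruptJarScanner` is mine, not part of any library), this walks a directory and flags any jar that cannot be fully read; deleting the flagged artifact's folder under `~/.m2/repository` and re-running `mvn package` would force Maven to re-download it:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class CorruptJarScanner {

    // Returns every .jar under repoRoot that cannot be fully read as a
    // zip archive (the condition that surfaces as "invalid LOC header").
    public static List<Path> findCorruptJars(Path repoRoot) throws IOException {
        List<Path> corrupt = new ArrayList<>();
        try (Stream<Path> paths = Files.walk(repoRoot)) {
            paths.filter(p -> p.toString().endsWith(".jar"))
                 .filter(p -> !isReadable(p))
                 .forEach(corrupt::add);
        }
        return corrupt;
    }

    // Opens the jar and drains every entry, so bad local file headers
    // (not just a bad central directory) are detected as well.
    private static boolean isReadable(Path jar) {
        byte[] buf = new byte[8192];
        try (ZipFile zip = new ZipFile(jar.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                try (InputStream in = zip.getInputStream(entries.nextElement())) {
                    while (in.read(buf) != -1) { /* drain */ }
                }
            }
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path repo = Paths.get(System.getProperty("user.home"), ".m2", "repository");
        for (Path p : findCorruptJars(repo)) {
            System.out.println("Corrupt jar, delete its folder and rebuild: " + p);
        }
    }
}
```

Alternatively, the Maven dependency plugin's `mvn dependency:purge-local-repository` goal can re-resolve the project's dependencies without deleting the whole repository by hand.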

1 Answer


It looks like you are using Spark 2.11 in your pom file, but you run the program with Spark 2.4.4. I have seen strange errors when the version in the pom did not match the version on my machine.

NoiseOrigin
  • 2.11 is the Scala version. – Ben Watson Sep 23 '19 at 13:54
  • Yes, I am doing the project from a book where the Scala version is 2.11 and the Spark version is 2.3.0, but 2.3.0 doesn't work for me; I get other errors: "The type org.apache.spark.sql.SparkSession$Builder cannot be resolved. It is indirectly referenced from required .class files." I can change the Scala version to 2.12, but that doesn't fix the error – Patrick Sep 23 '19 at 18:56