
According to the release notes, and specifically the ticket Build and Run Spark on Java 17 (SPARK-33772), Spark now supports running on Java 17.

However, using Java 17 (Temurin-17.0.3+7) with Maven (3.8.6) and maven-surefire-plugin (3.0.0-M7), running a unit test that uses Spark (3.3.0) fails with:

java.lang.IllegalAccessError: class org.apache.spark.storage.StorageUtils$ (in unnamed module @0x1e7ba8d9) cannot access class sun.nio.ch.DirectBuffer (in module java.base) because module java.base does not export sun.nio.ch to unnamed module @0x1e7ba8d9

The stack is:

java.lang.IllegalAccessError: class org.apache.spark.storage.StorageUtils$ (in unnamed module @0x1e7ba8d9) cannot access class sun.nio.ch.DirectBuffer (in module java.base) because module java.base does not export sun.nio.ch to unnamed module @0x1e7ba8d9
  at org.apache.spark.storage.StorageUtils$.<init>(StorageUtils.scala:213)
  at org.apache.spark.storage.StorageUtils$.<clinit>(StorageUtils.scala)
  at org.apache.spark.storage.BlockManagerMasterEndpoint.<init>(BlockManagerMasterEndpoint.scala:114)
  at org.apache.spark.SparkEnv$.$anonfun$create$9(SparkEnv.scala:353)
  at org.apache.spark.SparkEnv$.registerOrLookupEndpoint$1(SparkEnv.scala:290)
  at org.apache.spark.SparkEnv$.create(SparkEnv.scala:339)
  at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:194)
  at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:279)
  at org.apache.spark.SparkContext.<init>(SparkContext.scala:464)
  at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2704)
  at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:953)
  at scala.Option.getOrElse(Option.scala:189)
  at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:947)
  [...]

The question Java 17 solution for Spark - java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.storage.StorageUtils was asked only 2 months ago, but it predated the Spark 3.3.0 release and thus predated official support for Java 17.

Why can't I run my Spark 3.3.0 test with Java 17, and how can I fix it?
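
For reference, the kind of test that fails is simply one that starts a local SparkSession, since the StorageUtils initializer runs while the driver's SparkEnv is being created (see the stack trace above). A minimal sketch (class and test names are illustrative; it assumes JUnit 4 on the test classpath):

  import org.apache.spark.sql.SparkSession
  import org.junit.Test

  class SparkJava17Test {

    // Merely building a local SparkSession initialises StorageUtils$ and,
    // under Java 17 without extra JVM flags, throws the IllegalAccessError
    // shown above.
    @Test
    def sparkSessionStarts(): Unit = {
      val spark = SparkSession.builder()
        .master("local[*]")
        .appName("java-17-repro")
        .getOrCreate()
      try {
        assert(spark.range(10).count() == 10L)
      } finally {
        spark.stop()
      }
    }
  }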

Greg Kopff

3 Answers


Even though Spark now supports Java 17, it still references the JDK-internal class sun.nio.ch.DirectBuffer:

  // In Java 8, the type of DirectBuffer.cleaner() was sun.misc.Cleaner, and it was possible
  // to access the method sun.misc.Cleaner.clean() to invoke it. The type changed to
  // jdk.internal.ref.Cleaner in later JDKs, and the .clean() method is not accessible even with
  // reflection. However sun.misc.Unsafe added a invokeCleaner() method in JDK 9+ and this is
  // still accessible with reflection.
  private val bufferCleaner: DirectBuffer => Unit = [...]
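
For orientation, the JDK 9+ pattern that comment describes looks roughly like this (an illustrative sketch, not Spark's actual implementation; Spark's real bufferCleaner is typed over sun.nio.ch.DirectBuffer, and that reference to the JDK-internal type is what the module system rejects when the class initialises):

  import java.nio.ByteBuffer

  // Illustrative only: look up sun.misc.Unsafe reflectively and call its public
  // invokeCleaner(ByteBuffer) method (available since JDK 9) to release a
  // direct buffer's native memory eagerly.
  object DirectBufferCleaner {

    private val cleaner: ByteBuffer => Unit = {
      val unsafeClass   = Class.forName("sun.misc.Unsafe")
      val theUnsafe     = unsafeClass.getDeclaredField("theUnsafe")
      theUnsafe.setAccessible(true)
      val unsafe        = theUnsafe.get(null)
      val invokeCleaner = unsafeClass.getMethod("invokeCleaner", classOf[ByteBuffer])
      (buffer: ByteBuffer) => { invokeCleaner.invoke(unsafe, buffer); () }
    }

    def free(buffer: ByteBuffer): Unit =
      if (buffer.isDirect) cleaner(buffer)
  }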

Under the Java module system, access to this class is restricted. The Java 9 migration guide says:

If you must use an internal API that has been made inaccessible by default, then you can break encapsulation using the --add-exports command-line option.

We need to export this package to the unnamed module that our test code runs in. To do this for Surefire, we add this configuration to the plugin:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <version>3.0.0-M7</version>
  <configuration>
    <argLine>--add-exports java.base/sun.nio.ch=ALL-UNNAMED</argLine>
  </configuration>
</plugin>

Based on a discussion with one of the Spark developers, Spark adds the following in order to execute all of its internal unit tests.

These options are used to pass all Spark UTs, but maybe you don't need all.

--add-opens=java.base/java.lang=ALL-UNNAMED
--add-opens=java.base/java.lang.invoke=ALL-UNNAMED
--add-opens=java.base/java.lang.reflect=ALL-UNNAMED
--add-opens=java.base/java.io=ALL-UNNAMED
--add-opens=java.base/java.net=ALL-UNNAMED
--add-opens=java.base/java.nio=ALL-UNNAMED
--add-opens=java.base/java.util=ALL-UNNAMED
--add-opens=java.base/java.util.concurrent=ALL-UNNAMED
--add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED
--add-opens=java.base/sun.nio.ch=ALL-UNNAMED
--add-opens=java.base/sun.nio.cs=ALL-UNNAMED
--add-opens=java.base/sun.security.action=ALL-UNNAMED
--add-opens=java.base/sun.util.calendar=ALL-UNNAMED

It was also noted in that discussion that these options do not need to be added explicitly when using spark-shell, spark-sql, or spark-submit.
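
Whichever subset of these options you end up needing, it is worth confirming that they actually reach the forked JVM that Surefire starts for the tests. A small diagnostic sketch (assuming Scala 2.13 for scala.jdk.CollectionConverters; on 2.12 use scala.collection.JavaConverters instead):

  import java.lang.management.ManagementFactory
  import scala.jdk.CollectionConverters._

  // Print the --add-opens/--add-exports flags the current JVM was started with,
  // for example from inside a test, to check that the Surefire argLine took effect.
  object JvmFlagCheck {
    def main(args: Array[String]): Unit = {
      ManagementFactory.getRuntimeMXBean.getInputArguments.asScala
        .filter(arg => arg.startsWith("--add-opens") || arg.startsWith("--add-exports"))
        .foreach(println)
    }
  }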

Greg Kopff
  • Is there a Spark ticket open to fix this? – Garret Wilson Aug 28 '22 at 21:06
  • Surely those options are overkill. Does someone want to investigate further. I imagine most of options would work with `--add-exports` instead of `--add-opens` (see [docs](https://docs.oracle.com/en/java/javase/17/migrate/migrating-jdk-8-later-jdk-releases.html)), because surely Spark isn't using reflection on all those packages. For a simple use case of reading CSV files and saving to JSON locally, just `--add-exports java.base/sun.nio.ch=ALL-UNNAMED` is working for me. Still we shouldn't have this problem in the first place. Does Spark intend to fix this? – Garret Wilson Aug 28 '22 at 22:33
  • @GarretWilson The link to the `user@spark.apache.org` [mailing list discussion](https://lists.apache.org/thread/814cpb1rpp73zkhtv9t4mkzzrznl82yn) is in the answer if you want to take up the discussion with Yang Jie further. – Greg Kopff Aug 28 '22 at 22:45
  • This would be for a spark-submit, but can we fix this in scala code/testing (kicked off with sbt)? – combinatorist Feb 02 '23 at 21:00

Based on the discussions above, I am using:

%SPARK_HOME%\bin\spark-submit.cmd --driver-java-options "--add-exports java.base/sun.nio.ch=ALL-UNNAMED" spark_ml_heart.py

with a single --add-exports to run a Python script on Spark 3.2.1 on Java 17.

You may need the full version with all the --add-opens options:

%SPARK_HOME%\bin\spark-submit.cmd --driver-java-options "--add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED" spark_ml_heart.py
Anton Andreev
  • How can I add the options inside spark-submit.cmd so that I can use pyspark inside my IDE (PyCharm)? – adranale Jan 03 '23 at 15:42
  • The single adjustment (first option) worked for me. Thanks! – combinatorist Jan 09 '23 at 19:05
  • To clarify, it worked for me for a spark-submit with spark 3.2, but I'm still not seeing anywhere in this answer how to run scala tests (like with sbt) on a spark project. Even with spark 3.3, I still get the OP's error, so something is configured differently with `sbt test` than a `spark-submit --master=local[*]`. – combinatorist Feb 02 '23 at 21:33

After fixing some of these errors, I got an error with the KryoSerializer:

java.lang.IllegalArgumentException: Unable to create serializer "com.esotericsoftware.kryo.serializers.FieldSerializer" for class: java.nio.HeapByteBuffer

I got around this issue by adding some of the JVM arguments mentioned by @Greg Kopff to my pom.xml (I am using Maven):

    <plugin>
        <groupId>org.scalatest</groupId>
        <artifactId>scalatest-maven-plugin</artifactId>
        <version>${scalatest-maven-plugin.version}</version>
        <configuration>
            <argLine>
                --add-opens=java.base/java.lang.invoke=ALL-UNNAMED
                --add-opens=java.base/java.nio=ALL-UNNAMED
                --add-opens=java.base/java.util=ALL-UNNAMED
                --add-opens=java.base/sun.nio.ch=ALL-UNNAMED
            </argLine>
        </configuration>
    </plugin>
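
For context, this is the kind of suite that runs under scalatest-maven-plugin and therefore picks up the argLine above (a sketch only; the class name is illustrative and it assumes ScalaTest 3.2.x with AnyFunSuite):

  import org.apache.spark.sql.SparkSession
  import org.scalatest.funsuite.AnyFunSuite

  class KryoRoundTripSuite extends AnyFunSuite {

    test("a simple job runs with the Kryo serializer on Java 17") {
      // Kryo needs deep reflective access to some java.* classes (for example
      // the java.nio buffers), which is why the --add-opens flags above are
      // needed on Java 17.
      val spark = SparkSession.builder()
        .master("local[*]")
        .appName("kryo-java-17")
        .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        .getOrCreate()
      try {
        import spark.implicits._
        val doubled = Seq(1, 2, 3).toDS().map(_ * 2).collect().toSeq
        assert(doubled == Seq(2, 4, 6))
      } finally {
        spark.stop()
      }
    }
  }
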
Oscar Drai