
I'm trying to run distributed K-means with Spark MLlib and I'm getting the following error:

Caused by: java.lang.ClassNotFoundException: breeze.storage.Zero$DoubleZero$
    at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)

I'm using Scala 2.13.0, Spark 3.3.0 and Breeze 2.1.0. Does anyone know how to solve it?

Dmytro Mitin
Reda20

2 Answers


Here is a small example that reproduces the error:

import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SparkSession
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

object example {

  def main(args: Array[String]): Unit = {

    val data = List(Vectors.dense(Array(-1.2067543462416856,1.3095550194913217)),
      Vectors.dense(Array(0.07214871343256794,1.2317180069067792)),
      Vectors.dense(Array(1.2382694463625876,1.498952083293292)),
      Vectors.dense(Array(1.4227882484992194,1.1326606729937694)),
      Vectors.dense(Array(0.028564865614650627,1.1697757168356784)),
      Vectors.dense(Array(1.3008028016732505,1.3992632244080325)),
      Vectors.dense(Array(-0.4515288119480808,-0.44940482288858774)),
      Vectors.dense(Array(1.3912470190900275,-1.2895692645735999)),
      Vectors.dense(Array(-0.5498887597576244,-0.4937628444210279)),
      Vectors.dense(Array(0.03640545102051686,-1.3540754314126295)),
      Vectors.dense(Array(-1.2520223542111055,1.2709646562853476)))

    Logger.getLogger("org").setLevel(Level.OFF)

    val SS = SparkSession
      .builder()
      .appName("example")
      .config("spark.master", "local[*]").getOrCreate()
    val sc = SS.sparkContext

    val rdd = sc.parallelize(data)
    val kmeans = KMeans.train(rdd,10,100)
  }
}
Reda20

Looks like an issue with dependencies.

In Breeze 1.3 and earlier, breeze.storage.Zero.DoubleZero was defined as

@SerialVersionUID(1L)
implicit object DoubleZero extends Zero[Double] {
  override def zero = 0.0
}

https://github.com/scalanlp/breeze/blob/releases/v1.3/math/src/main/scala/breeze/storage/Zero.scala#L77

and breeze.storage.Zero.DoubleZero.getClass produced breeze.storage.Zero$DoubleZero$.

But in Breeze 2.0+ DoubleZero is defined as

implicit val DoubleZero: Zero[Double] = Zero(0.0)

https://github.com/scalanlp/breeze/blob/releases/v2.0/math/src/main/scala/breeze/storage/Zero.scala#L46

@SerialVersionUID(1L)
case class Zero[@specialized T](zero: T) extends Serializable

and breeze.storage.Zero.DoubleZero.getClass produces breeze.storage.Zero$mcD$sp (because of @specialized) while Class.forName("breeze.storage.Zero$DoubleZero$") throws ClassNotFoundException.

You should check which of your dependencies still uses Breeze 1.3 or earlier.
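
One way to check this from inside the application is to ask the JVM whether each class name can be loaded and, if so, from which jar. The following is only a sketch: `ClasspathProbe` and `locate` are hypothetical names introduced here, not part of Spark or Breeze, and the class names probed in `main` are the ones mentioned in this answer.

```scala
object ClasspathProbe {

  /** Returns Some(jar or directory URL) if `className` can be loaded
    * from the current classpath, None if it cannot.
    * Hypothetical helper, for illustration only. */
  def locate(className: String): Option[String] =
    try {
      // initialize = false: only load the class, do not run static initializers
      val cls = Class.forName(className, false, getClass.getClassLoader)
      val location = for {
        pd  <- Option(cls.getProtectionDomain)
        cs  <- Option(pd.getCodeSource)   // null for JDK bootstrap classes
        url <- Option(cs.getLocation)
      } yield url.toString
      Some(location.getOrElse("<bootstrap or unknown location>"))
    } catch {
      // NoClassDefFoundError is a LinkageError, so both failure modes are covered
      case _: ClassNotFoundException | _: LinkageError => None
    }

  def main(args: Array[String]): Unit = {
    val names = Seq(
      "breeze.storage.Zero$DoubleZero$",        // exists only in Breeze 1.3 and earlier
      "breeze.storage.Zero$mcD$sp",             // exists in Breeze 2.0+
      "org.apache.spark.ml.linalg.SparseMatrix"
    )
    names.foreach { n =>
      println(s"$n -> ${locate(n).getOrElse("not found")}")
    }
  }
}
```

If the first name is reported as not found while the second resolves to a breeze 2.x jar, the classpath contains Breeze 2.x and something else on it was compiled against the older version.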


Update: thanks for the MCVE.

Debugging shows that NoClassDefFoundError/ClassNotFoundException is thrown here

  private lazy val loadableSparkClasses: Seq[Class[_]] = {
    Seq(
      // ...
      "org.apache.spark.ml.linalg.SparseMatrix",   // <---
      // ...
    ).flatMap { name =>
      try {
        Some[Class[_]](Utils.classForName(name))   // <---
      } catch {
        case NonFatal(_) => None // do nothing
        case _: NoClassDefFoundError if Utils.isTesting => None // See SPARK-23422.
      }
    }
  }

https://github.com/apache/spark/blob/v3.3.0/core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala#L521
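
A likely reason the error escapes this `try` (my reading of the snippet, not something the Spark docs state): `NonFatal` deliberately does not match `LinkageError`, and `NoClassDefFoundError` is a `LinkageError`, so the first `case` is skipped and the second one applies only when `Utils.isTesting` is true. The small check below illustrates the `NonFatal` part; `NonFatalDemo` and `caughtByNonFatal` are hypothetical names used only for this sketch.

```scala
import scala.util.control.NonFatal

object NonFatalDemo {

  /** True if a `case NonFatal(_)` clause would catch `t`. */
  def caughtByNonFatal(t: Throwable): Boolean = t match {
    case NonFatal(_) => true
    case _           => false
  }

  def main(args: Array[String]): Unit = {
    // ClassNotFoundException is an ordinary Exception: matched by NonFatal
    println(caughtByNonFatal(new ClassNotFoundException("x")))   // true
    // NoClassDefFoundError extends LinkageError: not matched by NonFatal
    println(caughtByNonFatal(new NoClassDefFoundError("x")))     // false
  }
}
```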

A simpler reproduction is

Class.forName("org.apache.spark.ml.linalg.SparseMatrix")
// java.lang.NoClassDefFoundError: breeze/storage/Zero$DoubleZero$ ...
// Caused by: java.lang.ClassNotFoundException: breeze.storage.Zero$DoubleZero$ ...

As I said, one of your dependencies uses Breeze 1.3 or earlier, even though you believe you're using Breeze 2.1.0. Namely, org.apache.spark.ml.linalg.SparseMatrix comes from spark-mllib-local, and spark-mllib-local 3.3.0 depends on Breeze 1.2:

<dependency>
    <groupId>org.scalanlp</groupId>
    <artifactId>breeze_2.13</artifactId>
    <version>1.2</version>
    <scope>compile</scope>
    <exclusions>
        <exclusion>
            <artifactId>commons-math3</artifactId>
            <groupId>org.apache.commons</groupId>
        </exclusion>
    </exclusions>
</dependency>

https://repo1.maven.org/maven2/org/apache/spark/spark-mllib-local_2.13/3.3.0/spark-mllib-local_2.13-3.3.0.pom

So Spark 3.3.0 (and 3.3.2) is incompatible with Breeze 2.0+. Use Breeze 1.3 or earlier:

scalaVersion := "2.13.0"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"   % "3.3.0",
  "org.apache.spark" %% "spark-mllib" % "3.3.0",
  "org.scalanlp"     %% "breeze"      % "1.3"
)

Then your code runs successfully.
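
If you don't call Breeze 2.x APIs directly, another option is to drop the explicit Breeze dependency and let spark-mllib pull in the version it was built against. This is a sketch I have not verified against your project; it assumes nothing else in your build requires a newer Breeze.

```scala
// build.sbt (sketch): Breeze is not declared explicitly
scalaVersion := "2.13.0"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"   % "3.3.0",
  "org.apache.spark" %% "spark-mllib" % "3.3.0"
  // no "org.scalanlp" %% "breeze" entry: it is resolved transitively
  // (Breeze 1.2 according to the spark-mllib-local 3.3.0 POM quoted above)
)
```

Running `sbt evicted` afterwards should show whether some other library still forces a newer Breeze onto the classpath.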

Compatibility issues between different versions of Spark and Breeze are not rare:

https://github.com/scalanlp/breeze/issues/710

Apache Spark - java.lang.NoSuchMethodError: breeze.linalg.Vector$.scalarOf()Lbreeze/linalg/support/ScalarOf

https://github.com/scalanlp/breeze/issues/690

Spark is expected to upgrade its Breeze dependency to 2.0 in version 3.4.0:

https://issues.apache.org/jira/browse/SPARK-39616

In the meantime, you can try it with the following build.sbt:

scalaVersion := "2.13.0"

resolvers += "apache-repo" at "https://repository.apache.org/content/groups/snapshots"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"   % "3.4.0-SNAPSHOT",
  "org.apache.spark" %% "spark-mllib" % "3.4.0-SNAPSHOT",
  "org.scalanlp"     %% "breeze"      % "2.1.0"
)

Then your code runs successfully too.

Dmytro Mitin
  • Thank you for your answer! But I'm using breeze 2.1.0 not 1.3- – Reda20 Mar 15 '23 at 19:02
  • @Reda20 Well, you may think that you're using breeze 2.1.0, but it looks like one of your dependencies was compiled against breeze 1.3 or earlier. `breeze.storage.Zero$DoubleZero$` is from breeze 1.3 or earlier. Look at your classpath. Or try to prepare an [MCVE](https://stackoverflow.com/help/minimal-reproducible-example). – Dmytro Mitin Mar 15 '23 at 19:06
  • https://stackoverflow.com/questions/1457863/what-causes-and-what-are-the-differences-between-noclassdeffounderror-and-classn – Dmytro Mitin Mar 15 '23 at 19:12
  • Please find below a small example that reproduces the error! – Reda20 Mar 15 '23 at 19:51