Looks like an issue with dependencies.
In Breeze 1.3 and earlier, breeze.storage.Zero.DoubleZero was defined as
@SerialVersionUID(1L)
implicit object DoubleZero extends Zero[Double] {
  override def zero = 0.0
}
https://github.com/scalanlp/breeze/blob/releases/v1.3/math/src/main/scala/breeze/storage/Zero.scala#L77
and breeze.storage.Zero.DoubleZero.getClass produced breeze.storage.Zero$DoubleZero$.
But in Breeze 2.0+, DoubleZero is defined as
implicit val DoubleZero: Zero[Double] = Zero(0.0)
https://github.com/scalanlp/breeze/blob/releases/v2.0/math/src/main/scala/breeze/storage/Zero.scala#L46
@SerialVersionUID(1L)
case class Zero[@specialized T](zero: T) extends Serializable
and breeze.storage.Zero.DoubleZero.getClass produces breeze.storage.Zero$mcD$sp (because of @specialized), while Class.forName("breeze.storage.Zero$DoubleZero$") throws ClassNotFoundException.
You should check which dependency still uses Breeze 1.3 or earlier.
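One way to see which Breeze actually ends up on the runtime classpath is to ask the class loader directly. The following is only a minimal sketch: ClasspathCheck and describeClass are hypothetical names, not part of Spark or Breeze, and the code relies only on the JDK. It loads each class without initializing it and reports the JAR or directory it came from.

```scala
object ClasspathCheck {
  // Hypothetical helper: reports whether a class can be loaded and,
  // if so, which JAR or directory it was loaded from.
  def describeClass(name: String): String = {
    val loaded: Option[Class[_]] =
      try Some(Class.forName(name, false, getClass.getClassLoader)) // false = do not initialize
      catch {
        case _: ClassNotFoundException | _: NoClassDefFoundError => None
      }
    loaded match {
      case None => s"$name: not found on the classpath"
      case Some(cls) =>
        // getCodeSource is null for classes loaded by the bootstrap loader
        val location = for {
          pd  <- Option(cls.getProtectionDomain)
          cs  <- Option(pd.getCodeSource)
          url <- Option(cs.getLocation)
        } yield url.toString
        s"$name: loaded from ${location.getOrElse("<bootstrap or unknown>")}"
    }
  }

  def main(args: Array[String]): Unit = {
    println(describeClass("breeze.storage.Zero$DoubleZero$")) // exists only in Breeze 1.3 and earlier
    println(describeClass("breeze.storage.Zero$mcD$sp"))      // exists only in Breeze 2.0+
    println(describeClass("org.apache.spark.ml.linalg.SparseMatrix"))
  }
}
```

Run with the same classpath as the failing application, the first two lines should tell you which Breeze generation is present, and the printed location should point at the Breeze JAR that won dependency resolution.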
Update. Thanks for the MCVE.
Debugging shows that the NoClassDefFoundError/ClassNotFoundException is thrown here:
private lazy val loadableSparkClasses: Seq[Class[_]] = {
  Seq(
    // ...
    "org.apache.spark.ml.linalg.SparseMatrix", // <---
    // ...
  ).flatMap { name =>
    try {
      Some[Class[_]](Utils.classForName(name)) // <---
    } catch {
      case NonFatal(_) => None // do nothing
      case _: NoClassDefFoundError if Utils.isTesting => None // See SPARK-23422.
    }
  }
}
https://github.com/apache/spark/blob/v3.3.0/core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala#L521
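A side note on why the catch block above does not swallow the error: scala.util.control.NonFatal deliberately does not match LinkageError and its subclasses, and NoClassDefFoundError is one of them. So outside of tests (when Utils.isTesting is false) neither case applies and the error propagates. The sketch below illustrates just that matching behaviour; NonFatalDemo and classify are hypothetical names used for illustration only.

```scala
import scala.util.control.NonFatal

object NonFatalDemo {
  // Hypothetical helper: shows which branch a NonFatal pattern match takes.
  def classify(t: Throwable): String = t match {
    case NonFatal(_) => "non-fatal"
    case _           => "fatal"
  }

  def main(args: Array[String]): Unit = {
    // ClassNotFoundException is an ordinary Exception, so NonFatal matches it
    println(classify(new ClassNotFoundException("breeze.storage.Zero$DoubleZero$"))) // non-fatal
    // NoClassDefFoundError is a LinkageError, which NonFatal treats as fatal
    println(classify(new NoClassDefFoundError("breeze/storage/Zero$DoubleZero$")))   // fatal
  }
}
```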
A simpler reproduction is
Class.forName("org.apache.spark.ml.linalg.SparseMatrix")
// java.lang.NoClassDefFoundError: breeze/storage/Zero$DoubleZero$ ...
// Caused by: java.lang.ClassNotFoundException: breeze.storage.Zero$DoubleZero$ ...
As I said, one of the dependencies uses Breeze 1.3 or earlier, although you think you're using Breeze 2.1.0. Namely, org.apache.spark.ml.linalg.SparseMatrix is from spark-mllib-local, and spark-mllib-local 3.3.0 uses Breeze 1.2:
<dependency>
  <groupId>org.scalanlp</groupId>
  <artifactId>breeze_2.13</artifactId>
  <version>1.2</version>
  <scope>compile</scope>
  <exclusions>
    <exclusion>
      <artifactId>commons-math3</artifactId>
      <groupId>org.apache.commons</groupId>
    </exclusion>
  </exclusions>
</dependency>
https://repo1.maven.org/maven2/org/apache/spark/spark-mllib-local_2.13/3.3.0/spark-mllib-local_2.13-3.3.0.pom
So Spark 3.3.0 (and 3.3.2) is incompatible with Breeze 2.0+. Use Breeze 1.3 or earlier:
scalaVersion := "2.13.0"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"   % "3.3.0",
  "org.apache.spark" %% "spark-mllib" % "3.3.0",
  "org.scalanlp"     %% "breeze"      % "1.3"
)
Then your code runs successfully.
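If another library in the build pulls in a newer Breeze transitively, sbt's default eviction would pick the highest version and the problem would come back. A possible guard, shown here only as a sketch, is sbt's dependencyOverrides setting. The version 1.2 is an assumption chosen to match the spark-mllib-local 3.3.0 POM; whether it suits your own code depends on which Breeze APIs you use.

```scala
// build.sbt (sketch): force a single Breeze version for the whole build,
// regardless of what other dependencies request transitively.
// Assumption: 1.2 mirrors the version declared by spark-mllib-local 3.3.0.
dependencyOverrides += "org.scalanlp" %% "breeze" % "1.2"
```

Note that an override also takes precedence over a version you declare directly in libraryDependencies, so the two should agree.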
Compatibility issues between different versions of Spark and Breeze are not rare:
https://github.com/scalanlp/breeze/issues/710
Apache Spark - java.lang.NoSuchMethodError: breeze.linalg.Vector$.scalarOf()Lbreeze/linalg/support/ScalarOf
https://github.com/scalanlp/breeze/issues/690
Spark 3.4.0 is expected to upgrade Breeze to 2.0:
https://issues.apache.org/jira/browse/SPARK-39616
Meanwhile, you can try it with the following build.sbt:
scalaVersion := "2.13.0"
resolvers += "apache-repo" at "https://repository.apache.org/content/groups/snapshots"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"   % "3.4.0-SNAPSHOT",
  "org.apache.spark" %% "spark-mllib" % "3.4.0-SNAPSHOT",
  "org.scalanlp"     %% "breeze"      % "2.1.0"
)
Then your code runs successfully too.