Scala/JVM noob here who wants to understand more about logging, specifically when using Apache Spark.
I have written a library in Scala that depends upon a bunch of Spark libraries; here are my dependencies:
import sbt._

object Dependencies {
  object Version {
    val spark = "2.2.0"
    val scalaTest = "3.0.0"
  }
  val deps = Seq(
    "org.apache.spark" %% "spark-core" % Version.spark,
    "org.scalatest" %% "scalatest" % Version.scalaTest,
    "org.apache.spark" %% "spark-hive" % Version.spark,
    "org.apache.spark" %% "spark-sql" % Version.spark,
    "com.holdenkarau" %% "spark-testing-base" % "2.2.0_0.8.0" % "test",
    "ch.qos.logback" % "logback-core" % "1.2.3",
    "ch.qos.logback" % "logback-classic" % "1.2.3",
    "com.typesafe.scala-logging" %% "scala-logging" % "3.8.0",
    "com.typesafe" % "config" % "1.3.2"
  )
  val exc = Seq(
    ExclusionRule("org.slf4j", "slf4j-log4j12")
  )
}
(admittedly I copied a lot of this from elsewhere).
I am able to package my code as a JAR using sbt package, which I can then call from Spark by placing the JAR into ${SPARK_HOME}/jars. This is working great.
I now want to implement logging from my code, so I do this:
import com.typesafe.scalalogging.Logger

/*
 * stuff stuff stuff
 */
val logger: Logger = Logger("name")
logger.info("stuff")
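(Side note: scala-logging also offers a LazyLogging trait that mixes a logger member into a class, which is the style I'd probably use inside my library; the class name below is purely illustrative.)

import com.typesafe.scalalogging.LazyLogging

class MyTransformer extends LazyLogging { // hypothetical class name
  def run(): Unit = {
    logger.info("stuff") // logger is supplied by the LazyLogging trait
  }
}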
However, when I try to call my library (which I'm doing from Python, not that I think that's relevant here) I get an error:
py4j.protocol.Py4JJavaError: An error occurred while calling z:com.company.package.class.function.
E : java.lang.NoClassDefFoundError: com/typesafe/scalalogging/Logger$
Clearly this is because the com.typesafe.scala-logging library is not in my JAR. I know I could solve this by packaging with sbt assembly, but I don't want to do that because it would pull in all the other dependencies and make my JAR enormous.
Is there a way to selectively include libraries (com.typesafe.scala-logging in this case) in my JAR? Alternatively, should I be logging by some other means, perhaps using a logger that is included with Spark?
Thanks to pasha701 in the comments, I attempted to package my dependencies by using sbt assembly rather than sbt package.
import sbt._

object Dependencies {
  object Version {
    val spark = "2.2.0"
    val scalaTest = "3.0.0"
  }
  val deps = Seq(
    "org.apache.spark" %% "spark-core" % Version.spark % Provided,
    "org.scalatest" %% "scalatest" % Version.scalaTest,
    "org.apache.spark" %% "spark-hive" % Version.spark % Provided,
    "org.apache.spark" %% "spark-sql" % Version.spark % Provided,
    "com.holdenkarau" %% "spark-testing-base" % "2.2.0_0.8.0" % "test",
    "ch.qos.logback" % "logback-core" % "1.2.3",
    "ch.qos.logback" % "logback-classic" % "1.2.3",
    "com.typesafe.scala-logging" %% "scala-logging" % "3.8.0",
    "com.typesafe" % "config" % "1.3.2"
  )
  val exc = Seq(
    ExclusionRule("org.slf4j", "slf4j-log4j12")
  )
}
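For anyone following along, the sbt-assembly setup this implies is roughly the following; the plugin version and JAR name are only examples, not necessarily what your build needs.

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.6")

// build.sbt -- keep the Scala library out of the fat JAR since Spark already ships it
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
assemblyJarName in assembly := "my-library-assembly.jar" // hypothetical name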
Unfortunately, even with the Spark dependencies marked as Provided, my JAR went from 324K to 12M, hence I opted to use println() instead. Here is my commit message:
log using println
I went with the println option because it keeps the size of the JAR small.
I trialled use of com.typesafe.scalalogging.Logger but my tests failed with the error:
java.lang.NoClassDefFoundError: com/typesafe/scalalogging/Logger
because that library isn't provided with Spark. I attempted to use sbt assembly instead of sbt package, but this caused the size of the JAR to go from 324K to 12M, even with the Spark dependencies set to Provided. A 12M JAR isn't worth the trade-off just to use scala-logging, hence using println instead.
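(For reference, "logging" via println is nothing more elaborate than something like the following; the wrapper object is purely illustrative and not part of the actual commit.)

// Hypothetical stand-in for a logger; its only virtue is adding no dependencies.
object Log {
  def info(msg: String): Unit = println(s"[INFO] $msg")
  def warn(msg: String): Unit = println(s"[WARN] $msg")
}

Log.info("stuff")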
I note that pasha701 suggested using log4j instead, as that is provided with Spark, so I shall try that next. Any advice on using log4j from Scala when writing a Spark library would be much appreciated.
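I assume the usage would look roughly like this, since Spark 2.x ships log4j 1.x on its classpath (the logger name is just an example):

import org.apache.log4j.LogManager

val log = LogManager.getLogger("com.company.package") // example name; getClass.getName also works
log.info("stuff")
log.warn("more stuff")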