
To simplify unit testing with Spark and Scala, I am using ScalaTest and mockito-scala (with MockitoSugar). That lets you simply write something like this:

val sparkSessionMock = mock[SparkSession]

Then you can usually do all the magic with `when` and `verify`.
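For an ordinary dependency this works nicely; for instance, with a made-up `Greeter` trait (purely for illustration):

trait Greeter {
  def greet(name: String): String
}

// assumes the MockitoSugar mixin plus import org.mockito.Mockito.{when, verify}
val greeterMock = mock[Greeter]
when(greeterMock.greet("world")).thenReturn("hello")   // stub the call
greeterMock.greet("world") shouldBe "hello"            // stubbed value comes back
verify(greeterMock).greet("world")                     // interaction is verified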

But if the implementation under test contains the necessary import of

import spark.implicits._

in its code, then the simplicity of unit testing seems to be gone (or at least I haven't found the proper way to solve this yet).

I end up getting this error:

org.mockito.exceptions.verification.SmartNullPointerException: 
You have a NullPointerException here:
-> at ...
because this method call was *not* stubbed correctly:
-> at scala.Option.orElse(Option.scala:289)
sparkSession.implicits();

Simply mocking the `implicits` object inside SparkSession won't help, due to typing issues:

val implicitsMock = mock[SQLImplicits]
when(sparkSessionMock.implicits).thenReturn(implicitsMock)

will not let you pass, since the compiler requires the singleton type of the `implicits` object inside your mock:

required: sparkSessionMock.implicits.type
found: implicitsMock.type
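The root cause is that `implicits` is declared as an `object` inside SparkSession, so every session instance carries its own path-dependent singleton type for it, which no separately created mock can have. A minimal illustration of the same effect (the `Container` class is made up and has nothing to do with Spark):

class Container {
  object member
}

val a = new Container
val b = new Container
val ok: a.member.type = a.member     // compiles: same instance
// val ko: a.member.type = b.member  // does not compile:
//   found   : b.member.type
//   required: a.member.type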

And please don't tell me that I should rather do `SparkSession.builder.getOrCreate()`... since then this isn't a unit test anymore but a more heavyweight integration test.

(Edit) Here is a complete, reproducible example:

import org.apache.spark.sql._
import org.mockito.Mockito.when
import org.scalatest.{ FlatSpec, Matchers }
import org.scalatestplus.mockito.MockitoSugar

case class MyData(key: String, value: String)

class ClassToTest()(implicit spark: SparkSession) {
    import spark.implicits._

    def read(path: String): Dataset[MyData] = 
         spark.read.parquet(path).as[MyData]
}

class SparkMock extends FlatSpec with Matchers with MockitoSugar {

  it should "be able to mock spark.implicits" in {
    implicit val sparkMock: SparkSession = mock[SparkSession]
    val implicitsMock = mock[SQLImplicits]
    when(sparkMock.implicits).thenReturn(implicitsMock)
    val readerMock = mock[DataFrameReader]
    when(sparkMock.read).thenReturn(readerMock)
    val dataFrameMock = mock[DataFrame]
    when(readerMock.parquet("/some/path")).thenReturn(dataFrameMock)
    val dataSetMock = mock[Dataset[MyData]]
    implicit val testEncoder: Encoder[MyData] = Encoders.product[MyData]
    when(dataFrameMock.as[MyData]).thenReturn(dataSetMock)

    new ClassToTest().read("/some/path/") shouldBe dataSetMock
  }
}
  • I'm not sure I understand completely what you're doing but `when[SQLImplicits](sparkSessionMock.implicits).thenReturn(implicitsMock)` seems to compile. Do you have reproducible example? – Dmytro Mitin Oct 26 '20 at 17:19
  • Please see the [article](https://medium.com/analytics-vidhya/playing-with-spark-in-scala-warm-up-game-8bfbb7cfbcc4) where `SparkSession` is mocked (the link is from the [question](https://stackoverflow.com/questions/49483987/mocking-sparksession-for-unit-testing)). – Dmytro Mitin Oct 27 '20 at 14:52
  • @DmytroMitin thanks for the hint to this article, it is a very nice example and very helpful. I had already found it earlier while looking for solutions to this problem. The unfortunate thing about this example is that it doesn't mock the `import spark.implicits._` – Felix.Reuthlinger Oct 28 '20 at 07:29
  • @DmytroMitin I added a complete example in the original post. I also tried the change you proposed, adding the type to `when[...]`, but running it yields another error, since the mock's type is not the same as `sparkMock.implicits.type`. – Felix.Reuthlinger Oct 28 '20 at 08:40

1 Answer


You can't mock implicits. Implicits are resolved at compile time, while mocking happens at runtime (runtime reflection, bytecode manipulation via Byte Buddy). You can't import at compile time implicits that will only be mocked at runtime. You'll have to resolve the implicits manually (in principle you could resolve implicits at runtime by launching the compiler once again at runtime, but that would be much harder 1 2 3 4).
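To make the compile-time nature concrete, here is a tiny sketch (the `needsEncoder` helper is made up):

def needsEncoder(implicit e: Encoder[MyData]): Encoder[MyData] = e

// needsEncoder  // does not compile: no implicit Encoder[MyData] in scope
implicit val myDataEncoder: Encoder[MyData] = Encoders.product[MyData]
needsEncoder     // now the compiler fills in the argument itself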

Try

import org.apache.spark.sql._
import org.mockito.Mockito.when
import org.scalatest.flatspec.AnyFlatSpec
import org.scalatest.matchers.should.Matchers
import org.scalatestplus.mockito.MockitoSugar

class ClassToTest()(implicit spark: SparkSession, encoder: Encoder[MyData]) {
  def read(path: String): Dataset[MyData] =
    spark.read.parquet(path).as[MyData]
}

class SparkMock extends AnyFlatSpec with Matchers with MockitoSugar {

  it should "be able to mock spark.implicits" in {
    implicit val sparkMock: SparkSession = mock[SparkSession]
    val readerMock = mock[DataFrameReader]
    when(sparkMock.read).thenReturn(readerMock)
    val dataFrameMock = mock[DataFrame]
    when(readerMock.parquet("/some/path")).thenReturn(dataFrameMock)
    val dataSetMock = mock[Dataset[MyData]]
    implicit val testEncoder: Encoder[MyData] = Encoders.product[MyData]
    when(dataFrameMock.as[MyData]).thenReturn(dataSetMock)

    new ClassToTest().read("/some/path") shouldBe dataSetMock
  }
}

//[info] SparkMock:
//[info] - should be able to mock spark.implicits
//[info] Run completed in 2 seconds, 727 milliseconds.
//[info] Total number of tests run: 1
//[info] Suites: completed 1, aborted 0
//[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
//[info] All tests passed.

Please notice that "/some/path" must be the same in both places; in your code snippet the two strings were different ("/some/path" vs. "/some/path/").
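Note also that the refactoring need not ripple through production call sites: wherever a real SparkSession is available, `import spark.implicits._` already brings an implicit `Encoder[MyData]` into scope (via `newProductEncoder`), so callers keep compiling unchanged. A sketch of such a call site (the path and app name are made up):

implicit val spark: SparkSession = SparkSession.builder
  .master("local[*]")
  .appName("example")          // made-up app name
  .getOrCreate()
import spark.implicits._       // supplies the implicit Encoder[MyData]

val myData: Dataset[MyData] = new ClassToTest().read("/data/mydata")  // made-up path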

Dmytro Mitin
  • Thanks for the answer. That is exactly how I rewrote my code in the end, too. The problematic part is still that people tend to simply import the implicits a lot, so in a larger code base you will find this pattern everywhere, and it would need to be replaced everywhere. That was why I was wondering whether there is a way to mock this simply. :) – Felix.Reuthlinger Nov 02 '20 at 08:56