
I have a method in my Spark application that loads data from a MySQL database. The method looks something like this:

trait DataManager {

  val session: SparkSession

  def loadFromDatabase(input: Input): DataFrame = {
    session.read.jdbc(input.jdbcUrl, s"(${input.selectQuery}) T0",
      input.columnName, 0L, input.maxId, input.parallelism, input.connectionProperties)
  }
}

The method does nothing other than call the jdbc method to load data from the database. How can I test this method? The standard approach is to create a mock of session, which is an instance of SparkSession. But since SparkSession has a private constructor, I was not able to mock it using ScalaMock.

The main issue is that my function is purely side-effecting (the side effect being pulling data from a relational database). How can I unit test it, given that I have trouble mocking SparkSession?

So is there any way I can mock SparkSession or any other better way than mocking to test this method?

himanshuIIITian
rogue-one
  • Possible duplicate of [How to write unit tests in Spark 2.0+?](https://stackoverflow.com/questions/43729262/how-to-write-unit-tests-in-spark-2-0) – himanshuIIITian Mar 26 '18 at 04:30
  • @himanshuIIITian this is not a duplicate of that question. My question is specific to a use case where my method only loads data from a database, and asks how I can test it with a mock or any other approach. The question you linked doesn't talk about how to mock it or how to handle this specific scenario. – rogue-one Mar 26 '18 at 05:06
  • Ok! I thought it was similar. Apologies for the confusion. – himanshuIIITian Mar 26 '18 at 05:41
  • What exactly do you want to test? Whether the query can be executed? To be honest I would not test this method, because it doesn't contain any logic implemented by you (no offence). You are just running logic provided by Spark, which should be tested on their side. If you want to test this anyway, you might go with an embedded database. – TobiSH Mar 26 '18 at 07:13
  • Or did I get your question wrong, and you are asking how to create a Spark session? Here you go: `SparkSession.builder().getOrCreate()` – TobiSH Mar 26 '18 at 07:15
  • "To be honest I would not test this method, because it doesn't contain any logic implemented by you": what if I have two of my parameters interchanged, like this: `session.read.jdbc(s"(${input.selectQuery}) T0", input.jdbcUrl ..`? I agree it is not extremely important to test this code, but there is still some value in testing it. – rogue-one Mar 27 '18 at 02:34

2 Answers

1

In your case I would recommend not mocking the SparkSession. That would more or less mock the entire function (which you could do anyway). If you want to test this function, my suggestion would be to run an embedded database (like H2) and use a real SparkSession. To do this you need to provide the SparkSession to your DataManager.

Untested sketch:

Your code:

class DataManager(session: SparkSession) {
  def loadFromDatabase(input: Input): DataFrame = {
    session.read.jdbc(input.jdbcUrl, s"(${input.selectQuery}) T0",
      input.columnName, 0L, input.maxId, input.parallelism, input.connectionProperties)
  }
}

Your test-case:

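import java.sql.DriverManager
import org.apache.spark.sql.SparkSession
import org.scalatest.{BeforeAndAfterAll, FunSuite}

class DataManagerTest extends FunSuite with BeforeAndAfterAll {
  override def beforeAll(): Unit = {
    val conn = DriverManager.getConnection("jdbc:h2:~/test", "sa", "")
    // your insert statements go here
    conn.close()
  }

  test("should load data from database") {
    // a local master is needed when the test runs outside a cluster
    val session = SparkSession.builder().master("local[*]").getOrCreate()
    val dm = new DataManager(session)
    // also set columnName, maxId, parallelism and connectionProperties (user "sa") as your Input type requires
    val input = Input(jdbcUrl = "jdbc:h2:~/test", selectQuery = "SELECT whateveryouneed FROM whereveryouputit")
    val actualData = dm.loadFromDatabase(input)
    // compare actualData with the rows inserted in beforeAll, e.g. via actualData.collect()
    assert(actualData.count() > 0)
  }
}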
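
The "insert statements" step could look roughly like the sketch below. It is only an illustration: the `people` table, its columns, the `SeedH2` object and the in-memory URL are made-up names, and it assumes the H2 driver is on the test classpath. An in-memory database (`jdbc:h2:mem:...`) is used here instead of `~/test` so that no file is left behind between runs.

```scala
import java.sql.DriverManager

object SeedH2 {
  // Hypothetical in-memory database. DB_CLOSE_DELAY=-1 keeps the data alive
  // after this connection closes, so a later JDBC read can still see it.
  val url = "jdbc:h2:mem:testdb;DB_CLOSE_DELAY=-1"

  // Creates the (made-up) people table, resets it, and returns the number of inserted rows.
  def seed(): Int = {
    val conn = DriverManager.getConnection(url, "sa", "")
    try {
      val stmt = conn.createStatement()
      stmt.execute("CREATE TABLE IF NOT EXISTS people (id BIGINT PRIMARY KEY, name VARCHAR(64))")
      stmt.execute("DELETE FROM people")
      stmt.executeUpdate("INSERT INTO people (id, name) VALUES (1, 'a'), (2, 'b'), (3, 'c')")
    } finally {
      conn.close()
    }
  }
}
```

Calling `SeedH2.seed()` from `beforeAll` and pointing `jdbcUrl` at `SeedH2.url` would then give the test known rows to assert against.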
TobiSH
1

You can use mockito scala to mock SparkSession as shown in this article.
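
Since the linked article is not reproduced here, the following is only a rough, untested sketch of what such a mock-based test might look like. It assumes the mockito-scala library (its `MockitoSugar` trait) and a Spark version in which `SparkSession`, `DataFrameReader` and `Dataset` are not final. The `Input` case class is a guess at the question's type, and `MockedSessionCheck` and the sample values are hypothetical. A test like this only checks that the arguments reach `jdbc` in the right order (the swapped-parameter case raised in the comments); it never touches a database.

```scala
import java.util.Properties
import org.apache.spark.sql.{DataFrame, DataFrameReader, SparkSession}
import org.mockito.MockitoSugar

// Hypothetical shape of the question's Input type, inferred from the fields it uses.
case class Input(jdbcUrl: String, selectQuery: String, columnName: String,
                 maxId: Long, parallelism: Int, connectionProperties: Properties)

// The trait from the question.
trait DataManager {
  val session: SparkSession

  def loadFromDatabase(input: Input): DataFrame =
    session.read.jdbc(input.jdbcUrl, s"(${input.selectQuery}) T0",
      input.columnName, 0L, input.maxId, input.parallelism, input.connectionProperties)
}

object MockedSessionCheck extends MockitoSugar {
  // Returns true if loadFromDatabase hands back the stubbed DataFrame.
  def run(): Boolean = {
    val mockSession = mock[SparkSession]
    val mockReader  = mock[DataFrameReader]
    val mockFrame   = mock[DataFrame]

    val props = new Properties()
    val input = Input("jdbc:h2:mem:testdb", "SELECT id FROM people", "id", 100L, 4, props)

    // session.read returns the mocked reader; jdbc is stubbed for the exact expected arguments.
    when(mockSession.read).thenReturn(mockReader)
    when(mockReader.jdbc("jdbc:h2:mem:testdb", "(SELECT id FROM people) T0", "id", 0L, 100L, 4, props))
      .thenReturn(mockFrame)

    val manager = new DataManager { val session: SparkSession = mockSession }
    val result = manager.loadFromDatabase(input)

    // Fails if the URL and the wrapped query were swapped, or any other argument differs.
    verify(mockReader).jdbc("jdbc:h2:mem:testdb", "(SELECT id FROM people) T0", "id", 0L, 100L, 4, props)
    result eq mockFrame
  }
}
```

Stubbing with exact values rather than argument matchers keeps the sketch simple and makes the expected argument order explicit.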

alecswan