
I am trying to write unit tests for Spark code. I know we can install Spark and then use SparkConf and SparkContext to write tests.

However, I wanted to check whether there is any way to write unit tests without installing Spark, as my client doesn't want to install Spark on the Jenkins server where we intend to run our tests as part of an automated process.

farooq.ind
  • If you have a `Maven` project, you could add the dependencies. Then use the Spark API to create a new session and set the master to `local`. The problem you may face is the cost of creating a SparkContext in every unit-test class. It won't run in a few milliseconds. – KeyMaker00 Feb 26 '20 at 17:23
  • We are using Python and Spark, and trying to run unit tests with the Unittest framework. Are you sure we don't need to set SPARK_HOME or install Spark on Jenkins? – farooq.ind Feb 27 '20 at 19:02

1 Answer


You can set up Spark to run in local mode from code:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName(appName).setMaster("local")
val context = new SparkContext(conf)

Then, you can use the context to create RDDs of your data for testing:

val testData = context.makeRDD(Seq(1, 2, 3))
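Since the question mentions Python and the Unittest framework, here is a sketch of the same idea in PySpark (the class name, test name, and `local[2]` master are just illustrative). The `pyspark` package from pip bundles the Spark runtime, so on a Jenkins agent a `pip install pyspark` plus a JVM is typically enough; no separate Spark installation or SPARK_HOME is required:

import unittest

from pyspark import SparkConf, SparkContext


class ExampleSparkTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # One local-mode context per test class: starting a JVM for
        # every test method is too slow, as noted in the comments.
        conf = SparkConf().setAppName("unit-tests").setMaster("local[2]")
        cls.sc = SparkContext(conf=conf)

    @classmethod
    def tearDownClass(cls):
        cls.sc.stop()

    def test_sum(self):
        rdd = self.sc.parallelize([1, 2, 3])
        self.assertEqual(rdd.sum(), 6)


if __name__ == "__main__":
    unittest.main()

Creating the context once in `setUpClass` and stopping it in `tearDownClass` keeps the startup cost to one per test class rather than one per test.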
Dan W
  • We are using Python, PySpark, Unittest, and Spark. Are you sure that, to create a SparkConf object, we don't need SPARK_HOME set or Spark installed on the Jenkins machine? – farooq.ind Feb 27 '20 at 19:26
  • A quick search shows you can do this with PySpark as well: https://stackoverflow.com/questions/33811882/how-do-i-unit-test-pyspark-programs – Dan W Feb 27 '20 at 19:36