This is a old question and there are couple of answer that are good enough but I would like to give one more approach that can be used to make it work.
You can create a trait that extends from serializable and create spark session as a lazy variable and then through out your project in all the objects that you create, you can extend that trait and it will give you sparksession instance.
Code as below:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.DataFrame
trait SparkSessionWrapper extends Serializable {
lazy val spark: SparkSession = {
SparkSession.builder().appName("TestApp").getOrCreate()
}
//object with the main method and it extends SparkSessionWrapper
object App extends SparkSessionWrapper {
def main(args: Array[String]): Unit = {
val readdf = ReadFileProcessor.ReadFile("testpath")
readdf.createOrReplaceTempView("TestTable")
val viewdf = spark.sql("Select * from TestTable")
}
}
object ReadFileProcessor extends SparkSessionWrapper{
def ReadFile(path: String) : DataFrame = {
val df = spark.read.format("csv").load(path)
df
}
}
As you are extending the SparkSessionWrapper on both the Objects that you created, spark session would get initialized when first time spark variable is encountered in the code and then you refer it on any object that extends that trait without passing that as a parameter to the method. It works or give you a experience that is similar to notebook.
Update : If you even want it to be more generic and have an need to even set the custom appname based on the type of workflow you are running you can do it as below :
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.DataFrame
trait SparkSessionWrapper extends Serializable {
lazy val spark: SparkSession = {
createSparkSession(appname)
}
def appname : String
def createSparkSession(appname : String) : SparkSession ={
SparkSession.builder().appName(appname).master("local[*]").getOrCreate()
}
//object with the main method and it extends SparkSessionWrapper
object App extends SparkSessionWrapper {
def main(args: Array[String]): Unit = {
val readdf = ReadFileProcessor.ReadFile("testpath")
readdf.createOrReplaceTempView("TestTable")
val viewdf = spark.sql("Select * from TestTable")
}
override def appname: String = "ReadFile"
}
object ReadFileProcessor extends SparkSessionWrapper{
def ReadFile(path: String) : DataFrame = {
val df = spark.read.format("csv").load(path)
df
}
override def appname: String = "ReadcsvFile"
}
the only main difference is that you need to create an abstract function inside the trait and then you would have to override that into any of the startup class that you are using to provide the value.