I am trying to load an RDD from a MySQL database:
package ro.mfl.employees

import org.apache.spark.{SparkConf, SparkContext}
import java.sql.{Connection, DriverManager}
import org.apache.spark.rdd.JdbcRDD

class Loader(sc: SparkContext) {

  Class.forName("com.mysql.jdbc.Driver").newInstance()

  def connection(): Connection = {
    DriverManager.getConnection("jdbc:mysql://localhost/employees", "sakila", "sakila")
  }

  def load(): Unit = {
    val employeesRDD = new JdbcRDD(sc, connection, "select * from employees.employees", 0, 0, 1)
    println(employeesRDD.count())
  }
}

object Test extends App {
  val conf = new SparkConf().setAppName("test")
  val sc = new SparkContext(conf)
  val l = new Loader(sc)
  l.load()
}
When I execute this, I get an error saying:
Caused by: java.io.NotSerializableException: org.apache.spark.SparkContext
Serialization stack:
- object not serializable (class: org.apache.spark.SparkContext, value: org.apache.spark.SparkContext@323a9221)
- field (class: ro.mfl.employees.Loader, name: sc, type: class org.apache.spark.SparkContext)
- object (class ro.mfl.employees.Loader, ro.mfl.employees.Loader@607c6d60)
- field (class: ro.mfl.employees.Loader$$anonfun$1, name: $outer, type: class ro.mfl.employees.Loader)
- object (class ro.mfl.employees.Loader$$anonfun$1, <function0>)
- field (class: org.apache.spark.rdd.JdbcRDD, name: org$apache$spark$rdd$JdbcRDD$$getConnection, type: interface scala.Function0)
- object (class org.apache.spark.rdd.JdbcRDD, JdbcRDD[0] at JdbcRDD at Loader.scala:17)
- field (class: scala.Tuple2, name: _1, type: class java.lang.Object)
- object (class scala.Tuple2, (JdbcRDD[0] at JdbcRDD at Loader.scala:17,<function2>))
Has anyone encountered this problem?
I tried making the Loader class extend java.io.Serializable, but I got the same error, only with org.apache.spark.SparkContext in place of Loader.
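For reference, this is roughly what that attempted change looked like; everything except the extends clause is unchanged from the code above:

import java.sql.{Connection, DriverManager}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.JdbcRDD

// Same Loader as before, only marked Serializable -- it still fails with the same exception
class Loader(sc: SparkContext) extends java.io.Serializable {

  Class.forName("com.mysql.jdbc.Driver").newInstance()

  def connection(): Connection = {
    DriverManager.getConnection("jdbc:mysql://localhost/employees", "sakila", "sakila")
  }

  def load(): Unit = {
    val employeesRDD = new JdbcRDD(sc, connection, "select * from employees.employees", 0, 0, 1)
    println(employeesRDD.count())
  }
}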