0

Suppose I have the following class in Spark Scala:

class SparkComputation(i: Int, j: Int) {
  def something(x: Int, y: Int) = (x + y) * i

  def processRDD(data: RDD[Int]) = {
    val j = this.j
    val something = this.something _
    data.map(something(_, j))
  }
}

I get the Task not serializable Exception when I run the following code:

val s = new SparkComputation(2, 5)
val data = sc.parallelize(0 to 100)
val res = s.processRDD(data).collect

I'm assuming that the exception occurs because Spark is trying to serialize the SparkComputation instance. To prevent this from happening, I have stored the class members I'm using in the RDD operation in local variables (j and something). However, Spark still tries to serialize SparkComputation object because of the method. Is there anyway to pass the class method to map without forcing Spark to serializing the whole SparkComputation class? I know the following code works without any problem:

def processRDD(data: RDD[Int]) = {
    val j = this.j
    val i = this.i
    data.map(x => (x + j) * i)
  }

So, the class members that store values are not causing the problem. The problem is with the function. I have also tried the following approach with no luck:

class SparkComputation(i: Int, j: Int) {
  def processRDD(data: RDD[Int]) = {
    val j = this.j
    val i = this.i
    def something(x: Int, y: Int) = (x + y) * i
    data.map(something(_, j))
  }
}
Ashkan
  • 1,643
  • 5
  • 23
  • 45
  • This is famous error occurs due to class declaration... change it to object or extend serializable then it should work. – Ram Ghadiyaram Nov 11 '16 at 03:56
  • @RamPrasadG There is no way to store the methods in local values like I do with other members (`i` and `j` in this example)? – Ashkan Nov 11 '16 at 03:59
  • have a look http://stackoverflow.com/questions/22592811/task-not-serializable-java-io-notserializableexception-when-calling-function-ou?rq=1 – Ram Ghadiyaram Nov 11 '16 at 04:02

1 Answers1

1

Make the class serializable:

class SparkComputation(i: Int, j: Int) extends Serializable {
  def something(x: Int, y: Int) = (x + y) * i

  def processRDD(data: RDD[Int]) = {
    val j = this.j
    val something = this.something _
    data.map(something(_, j))
  }
}