5

According to Passing Functions to Spark,it claims:

accessing fields of the outer object will reference the whole object; To avoid this issue ...

I am considering that what is the risk of flowing code:

class MyClass {
  val field = "Hello"
  def doStuff(rdd: RDD[String]): RDD[String] = { rdd.map(x => field + x) }
}

references all of this would do any harm ?

chenzhongpu
  • 6,193
  • 8
  • 41
  • 79

1 Answers1

7

This will cause Spark to serialize your whole object and send it to each of the executors. If some of the fields of your object contain big amounts of data, it might be slow. Also it might cause task not serializable exception if your object is not serializable

Here's an example of the guy with this problem: Task not serializable: java.io.NotSerializableException when calling function outside closure only on classes not objects

Community
  • 1
  • 1
0x0FFF
  • 4,948
  • 3
  • 20
  • 26