My code looks like this (sorry, there's a reason I can't show the full code):
public class MyClass {
    final A _field1; // Non-serializable object
    final B _field2; // Non-serializable object

    public void doSomething() {
        myJavaDStream...
            .mapToPair(t -> {
                // Do some stuff with _field1 and _field2
            })
            .reduceByKey((b1, b2) -> {
                // Do other stuff with _field1 and _field2
            })
            ...
    }

    public static void main(String[] args) {
        MyClass myClass = new MyClass();
        myClass.doSomething();
    }
}
Within IntelliJ, everything works just fine. But after building the jar and submitting it with spark-submit, the job throws org.apache.spark.SparkException: Task not serializable. The stack trace points to the lambda passed to mapToPair.
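In case the abbreviated snippet above is too vague, here is a minimal self-contained sketch with the same structure. The socket source, the Object stand-ins for _field1/_field2, the key/value types, and the app name are placeholders I added so the sketch compiles; the real code is different, but the lambdas reference instance fields of the enclosing class in the same way:

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import scala.Tuple2;

public class MyClass {
    // Placeholders: plain Object is not Serializable, just like the real _field1/_field2
    final Object _field1 = new Object();
    final Object _field2 = new Object();

    public void doSomething(JavaDStream<String> stream) {
        stream
            // The lambdas read instance fields, so the whole MyClass instance
            // becomes part of the closure that Spark ships to the executors
            .mapToPair(t -> new Tuple2<>(t + _field1.hashCode(), 1))
            .reduceByKey((a, b) -> a + b + _field2.hashCode())
            .print();
    }

    public static void main(String[] args) throws Exception {
        // Master is set on the spark-submit command line (or via setMaster("local[*]") in the IDE)
        SparkConf conf = new SparkConf().setAppName("MyApp");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));
        new MyClass().doSomething(jssc.socketTextStream("localhost", 9999));
        jssc.start();
        jssc.awaitTermination();
    }
}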
My questions are: What is the difference between running inside the IDE and running in stand-alone mode via spark-submit? And how can I make this work properly?