I have experienced "task not serializable" issue when running spark streaming. The reason can be found in this thread.
After I have tried couple methods and fixed the issue, I don't understand why it works.
public class StreamingNotWorking implements Serializable {
private SparkConf sparkConf;
private JavaStreamingContext jssc;
public StreamingNotWorking(parameter) {
sparkConf = new SparkConf();
this.jssc = createContext(parameter);
JavaDStream<String> messages = functionCreateStream(parameter);
messages.print();
}
public void run() {
this.jssc.start();
this.jssc.awaitTermination();
}
public class streamingNotWorkingDriver {
public static void main(String[] args) {
Streaming bieventsStreaming = new StreamingNotWorking(parameter);
bieventsStreaming.run();
}
Will give the same "Task not serializable" error.
However, if I modify the code to:
public class StreamingWorking implements Serializable {
private static SparkConf sparkConf;
private static JavaStreamingContext jssc;
public void createStream(parameter) {
sparkConf = new SparkConf();
this.jssc = createContext(parameter);
JavaDStream<String> messages = functionCreateStream(parameter);
messages.print();
run();
}
public void run() {
this.jssc.start();
this.jssc.awaitTermination();
}
public class streamingWorkingDriver {
public static void main(String[] args) {
Streaming bieventsStreaming = new StreamingWorking();
bieventsStreaming.createStream(parameter);
}
works perfectly fine.
I know one of the reasons is that sparkConf
and jssc
need to be static
. But I don't understand why.
Could anyone explain the difference?