I have the following Java Spark code:
stream.foreachRDD(rdd -> {
    // do some operations
    List<String> jsonList = new ArrayList<String>();
    rdd.foreach(msg -> { // Kafka messages
        jsonList.add(msg.value());
    });
    writeJsons(jsonList); // jsonList size is 0 here
});
I want to iterate over each message, add it to my list, and then apply some logic to the JSON list.
I'm very new to Spark, and I'm trying to understand why the jsonList size is 0 after the
rdd.foreach loop. How does Spark share the list between the nodes?
What should I change in my code if I want to collect all JSON messages into a list and then run my logic on that list?
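From what I've read, Spark serializes the lambda passed to rdd.foreach, ships it to the executors, and runs it there against deserialized copies of any captured objects, so the driver-side list is never touched. The following plain-Java sketch (my own toy example, no Spark required; ClosureDemo and runDemo are names I made up) reproduces that copy-on-serialize effect:

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

public class ClosureDemo {
    // A Runnable that can be serialized, like a Spark closure.
    interface SerializableRunnable extends Runnable, Serializable {}

    // Serializes a lambda that captures jsonList (as Spark serializes a
    // foreach closure), deserializes it, runs the copy, and returns the
    // size of the ORIGINAL list.
    static int runDemo() throws Exception {
        List<String> jsonList = new ArrayList<>();
        SerializableRunnable task = () -> jsonList.add("some json");

        // Round-trip the closure through serialization.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(task);
        }
        SerializableRunnable copy;
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
            copy = (SerializableRunnable) ois.readObject();
        }

        copy.run();             // adds to the deserialized COPY of jsonList
        return jsonList.size(); // the original list is untouched
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runDemo()); // prints 0
    }
}
```

This is just an illustration of the mechanism, not Spark itself, but it matches the behavior I'm seeing: the executor-side copy gets the element, while my list on the driver stays empty.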