
I am trying to iterate over a `JavaRDD<Tuple2<String, Object>>` and build a `JSONArray` from the data.

My code:

final JSONArray jA = new JSONArray();
final VoidFunction<Tuple2<String, Object>> func = new VoidFunction<Tuple2<String, Object>>() {

    @Override
    public void call(Tuple2<String, Object> arg0) throws Exception {
        JSONObject obj = new JSONObject();
        obj.put(columnName, arg0._1);
        obj.put("frequency", (String) arg0._2);
        jA.put(obj);
    }
};
outputRdd.foreach(func);

I am getting the following serialization error (full trace removed for readability):

org.apache.spark.SparkException: Task not serializable
Caused by: java.io.NotSerializableException: org.json.JSONArray
Serialization stack:
- object not serializable (class: org.json.JSONArray, value: [])

Any pointers or workarounds?

Thanks :)

karthik manchala
  • Even if you didn't get an exception, it wouldn't work. Each worker would get its own copy of `jA`, and after `foreach` the `jA` on the driver would still be empty. Regarding serialization errors, check [this](http://stackoverflow.com/a/33042316/1560062). – zero323 Oct 13 '15 at 11:33
  • @zero323 So, there is no way to create a shared object and use it? – karthik manchala Oct 13 '15 at 11:37
  • Excluding accumulators and broadcasts, which are not applicable here, no. What exactly are you trying to achieve here? If you want a single local JSONArray, you have to collect. If each partition can be processed locally, you can use `mapPartitions`. – zero323 Oct 13 '15 at 11:41
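Following zero323's pointers, here is a minimal sketch of the collect-based workaround (my assumption, not code from the question: it presumes `outputRdd` is small enough to bring to the driver, and reuses `columnName` from the snippet above). Because the `JSONArray` is built locally after `collect()`, no closure captures it and nothing non-serializable is shipped to the workers:

import java.util.List;

import org.json.JSONArray;
import org.json.JSONObject;

import scala.Tuple2;

// Bring the data to the driver first; collect() returns a local java.util.List.
// Assumption: the RDD fits in driver memory.
List<Tuple2<String, Object>> rows = outputRdd.collect();

// Build the JSONArray entirely on the driver, so it never needs to be serialized.
JSONArray jA = new JSONArray();
for (Tuple2<String, Object> row : rows) {
    JSONObject obj = new JSONObject();
    obj.put(columnName, row._1);
    // String.valueOf avoids a ClassCastException if _2 is not actually a String
    obj.put("frequency", String.valueOf(row._2));
    jA.put(obj);
}

If the data is too large to collect, `mapPartitions` can instead build one JSONArray per partition on the workers and return, say, its string form, so that only serializable values cross the wire.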

0 Answers