
I am converting a Spark Dataset into a list of hash maps using the approach below:

// grpdColNames is assumed to be a List<String> of column names defined elsewhere.
List<HashMap<String, String>> finalJsonMap = new ArrayList<>();
srcData.foreachPartition(new ForeachPartitionFunction<Row>() {
    @Override
    public void call(Iterator<Row> t) throws Exception {
        while (t.hasNext()) {
            Row eachRow = t.next();
            HashMap<String, String> rowMap = new HashMap<>();
            for (int j = 0; j < grpdColNames.size(); j++) {
                rowMap.put(grpdColNames.get(j), eachRow.getString(j));
            }
            finalJsonMap.add(rowMap);
        }
    }
});

The iteration works fine, but the rowMap entries are never added to finalJsonMap.

What is the best approach to do this?
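
For context, this looks like Spark's standard closure behaviour: foreachPartition serializes the function and runs it on the executors, so each executor adds rows to its own deserialized copy of finalJsonMap, and the ArrayList in the driver JVM is never touched. A minimal sketch of the symptom, reusing the names from the question:

// Sketch only: demonstrates the pitfall, not a fix.
List<HashMap<String, String>> finalJsonMap = new ArrayList<>();
srcData.foreachPartition((ForeachPartitionFunction<Row>) rows -> {
    // Runs in an executor JVM; "finalJsonMap" here is a serialized copy,
    // so anything added to it never reaches the driver's list.
    while (rows.hasNext()) {
        rows.next(); // build rowMap and add it, as in the code above
    }
});
System.out.println(finalJsonMap.size()); // still 0 back on the driver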

Rama Krishna
  • Possible duplicate of [Scala spark, listbuffer is empty](https://stackoverflow.com/questions/40699432/scala-spark-listbuffer-is-empty) – 10465355 Jan 21 '19 at 10:52
  • Are you sure you need the foreachPartition method? It looks like the usual groupBy + collect should be enough – Serge Harnyk Jan 21 '19 at 15:01
  • If I use dataset.collectAsList(), it is going to fail due to memory issues – Rama Krishna Jan 21 '19 at 15:05
  • My end goal is to add the "srcData" dataset to a JSON array or hash map – Rama Krishna Jan 21 '19 at 15:06
  • I am running this code on 3.2 million records – Rama Krishna Jan 21 '19 at 15:08
  • What are you trying to accomplish by putting the data into a JSON array or hash map? If you want to put the data into a hash map or JSON array, you have to use collect to return the elements of the dataset as an array to the driver program. Collecting huge data will definitely result in memory issues. I'd suggest writing this dataset to a file first (see the sketch below). – minyo Jan 22 '19 at 03:40
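
A minimal sketch of minyo's suggestion, assuming srcData is a Dataset<Row> (the output path below is made up): have the executors write the rows out as JSON in parallel instead of collecting 3.2 million rows into one driver-side list.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.SaveMode;

// Each executor writes its own partitions, so nothing large is pulled to the driver.
srcData.write().mode(SaveMode.Overwrite).json("/tmp/srcData_json"); // hypothetical path

// Alternatively, toJSON() turns each Row into a JSON string, giving a
// distributed Dataset<String> to transform further without a collect().
Dataset<String> jsonRows = srcData.toJSON();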

0 Answers