
I am converting a Spark Dataset into a list of hash maps using the approach below:

// grpdColNames is assumed to be a List<String> of column names defined elsewhere.
List<HashMap<String, String>> finalJsonMap = new ArrayList<>();
srcData.foreachPartition(new ForeachPartitionFunction<Row>() {
    @Override
    public void call(Iterator<Row> t) throws Exception {
        while (t.hasNext()) {
            Row eachRow = t.next();
            HashMap<String, String> rowMap = new HashMap<>();
            for (int j = 0; j < grpdColNames.size(); j++) {
                rowMap.put(grpdColNames.get(j), eachRow.getString(j));
            }
            finalJsonMap.add(rowMap);
        }
    }
});

The iteration works fine, but the rowMap entries are never added to finalJsonMap.

What is the best approach to do this?
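
For context, this looks like Spark's standard closure behaviour: foreachPartition serializes the function and runs it on the executors, so each executor adds rows to its own deserialized copy of finalJsonMap, and the ArrayList in the driver JVM is never touched. A minimal sketch of the symptom, reusing the names from the question:

// Sketch only: demonstrates the pitfall, not a fix.
List<HashMap<String, String>> finalJsonMap = new ArrayList<>();
srcData.foreachPartition((ForeachPartitionFunction<Row>) rows -> {
    // Runs in an executor JVM; "finalJsonMap" here is a serialized copy,
    // so anything added to it never reaches the driver's list.
    while (rows.hasNext()) {
        rows.next(); // build rowMap and add it, as in the code above
    }
});
System.out.println(finalJsonMap.size()); // still 0 back on the driver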

Rama Krishna
  • Possible duplicate of [Scala spark, listbuffer is empty](https://stackoverflow.com/questions/40699432/scala-spark-listbuffer-is-empty) – 10465355 Jan 21 '19 at 10:52
  • Are you sure you need the foreachPartition method? It looks like the usual groupBy + collect should be enough – Serge Harnyk Jan 21 '19 at 15:01
  • If I use dataset.collectAsList(), it is going to fail due to memory issues – Rama Krishna Jan 21 '19 at 15:05
  • My end goal is to add the "srcData" dataset to a JSON array or hash map – Rama Krishna Jan 21 '19 at 15:06
  • I am running this code on 3.2 million records – Rama Krishna Jan 21 '19 at 15:08
  • What are you trying to accomplish by putting the data into a JSON array or hash map? If you want to put the data into a hash map or JSON array, you have to use collect to return the elements of the dataset as an array to the driver program. Collecting huge data will definitely result in memory issues. I'd suggest writing this dataset to a file first (see the sketch below). – minyo Jan 22 '19 at 03:40
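
A minimal sketch of minyo's suggestion, assuming srcData is a Dataset<Row> (the output path below is made up): have the executors write the rows out as JSON in parallel instead of collecting 3.2 million rows into one driver-side list.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.SaveMode;

// Each executor writes its own partitions, so nothing large is pulled to the driver.
srcData.write().mode(SaveMode.Overwrite).json("/tmp/srcData_json"); // hypothetical path

// Alternatively, toJSON() turns each Row into a JSON string, giving a
// distributed Dataset<String> to transform further without a collect().
Dataset<String> jsonRows = srcData.toJSON();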

0 Answers