
I need to group a list of objects by one of their members. Let me show what I want to achieve with an example:

I have a UserStage class:

public class UserStage {

    private Integer userId;
    private Integer stageId;
    private LocalDateTime modifiedOn;

    // getters, setters and toString methods
}

Now let's say I have a List<UserStage> that can contain multiple objects for the same userId. Let's call it userStageDataList.

I want to group this list by userId so that I can get the list of all records for a particular userId. This can be achieved in different ways, but this is how I tried it:

Map<Integer, List<UserStage>> userWiseStageList = new HashMap<>();
        
for (UserStage userStage : userStageDataList) {
    userWiseStageList.computeIfAbsent(userStage.getUserId(), ArrayList::new).add(userStage);
}

What surprised me was that this consistently took around 3000 ms to execute for just 75 items in the list, and sometimes it threw java.lang.OutOfMemoryError: Java heap space.

When I replaced ArrayList::new with k -> new ArrayList<>(), the grouping took just 3 to 4 ms.

I also tried groupingBy with streams, and that executed in 5-7 ms for the same list.

I used the following code to measure the execution time:
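For reference, the streams variant can be sketched roughly as follows. This is a minimal, self-contained sketch rather than the exact code from my project: the nested UserStage stand-in (with modifiedOn omitted), the class name GroupingSketch, the helper groupByUser and the sample values are all illustrative assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupingSketch {

    // Hypothetical minimal stand-in for the UserStage class above (modifiedOn omitted).
    static class UserStage {
        private final Integer userId;
        private final Integer stageId;

        UserStage(Integer userId, Integer stageId) {
            this.userId = userId;
            this.stageId = stageId;
        }

        Integer getUserId() { return userId; }
        Integer getStageId() { return stageId; }
    }

    // Groups the records by userId; groupingBy collects each group into a List by default.
    static Map<Integer, List<UserStage>> groupByUser(List<UserStage> userStageDataList) {
        return userStageDataList.stream()
                .collect(Collectors.groupingBy(UserStage::getUserId));
    }

    public static void main(String[] args) {
        // Illustrative sample data, not the real list of 75 items.
        List<UserStage> userStageDataList = Arrays.asList(
                new UserStage(1000, 1),
                new UserStage(1000, 2),
                new UserStage(2000, 1));

        Map<Integer, List<UserStage>> userWiseStageList = groupByUser(userStageDataList);
        System.out.println(userWiseStageList.get(1000).size()); // 2
        System.out.println(userWiseStageList.get(2000).size()); // 1
    }
}
```

Unlike the computeIfAbsent loop, no list constructor is chosen by hand here, so the key is never passed to ArrayList.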

StopWatch watch = new StopWatch();
watch.start();
// here goes map creation code as shown above
logger.info("\n\n\n time taken for creating the map = {}",watch.getTime(TimeUnit.MILLISECONDS));

Am I missing something here? Any suggestions would be appreciated!

Thanks.

Sandeep

1 Answer

  1. map.computeIfAbsent(k, func) invokes func with the argument k if the map doesn't have the key k.
  2. The function ArrayList::new accepts an int argument, initialCapacity, which is the initial length of the List's backing array. Here the Integer key is unboxed and passed as that argument.

So, in your case, if userId 1000 is absent from the map, the program executes new ArrayList<>(1000) to create an ArrayList backed by an Object[] of length 1000. The larger the userId, the larger the array allocated for each new key, which is why it takes so long and sometimes exhausts the heap.
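One way to see this for yourself is to use a negative key: new ArrayList<>(int) rejects a negative capacity with an IllegalArgumentException, so the method reference fails while the lambda does not. The sketch below is only an illustration; the class name CapacityDemo and both helper methods are made up for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CapacityDemo {

    // Returns true if ArrayList::new treated the map key as an initial capacity.
    static boolean methodRefUsesKeyAsCapacity() {
        Map<Integer, List<String>> map = new HashMap<>();
        try {
            // Resolves to new ArrayList<>(int) with the key -1, which is rejected.
            map.computeIfAbsent(-1, ArrayList::new);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    // The lambda ignores the key and calls the no-arg constructor instead.
    static boolean lambdaIgnoresKey() {
        Map<Integer, List<String>> map = new HashMap<>();
        map.computeIfAbsent(-1, k -> new ArrayList<>()).add("x");
        return map.get(-1).size() == 1;
    }

    public static void main(String[] args) {
        System.out.println(methodRefUsesKeyAsCapacity()); // true
        System.out.println(lambdaIgnoresKey());           // true
    }
}
```

With k -> new ArrayList<>() the key never reaches the constructor, so each list starts at the default capacity regardless of how large the userId is.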

kelgon