
I have code that looks like this.

List<Long> eventIds = ... // just an ArrayList of ids
Iterable<List<Long>> partitions = Iterables.partition(eventIds, 10); // eventIds are partitioned into smaller chunks so one database call requires fewer resources
Map<Integer, YearlyStatistics> yearlyStatisticsMap = new HashMap<>();
for (List<Long> partition : partitions) {
    List<Event> events = database.getEvents(partition); // This is where I get an OutOfMemoryError after a couple of loops. It looks like the previous list of events is never garbage collected.
    populateStatistics(events, yearlyStatisticsMap);
}

One Event is never larger than 1 MB. The JVM has 250 MB of memory to work with. The reason for partitioning the eventIds is that I would definitely run out of memory if I tried to fetch every object from the database at once. So I thought I would request the data in smaller chunks, and the JVM would free memory after each iteration once populateStatistics has been called. That doesn't seem to be the case: around the ~50th iteration an OutOfMemoryError is thrown. Is there any way to optimize this code so that the memory from previous events is freed?

Virx

1 Answer


I hope this can help:

Instead of making one database call per partition in a loop, you can make a single call that retrieves all the events for the list of partitions, for example:

List<Long> eventIds = ... // just an ArrayList of ids
Iterable<List<Long>> partitions = Iterables.partition(eventIds, 10); // eventIds are partitioned into smaller chunks so one database call requires fewer resources
Map<Integer, YearlyStatistics> yearlyStatisticsMap = new HashMap<>();

// Use an IN clause with the list of partitions in your query.
// This structure contains the list of events per partition.
Map<Long, List<Event>> eventsWithPartitions = database.getEvents(partitions);

// Apply the rest of the logic based on the eventsWithPartitions map.

The advantage of this solution is that we make only one network call to the database.
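As a minimal sketch of the single-call idea: instead of issuing one query per partition, you can build one query with an IN clause over all the ids. The `events` table name and the raw string concatenation here are assumptions for illustration only; in real code you would use a parameterized query (e.g. JDBC `PreparedStatement`) rather than string building.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class InQueryExample {

    // Builds a single SELECT with an IN clause covering every event id.
    // Hypothetical "events" table; for illustration only.
    static String buildInQuery(List<Long> eventIds) {
        String ids = eventIds.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(", "));
        return "SELECT * FROM events WHERE id IN (" + ids + ")";
    }

    public static void main(String[] args) {
        List<Long> eventIds = Arrays.asList(1L, 2L, 3L);
        System.out.println(buildInQuery(eventIds));
    }
}
```

Note that many databases cap the number of values allowed in an IN clause, so for very large id lists this approach may still need batching.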

According to this topic, Firestore supports IN in queries.

Mazlum Tosun
  • I don't think the database is the problem. I could easily fetch all Events from the database, but then I would definitely run out of memory instantly. That's why I process Events in chunks. The problem seems to be that previous chunks don't free up memory when they are discarded, or in this case overwritten. – Virx Nov 04 '22 at 12:34