
I am trying to fetch some data from MongoDB, but my k8s pods keep hitting:

Terminating due to java.lang.OutOfMemoryError: Java heap space

Checking the heap dump, this code seems to be causing the trouble:

try (CloseableIterator<DocumentAnnotation> iter =
         mongoTemplate.stream(query(criteria),
                              DocumentAnnotation.class,
                              ANNOTATIONS_COLLECTION_NAME)) {
    // Wrap the Mongo cursor in a Java stream, filter client-side,
    // and collect everything that passes into a single list.
    return StreamSupport.stream(
        Spliterators.spliteratorUnknownSize(iter, Spliterator.ORDERED), false)
                        .filter(this::isAnnotationAcceptedByFilter)
                        .collect(Collectors.toList());
}

In general, it creates an iterator using the Mongo driver's streaming API and iterates through all annotations returned by the database for the given criteria. The MongoDB driver seems to read annotations in batches of 47427 items (at least, that is what I see in the heap dump). Even though most of them are filtered out on the Java side and never returned to the client, each such request allocates 100MB of RAM to hold the batch, and that is what is causing the problem.

Does anybody know if that batch size is configurable?
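
For reference, Spring Data MongoDB appears to expose the driver's batch size through Query.cursorBatchSize(int) (available since 2.1); here is a minimal, untested sketch of how that could look with the code above:

// Untested sketch: cap the cursor batch size so each fetch from the server
// holds fewer documents in memory, at the cost of more round trips.
Query q = query(criteria).cursorBatchSize(1000); // 1000 is an arbitrary example value
try (CloseableIterator<DocumentAnnotation> iter =
         mongoTemplate.stream(q, DocumentAnnotation.class, ANNOTATIONS_COLLECTION_NAME)) {
    // ... same streaming / filtering code as above ...
}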

Thanks

R2D2
  • I think you may have misdiagnosed this. A batch size like that shouldn't be problematic. I suspect that the real problem is either your filter is NOT filtering out most of the items (so the resulting list is too big) OR there is a memory leak somewhere else. – Stephen C Jun 18 '21 at 00:39
  • But this Q&A is about setting the batch size: https://stackoverflow.com/questions/48072977 – Stephen C Jun 18 '21 at 00:42
  • It seems this partially explains the issue: https://stackoverflow.com/questions/15516462/is-there-a-performance-impact-when-calling-tolist , but it is still not clear how to fix it ... – R2D2 Jun 18 '21 at 22:33
  • I don't see how it is relevant. – Stephen C Jun 19 '21 at 01:50
  • toList() allocates heap memory for the whole collection; the collection is 500MB and my pod has only 3GB, so apparently with a few more such requests the heap is full... – R2D2 Jun 19 '21 at 05:09
  • "toList() allocates heap memory for the whole collection" - I don't think so. I think it only allocates a list proportional to the number of **filtered** elements of the stream. – Stephen C Jun 19 '21 at 05:19
  • Exactly: when the filtered elements are 500MB, it will allocate 500MB... – R2D2 Jun 19 '21 at 05:23
  • 1
    Well ... then ... that is your problem. If the filtered list requires 500MB to store, then you need that much memory. Or you need to change your application design / logic so that you don't need to create the list at all. (This is nothing to do with the batch sizes used by the driver.) – Stephen C Jun 19 '21 at 05:26
  • For reference ... this is what I said in my first comment. *"I suspect that the real problem is [...] your filter is NOT filtering out most of the items (so the resulting list is too big)"* – Stephen C Jun 19 '21 at 05:35

1 Answer


Based on what you have said in the comments, my opinion is that you have misdiagnosed the problem. The batch size (or "bulk size" as you called it) is not the problem, and changing the internal batch size for the Mongo driver won't fix it. The real problem is that even after filtering, the list you are creating from the stream is too large for the Java heap size that you are using.

There are two possible approaches to solving this:

  • Instead of putting the annotations into a List, iterate the stream and process the annotations as you get them (see the first sketch after this list).

  • Figure out a way to extract the annotations in batches, and get a separate list of the annotations for each batch (see the second sketch after this list).
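
A minimal sketch of the first approach, assuming a hypothetical process(DocumentAnnotation) consumer standing in for whatever you actually do with each annotation:

try (CloseableIterator<DocumentAnnotation> iter =
         mongoTemplate.stream(query(criteria),
                              DocumentAnnotation.class,
                              ANNOTATIONS_COLLECTION_NAME)) {
    // Process each annotation as it arrives instead of collecting them all;
    // only one driver batch is held in memory at a time.
    while (iter.hasNext()) {
        DocumentAnnotation annotation = iter.next();
        if (isAnnotationAcceptedByFilter(annotation)) {
            process(annotation); // hypothetical per-item consumer
        }
    }
}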
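
And a sketch of the second approach using skip/limit paging (assuming Spring Data MongoDB 2.x); the page size is an arbitrary example value, and the sort is there so that consecutive pages don't overlap:

int pageSize = 10000; // arbitrary example page size
for (int offset = 0; ; offset += pageSize) {
    Query page = query(criteria)
            .with(Sort.by("_id"))   // stable order so skip/limit pages are consistent
            .skip(offset)
            .limit(pageSize);
    List<DocumentAnnotation> batch =
            mongoTemplate.find(page, DocumentAnnotation.class, ANNOTATIONS_COLLECTION_NAME);
    if (batch.isEmpty()) {
        break;                      // no more annotations
    }
    batch.stream()
         .filter(this::isAnnotationAcceptedByFilter)
         .forEach(a -> process(a)); // hypothetical per-item consumer, as above
}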

(In other circumstances, I would suggest trying to do the filtering in the MongoDB query itself. But that won't help to solve your OOME problem.)
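
(For completeness, server-side filtering could look something like this, assuming, purely hypothetically, that your filter can be expressed as a condition on some field such as "status":

// Push the filter into the query so unwanted annotations are never sent
// over the wire. Only possible if the predicate is expressible in Mongo.
Criteria serverSide = criteria.and("status").is("ACCEPTED"); // "status" is a made-up field
List<DocumentAnnotation> result =
        mongoTemplate.find(query(serverSide), DocumentAnnotation.class,
                           ANNOTATIONS_COLLECTION_NAME);

But as noted, if most annotations pass the filter anyway, this reduces network transfer without shrinking the final list.)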

But if you need all of the annotations in memory at the same time in order to process them, then your only practical option will be to get more memory.

Stephen C