I want a page of filtered data from an Oracle database table, but I have a query that might return tens of millions of records, so it's not feasible to pull it all into memory. I need to filter records out in a way that cannot be done via SQL, and return back a page of records. In other words, the pagination part must be done after the filtering.
So, I attempted to use Hibernate's ScrollableResults
, thinking it would be a way to pull in only chunks at a time and iterate through them. So, I created it:
ScrollableResults results = query.setReadOnly(true)
.setFetchSize(500)
.setCacheable(false)
.scroll();
... and yet, it appears to pull everything into memory (2.5GB pulled in per query). I've seen another question and I've tried some of the suggestions, but most seem MySQL specific, and I'm using an Oracle 19 driver (e.g. Integer.MIN_VALUE
is rejected outright as a fetch size in the Oracle driver).
There was a suggestion to use a stateless session (I'm using the EntityManager
which has no stateless option), but my thought is that if we don't fetch many records (because we only want the first page of 200 filtered records), why would Hibernate have millions of records in memory anyway, even though we never scrolled over them?
It's clear to me that I don't understand how/why Hibernate pulls things into memory, or how to get it to stop doing so. Any suggestions on how to prevent it from doing so, given the constraints above?
Some things I'm going to try:
- Different scroll modes. Maybe insensitive or forward only prevents Hibernate's need to pull everything in?
- Clearing the session after we have our page. I'm closing the session (both using
close()
in theScrollableResults
and theEntityManager
), but maybe an explicitclear()
will help?