0

I want a page of filtered data from an Oracle database table, but I have a query that might return tens of millions of records, so it's not feasible to pull it all into memory. I need to filter records out in a way that cannot be done via SQL, and return back a page of records. In other words, the pagination part must be done after the filtering.

So, I attempted to use Hibernate's ScrollableResults, thinking it would be a way to pull in only chunks at a time and iterate through them. So, I created it:

ScrollableResults results = query.setReadOnly(true)
    .setFetchSize(500)
    .setCacheable(false)
    .scroll();

... and yet, it appears to pull everything into memory (2.5GB pulled in per query). I've seen another question and I've tried some of the suggestions, but most seem MySQL specific, and I'm using an Oracle 19 driver (e.g. Integer.MIN_VALUE is rejected outright as a fetch size in the Oracle driver).

There was a suggestion to use a stateless session (I'm using the EntityManager which has no stateless option), but my thought is that if we don't fetch many records (because we only want the first page of 200 filtered records), why would Hibernate have millions of records in memory anyway, even though we never scrolled over them?

It's clear to me that I don't understand how/why Hibernate pulls things into memory, or how to get it to stop doing so. Any suggestions on how to prevent it from doing so, given the constraints above?

Some things I'm going to try:

  • Different scroll modes. Maybe insensitive or forward only prevents Hibernate's need to pull everything in?
  • Clearing the session after we have our page. I'm closing the session (both using close() in the ScrollableResults and the EntityManager), but maybe an explicit clear() will help?
Depressio
  • 1,329
  • 2
  • 20
  • 39
  • These are more dependent on the database than Hibernate, so you'll want to search Oracle specific questions. It's tricky to get fetchsize to work properly with Postgres too. – Kayaman Sep 17 '20 at 16:29
  • If it's a 'heavy' query that is resource/time consuming then grabbing everything into memory might not be so bad. If however you must repeat the query for each page with Oracle one of the common ways is to use `ROW_NUMBER` in the query to number each row according to a grouping criteria.data, then filter on the values for ROW_NUMBER being BETWEEN the lower and upper record numbers corresponding to your page. The downside here is if the underlying data changes to affect the results you might find you 'skip' or 'repeat' rows. – TenG Sep 17 '20 at 16:40
  • I debugged all the way to the JDBC call. The `ResultSet` the Oracle driver sends back is appropriately scrollable, and has some flags in it that indicate whether it's fetched everything, and how many records it has fetched. It's clear in the debugger the driver has done its job. – Depressio Sep 17 '20 at 18:41

2 Answers2

0

We were scrolling through the entire ScrollableResults to get the total count. This caused two things:

  1. The Hibernate session cached entities.
  2. The ResultSet in the driver kept rows that it has scrolled past.

Fixing this is specific to my case, really, but I did two things:

  1. As we scroll, periodically clear the Hibernate session. Since we use the EntityManager, I had to do entityManager.unwrap(Session.class).clear(). Not sure if entityManager.clear() would do the job or not.
  2. Make the ScrollableResults forward-only so the Oracle driver doesn't have to keep records in memory as it scrolls. This was as simple as doing .scroll(ScrollMode.FORWARD_ONLY). Only possible since we're only moving forward, though.

This allowed us to maintain a smaller memory footprint, even while scrolling through literally every single record (tens of millions).

Depressio
  • 1,329
  • 2
  • 20
  • 39
-1

Why would you scroll through all results just to get the count? Why not just execute a count query?

Christian Beikov
  • 15,141
  • 2
  • 32
  • 58
  • The count must not include records that get filtered out. That filtering cannot happen in SQL. – Depressio Sep 18 '20 at 20:28
  • What kind of filtering are you doing that can't happen in SQL? You could alternatively do the scrolling by pagination through the primary key ordering i.e. use keyset pagination. – Christian Beikov Sep 20 '20 at 10:01