5

Hibernate Pagination Issue

I have an issue which is related to Hibernate Pagination and to some extent this has been explained in

Mysql Pagination Optimization

Using Hibernate's ScrollableResults to slowly read 90 million records

Hibernate - HQL pagination

Issues with Pagination and Sorting

Hibernate Row Pagination

Details

HQL Query from Application:

Query q = session.createQuery("from RequestDao r order by r.id desc");
            q.setFirstResult(0);
            q.setMaxResults(50);

Query returns 3 million records and for pagination we are setting only 50 of those records, pagination page is very slow because on every refresh we are calling the query which get 3 millions records and out of those we only set 50 records.

My main question is

Does HQL always goes and hits database or does it go and hit session or memory to look for the data and if it goes everytime to hit database and get resultset then it is very proper from performance point of view, what would be best solutions to improve it?

Using HQL in hibernate is there a way we can query database and get only 50 records out first and then get other records as required by the user. This challenge is really bogging down application and so what would be best way to solve this problem?

HQL Query generated in logs

from com.delta.dao.RequestDao r order by r.id desc

Hibernate Generated Query

select
    getrequest0_.ID as ID24_,
    getrequest0_.TIME as START3_24_,
    getrequest0_.STAT as STATUS24_,
    getrequest0_.SUM as SUMMARY24_,
    getrequest0_.OUTNAME as OUTPUT7_24_,
    getrequest0_.INPNAME as INPUT8_24_,
    getrequest0_.REQUEST_DATE as requestT9_24_,
    getrequest0_.PARENT_ID as PARENT10_24_,
    getrequest0_.INTER_TYPE as INTERPO60_24_,
    getrequest0_.OPEN_INT as OPEN61_24_,
    getrequest0_.SOURCE_TYPE as SOURCE62_24_,
    getrequest0_.TARGET_TYPE as TARGET20_24_,
    getrequest0_.SOURCE as SOURCE14_24_,
    getrequest0_.COPY_DATA as COPY16_24_,
    getrequest0_.CURVE as GENERATE63_24_,
    getrequest0_.TITLE as TITLE24_,
    getrequest0_.TIME_ID as TIMESERIES12_24_,
    getrequest0_.TASK_NAME as TASK51_24_ 
from
    REQUEST getrequest0_ 
where
    getrequest0_.KIND='csv' 
order by
    getrequest0_.ID desc

Here is the Explain Plan for the query:


 | id | select_type | table        | type | possible_keys  | key        | key_len | ref          |  rows    | filtered | Extra       | 
 |  1 | SIMPLE      | getrequest0_ | ref  | TR_KIND_ID     | TR_KIND_ID | 6       | const        | 1703018  |   100.00 | Using where |

Additional information: Query run time with and without order by clause on 50 records limit


If i run query with order clause then query takes 0.0012s with setting LIMIT 50 and without order clause, same query takes 0.0032s with same LIMIT 50.


Also how can we find if:

  1. Particular HQL Query is hitting database and not cache or getting information from session?
  2. Is it true that HQL Query will always go and hit database to get result out and Criteria would go and hit session or cache and get results from it?
  3. Also in my below mentioned query:

    a) Query q = session.createQuery("from RequestDao r order by r.id desc");
    b) q.setFirstResult(0);
    c) q.setMaxResults(50);
    

at a, is it true that we get result from database and store it in memory or where if not and at this time we have 3 million results in result set and then at b and c we set offset value and limit so on page we would only see 50 results so now where are remaining 3 million records and on our second call to this query do we again go and hit database and get 3 million records and put them in memory and then at c again we set 50 records and go on an on.

This issue is not clear to me and so would highly appreciate if someone can provide clear and detailed explanation as how this is working and what would be best solution for this problem.

Update

As it turns out, issue am having is not related to display of records on the page but i have filter on that page and on every request am getting all drop down values again from database and there are some funky things going on in there that is causing rise in page load time.

I am making multiple nested hibernate queries to database and getting results back, what would be an optimal solution for this problem?

Community
  • 1
  • 1
Rachel
  • 100,387
  • 116
  • 269
  • 365

2 Answers2

5

Your query tells database to sort all records that satisfy the WHERE clause. It potentially may sort millions of records before returning you top 50.

EDIT 1/26: Now that the specific questions were clarified, I'll try to respond more specifically.

  1. Every time you execute query like that, Hibernate goes to the database. Even more, it would flush all new/updated data in the session to disk. If this is your situation, this behavior might contribute to the slowness.

  2. Using Hibernate Query API usually performs quite well in most situations and is compatible with a broad variety of the database platforms. If you are really concerned about squeezing last drop of performance out of your data access layer, you can write your own native SQL query to select top 50 results. But as soon as you do that, you'll almost certainly loose database independence. So, evaluate your costs vs. benefits.

Your query run times appear to be in the single milliseconds range. This is usually as good as it gets with the relational databases that store data on disk. So you might want to evaluate whether you indeed have a performance problem.

EDIT 1/27: OK. It looks like the problem in the overall design of the page. I had been using AJAX for last 7 years or so, so I don't usually have to wait for the filtering UI controls to redraw when going through pages of a table. I guess, switching application UI frameworks is not an option in your case. You have to figure out how to optimize loading of the data for the dropdowns and such. Does this data change frequently? Can you cache it somewhere in the application? If you have to load them every time, can you get away with just getting the display strings instead of entire objects?

Olaf
  • 6,249
  • 1
  • 19
  • 37
  • and that could be time consuming and memory intensive process? – Rachel Jan 25 '12 at 21:08
  • what would be other way to optimize the query to make sure that performance is improved? – Rachel Jan 25 '12 at 21:09
  • also when we say `before returning you top 50`, after the query fetches result from database, does it keep those 3 million records in memory and does sorting in memory or where exactly is the data? – Rachel Jan 25 '12 at 21:10
  • @Rachel: The sorting is done on the database server. Database server uses both memory and hard drive space. An index on REQUEST.ID should really speed things up. – Olaf Jan 26 '12 at 13:27
  • We already have index on request.id and so i do not see if that is the issue. – Rachel Jan 26 '12 at 14:16
  • on your first comment regarding data getting flushed to session on disk then should not 2nd time when we run that query then it should get query results from session rather then getting it from database? – Rachel Jan 26 '12 at 19:16
  • @Rachel: I'm pretty sure that Hibernate would go after the session cache and bypass database hit only if you are getting object by it's primary key. If you do a relatively complex query, it will just go to the database. In any case, you can enable Hibernate logging and see which SQL statements it executes and when. – Olaf Jan 26 '12 at 20:27
  • I have enabled Hibernate logging and mentioned the query executed in question itself, how can one know from query itself that page is hitting database or getting resultset from query. – Rachel Jan 27 '12 at 14:48
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/7100/discussion-between-olaf-and-rachel) – Olaf Jan 27 '12 at 14:56
  • oh ok, i will add my findings in comment there so that you can refer them There is indeed many other queries that are run but am not sure how N+1 query problem can be detected or solved. – Rachel Jan 27 '12 at 20:20
  • Dropdown data does not change frequently, it is static data, right now we are not caching it at some place, we will have to cache it to make sure that we are not querying again database for same results, i didn't got last point regarding database string but we have to fix this issue for sure. – Rachel Jan 28 '12 at 15:03
  • what would be best solution in my case, `query cache` or `second level cache`, i am new to hibernate caching and so what would you advise? – Rachel Jan 30 '12 at 16:25
  • If your filter criteria is static, I would just cache it in regular Java collections during initialization of the service responsible for the data that is shown on that page. – Olaf Jan 30 '12 at 16:32
  • Can you provide an simple example of doing it. – Rachel Jan 30 '12 at 19:44
1

If you set the first result and the maximum number of results Hibernate will only retrieve those entries from the database. If that is slow, set max results to 1 and enable SQL logging to see which associations are loaded as well.

Hibernate also supports several caches: * first level cache: will only be used during one Hibernate session which often corresponds to one transaction * second level cache: used over session/transaction boundaries, but mostly only by find by id methods * query cache: if enabled Hibernate will lookup the query results from there first. However, the query must be the same in terms of HQL and parameters so each page might be loaded from the DB once and then be cached. (For more information have a look here: http://docs.jboss.org/hibernate/core/3.5/reference/en/html/performance.html#performance-querycache)

Please note that caching requires heap memory and depending on the size of your entities caching 3 million entities might result in a huge memory hit and thus increased garbage collection which would hit performance again.

Olaf
  • 6,249
  • 1
  • 19
  • 37
Thomas
  • 87,414
  • 12
  • 119
  • 157
  • Hmm...so what would be an appropriate approach to solve this challenge? – Rachel Jan 25 '12 at 20:34
  • if that is correct than hibernate is only getting first 50 results from database but i wonder why my application is taking hell lot of time, i have tried logging sql and that's the hibernate sql generated, not sure what should be next steps to solve this mystery. – Rachel Jan 25 '12 at 20:35
  • @Rachel did you try to execute the SQL that is generated by Hibernate? Does it take long? If so, do you have an index `REQUEST.KIND` and maybe on `REQUEST.ID` too? – Thomas Jan 25 '12 at 22:33
  • I have tried that query is taking approx `1 min` but when i am running that query from command line then it take time for result to get popped out. – Rachel Jan 25 '12 at 22:56
  • @Thoms: I have index on `KIND + ID` – Rachel Jan 25 '12 at 22:57
  • @Rachel: Did you run explain plan on the query? Are you positive the database is using your index for this query? What database platform is this, by the way? – Olaf Jan 26 '12 at 13:41
  • I have updated explain plan results in the question, also database in reference is MySQL. – Rachel Jan 26 '12 at 14:27
  • @Rachel try to execute the SQL query with and without the order statement (you might also want to manually add a `LIMIT 50`) and post the timings. If the query without the order it much faster then the ordering is an issue and you might not have set up the index correctly (in that case you might want to give some information on how you set the indices). – Thomas Jan 26 '12 at 14:42
  • dumb question: how do i record timings – Rachel Jan 26 '12 at 14:53
  • also i cannot run this query from mysql browser as it hangs up on memory limitations and if i run this query on command line then system is popping out 3 millions of records which takes hell lot of time, not sure how to deal with it. – Rachel Jan 26 '12 at 14:55
  • if i do `with order` clause and `limit by 50`, query takes `0.0012s` and if i do it `without order` clause and `limit by 50`, query takes `0.0032s` – Rachel Jan 26 '12 at 15:13