0

The problem I have is that I have lots of similar queries in a batch:

select id, a, b, c from records where a = 1 and b = 2
select id, a, b, c from records where a = 5 and b = 3
select id, a, b, c from records where a = 2 and b = 5
select id, a, b, c from records where a = 3 and b = 4
...

a batch normally consists of 1000 queries, which takes about 0.5s to finish, with a and b indexed. Auto commit is on and all connections are managed in bonecp connection pool. Hibernate is used to map each result into an object instance.

My questions are:
1. Is it gonna make a difference if I combine them in a single query, even all db connections are live and the service doesn't need to wait for the connection to be established?
2. If yes, what's the best practice to do that in hibernate?
3. If not, is it gonna make a difference if I execute all the queries concurrently (instead of one by one in the current case), though the connection pool only allows a certain number of queries to be executed at the same time?

Vlad Mihalcea
  • 142,745
  • 71
  • 566
  • 911
NSF
  • 2,499
  • 6
  • 31
  • 55
  • I don't know about hibernate but can't you combine the conditions and make it a single query? – Rahul Jul 25 '14 at 03:32
  • Yes but I'm not quite sure about the performance merit compared to executing single queries concurrently given connections are managed. – NSF Jul 25 '14 at 03:37
  • If you are not sure then try and test it to be sure off. – Rahul Jul 25 '14 at 03:41

1 Answers1

1

Hibernate offers some strategies for optimizing fetch retrievals when loading associated entities (BATCH, JOIN, SUBSELECT) but for HQL/JPQL you can at best generate the query you'd have done it with a native query.

  1. You need to disable auto-commit. That's not playing well with Hibernate write-behind transactions and it hinders performance too.

  2. Since you are using MySQL you could try to create one IN query like this:

    select id, a, b, c 
    from records 
    where (a, b) IN ((1, 2), (5, 3), (2, 5), (3, 4))
    
  3. If you are not interested in fetching entities (for updating/removing them) and you only need a projection (id, a, b, c), then a native query is a better choice than a HQL one (assuming the MySQLDialect even supports this syntax).

  4. I wouldn't execute those concurrently. The CP lock contention is going to be your next bottleneck and for index-only-scans a single IN query will perform better than many small ones.

Community
  • 1
  • 1
Vlad Mihalcea
  • 142,745
  • 71
  • 566
  • 911
  • Sorry for the late reply and thanks for the nice answer. I eventually used the group by sub-query strategy as it turned out mysql's query optimizer becomes dumb when you use IN with a few columns - it doesn't use index at all unless you force it, whereas it still smartly uses index in the group by case. The performance was noticeably improved, though not hugely as other issues were involved. – NSF Aug 14 '14 at 20:15
  • When it comes to execution plans it's difficult to give the best solution up-front. Analysing the query plan is the best approach. I am glad you found the right query for your task. – Vlad Mihalcea Aug 14 '14 at 21:02