1

I have a JPAQL-2-mySQL query that stops running after 5 minutes by timeout:

select 
      question, 
      ans_items, 
      count(distinct ans_items.id) as voteCount, 
      (select 
            count(distinct ans.id) 
         from 
            answers ans 
               join ans.session ses  
         where 
                ses.survey.id = :surveyId 
            and ses.state = :sessionState 
            and ans.question.id = question.id 
         group by 
            question.id) as total, 
      max(ranks.rank) as rmax 
   from 
      Session as ss 
         join ss.answers as answers 
            join answers.items as ans_items 
            join answers.question as question 
            left join question.ranks as ranks 
   where 
          ss.survey.id = :surveyId 
      and ss.state = :sessionState 
   group by 
      question.id, 
      ans_items.variant.id, 
      ans_items.variantWeight  
   order by 
      question.orderNumber asc

If I replace COUNTs with constant values:

 select question, ans_items, 150, 
 (select 10 from answers ans join ans.session ses
 where ses.survey.id=:surveyId and ses.state=:sessionState 
 and ans.question.id=question.id group by question.id) as total, 
 max(ranks.rank) as rmax 
 from Session as ss 
 join ss.answers as answers 
 join answers.items as ans_items 
 join answers.question as question 
 left join question.ranks as ranks 
 where ss.survey.id=:surveyId and ss.state=:sessionState 
 group by question.id, ans_items.variant.id, ans_items.variantWeight  
 order by question.orderNumber asc

, it starts being executed in 2 seconds. Which is 150+ times faster. It seems like a couple of COUNTs slow down the query a lot! Is there a way to optimize the query or I absolutely have to calculate counts not using the DB (I can do this, but I'd like to avoid such kind of rewriting).

DRapp
  • 47,638
  • 12
  • 72
  • 142
Kat
  • 23
  • 3
  • have you looked at the EXPLAIN output for each query? – Dan Feb 19 '15 at 23:01
  • 1
    `JOIN` usually comes with `ON`... – Brewal Feb 19 '15 at 23:01
  • 1
    @Brewal It's JPAQL-to-mySQL, `ON` is inserted automatically... Does manual inserting affect performance? – Kat Feb 19 '15 at 23:03
  • My bad... I don't think so then. Can we see the actual SQL generated ? – Brewal Feb 19 '15 at 23:04
  • @dan08, Brewal I'm trying to understand how to look at the generated queries and their EXPLAINs from the IDEA IDE. Thanks for direction! – Kat Feb 19 '15 at 23:08
  • 1
    I guess, the problem is not from multiple counts, but from the subselect in the columns part. This does a select for every result row and might be the reason for slowing down the query. You could verify this by leaving the simple `count(...) as voteCount` and replace only the subselect with a constant value. – Olaf Dietsche Feb 20 '15 at 09:56
  • @OlafDietsche I'm pretty sure the subquery is the problem. – noober Feb 20 '15 at 11:04

4 Answers4

2

Looks OKis so I am afraid, you need to debug a bit SQL. You need to see what is the actual query which is send by JPA, then use Explain on it in DB to see what is the execution plan (if you get full table scan or something). This kind of queries should be done using only indexes so they woudl be fast (if you have the right indexes of course). SO once you see the execution plan you can figure out what indexes are missing.

To get the query output, change log level in you persitance, for example in eclipselink:

 <property name="eclipselink.logging.level" value="FINE"/>

Then guard you calling of the count method with many \n\n to easily spot it.

It will have a form of prepared statement, like SELECT COUNT(id)... where Sesion.ID = ?, and below another line will have the actual values used.

Replace the ? with the values and you have a proper SQL. Conect to your DB, and check exceution plan, in mysql you call EXPLAIN SELECT COUNT....

If it is your first time you will nedd to check documentaiton how to understand output, but bottom line on each output row you should see index being used if not you are in trouble.

Zielu
  • 8,312
  • 4
  • 28
  • 41
1

Is ans_items.id indexed? If not, create an index on it. If it is part of an index, that doesn't help, unless the first column in the index is ans_items.id.

Federico Razzoli
  • 4,901
  • 1
  • 19
  • 21
1

The reason you are getting hosed is you have no join ON clause to identify the relationships between the tables. So, for each one record in one table it is joining to every in the other and that result joined again to the next.

Please update your query to identify the related "IDs". And qualify all your columns with the alias name... where are the fields coming from per the aliases.

Ex: with your field query for distinct ans.id (good for alias),

 from 
    answers ans 
       join ans.session ses  

what is the relationship between answers and sessions.

        answers ans 
           join ans.session ses  
              on ans.sessionID = ses.sessionID

Likewise in your main query... aside from your where clause for a specific survey and state, how is session (ss) related to answers and its relation to ans_items and ans items to questions. Fix those and you'll get a better result.

   from 
      Session as ss 
         join ss.answers as answers 
            join answers.items as ans_items 
            join answers.question as question 
            left join question.ranks as ranks 

Once those are known, offering additional indexes to optimize the query could be presented to further help.

DRapp
  • 47,638
  • 12
  • 72
  • 142
1

The compound INDEX(survey_id, state) should help the speed. (Or those two could be in the opposite order).

If you are using InnoDB, check that innodb_buffer_pool_size is about 70% of available RAM.

But perhaps @Brewal nailed it -- You have no ON clause for the JOINs to answers or items or question or ranks. So you are unnecessarily creating a huge tmp table with all combinations of those tables. Remove the GROUP BY to see it. If JPAQL is guilty, get rid of it!

Rick James
  • 135,179
  • 13
  • 127
  • 222