Hibernate SQL In clause making CPU usage to 100%

Question

In my java application I am using SQL server and Hibernate3 with EJB. When I tried to execute a select query with In clause, the DB server CPU usage reaches to 100%. But when I tried to run the same query in SQL management studio, the query is running without any CPU spikes. Application server and DB server are two different machines. My table has the following schema,

CREATE TABLE student_table (
       Student_Id BIGINT NOT NULL IDENTITY
     , Class_Id BIGINT NOT NULL
     , Student_First_Name VARCHAR(100) NOT NULL
     , Student_Last_Name VARCHAR(100)
     , Roll_No VARCHAR(100) NOT NULL
     , PRIMARY KEY (Student_Id)
     , CONSTRAINT UK_StudentUnique_1 UNIQUE  (Class_Id, Roll_No)
);

The table contains around 1000k records. My query is

select Student_Id from student_table where Roll_No in ('A101','A102','A103',.....'A250');

In clause contains 250 values, When I tried to run above query in SQL management studio the result is retrieved within 1 seconds and without any CPU spikes. But when I tried to run the same query through hibernate the CPU spikes reaches to 100% for around 60 seconds and result is retrieved around 60 seconds. The hibernate query is,

Criteria studentCriteria = session.createCriteria(StudentTO.class);
studentCriteria.add(Restrictions.in("rollNo", rollNoLists)); //rollNoLists is an Arraylist contains 250 Strings
studentCriteria.setProjection(Projections.projectionList().add(Projections.property("studentId")));
List<Long> studentIds = new ArrayList<Long>();
List<Long> results = (ArrayList<Long>) studentCriteria.list();
if (results != null && results.size() > 0) {
   studentIds.addAll(results);
}
return studentIds;

What is the problem why it is so. If the same query is running through management studio the result is retrieved without any spikes and result is retrieved within 1 seconds. Any solution???

Edit1: My hibernate generated query is,

select this_.Student_Id as y0_ from student_table this_ where this_.Roll_No in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)

Edit2: My execution plan This was after indexing roll_no

CREATE INDEX i_student_roll_no ON student_table (Roll_No)

My execution plan ,

Yes of course tried.. but no use... Same problem with 100% CPU usage spike — Jaya Ananthram, Apr 23 '15 at 07:02
may be this could help http://dba.stackexchange.com/questions/27698/hibernate-query-running-slow — Amogh, Apr 23 '15 at 08:23
show the mapping of `StudentTo` class. may be it is eagerly fetching associated objects. — Sindhoo Oad, May 04 '15 at 04:02
I had a similar problem... I resolved it by creating a temp table that had the original table(PK) and the search field(not a PK) as PK, and then ran the query against this temp table. A query that was taking 20-30s to execute under Hibernate (with Native SQL) started to ran instantly. So I believe you should give a try. — Dalton, May 04 '15 at 17:33
You need to show the mapping of ```StudentTo``` class as mentioned by @10sw33. Also mention which mssql java driver you are using for completeness (jtds or mssql driver or other)? — Shiraaz.M, May 04 '15 at 19:50

Vlad Mihalcea · Answer 1 · 2015-04-28T07:30:13.187

6

The query you run from the console is easily cacheable and that's why the response is instantaneous. If you look at the query, you'll see that all parameters are embedded in the query, so the query planner can detect there's no variation and all executions will always go to the same plan and to the same cached result.

The query that you run with Hibernate, even if it were a native query, it uses a PreparedStatement and parameters are bind at query execution time and to quote one of the best author on indexing:

What has that to do with bind parameters?

The shared execution plan caches of DB2, Oracle and SQL Server use a hash value of the literal SQL string as key to the cache. Cached plans are not found if the SQL contains literal values that vary with each execution.

Place holders (bind parameters) unify the statement so that the SQL string is identical when executed with different values—thus, increasing the cache-hit rate.

To solve it, you need to add an index on both the (Roll_No, Student_Id) columns so that the query becomes an index-only scan.

SQL Server defaults to cluster indexes, which limit you to one clustered index per table, so you might want to turn this table into a heap table instead and focus on index-only scans.

edited Apr 28 '15 at 07:30

answered Apr 28 '15 at 06:32

Vlad Mihalcea

142,745
71
566
911

2

I don't think the caching of the execution plan is an issue here. Parsers are pretty fast, and there aren't a lot of possible execution plan candidates for such a simple query. I'd rather say that with 250 bind variables, the optimiser might've chosen a poor execution plan rather than the somewhat more optimal one that can be chosen more easily with inline values. In Oracle lingo, this used to be referred to as "bind variable peeking issue" (which is no longer an issue in 11g). Don't know about SQL Server, though. – Lukas Eder Apr 28 '15 at 07:20
I also think that the DB doesn't uses the index, if there's even an index to be used. An index-only scan fits perfectly for this type of query, since it filters one column and it selects one more. – Vlad Mihalcea Apr 28 '15 at 07:25
It will certainly help in this case (although only when you use the `(Roll_No, Student_id)` column ordering in the index). – Lukas Eder Apr 28 '15 at 07:28
Exactly. I shall totally add this to the answer to avoid any confusion. – Vlad Mihalcea Apr 28 '15 at 07:29

Vladimir Baranov · Answer 2 · 2015-04-29T23:45:51.923

To answer the question "why it is slow through hibernate" you need to see the actual execution plan that is used by the server when you run your hibernate code, NOT the execution plan that the server uses when you run the query from SSMS. The screenshot with the execution plan that you included in the question doesn't look like the actual plan that you get when running your hibernate code. Once you have that plan you can compare it to the plan you get from SSMS and the difference will most likely explain why it is slow in one case and fast in another.

There is a very good article by Erland Sommarskog, which focuses on so-called "parameter sniffing", which could be the reason for a problem here, but not likely. What is useful for us in this article here is that he explains how you can extract execution plan for examination from the cache.

Without this information you can only guess. One guess is that you pass your parameters as nvarchar, but the indexed field Roll_No is varchar, so index is not used. The server converts your varchar column to nvarchar for the comparison, which means that index can't be used => it is slow and the conversion may be the reason for high CPU usage. http://sqlperformance.com/2013/04/t-sql-queries/implicit-conversion-costs https://www.sqlskills.com/blogs/jonathan/implicit-conversions-that-cause-index-scans/

Here is not an answer to your question, but a possible solution to the problem. Instead of passing 250 individual parameters to the query for the IN clause use table-valued parameter and pass your values as a table. In the query use JOIN instead of IN. This becomes especially true after your comments that you'll have 100K parameters (which means that you want to run your query 400 times). In fact, 100K is a bit too much even for a table-valued parameter, so I'd consider having a permanent or temporary helper table, which will hold these Roll_No with proper index. The main query would JOIN to it. Something like this:

CREATE TABLE RollNumbers (
     Roll_No VARCHAR(100) NOT NULL
     ,PRIMARY KEY (Roll_No)
);

Make sure there is index in table RollNumbers on Roll_No. Make sure there is index in table student_table on Roll_No. At first INSERT 100K values into RollNumbers and then use them in the main query:

SELECT Student_Id 
FROM
    student_table
    INNER JOIN RollNumbers ON RollNumbers.Roll_No = student_table.Roll_No

Depending on the overall system the RollNumbers table could be a permanent table, temporary table or table variable.

score 4 · Answer 3 · answered Apr 28 '15 at 06:31

Check the datatype at hibernate level of all the fields used in the query and ensure it matches with you table definition. Frameworks like hibernate uses Unicode supported data types(e.g. nvarchar). Try to change the data type at either of the side.

Alternatively, you can Add parameter called sendStringParametersAsUnicode in your connection string. It will force hibernate to use varchar instead nvarchar.

Just give it a try and let us know!

score 4 · Answer 4 · edited May 23 '17 at 12:02

You're probably thinking that, because your slow query takes 60 seconds, your "fast" query taking 1 second is actually fast. This is not the case. This execution speed difference keeps you from understanding the actual problem here.

An additional problem (probably not the actual problem)

The very simple type of query that you're running should return results within less than a millisecond if you had an index on Roll_No, regardless if you're using bind variables or inline values.

I'm just assuming that you don't have any indexes apart the ones that are generated from the constraints in your table. So, either, you should add a simple index on Roll_No:

CREATE INDEX i_student_roll_no ON student_table (Roll_No);

Or you could add an additional column to the above index in order to make it a "covering index" for this query (as explained by Vlad)

CREATE INDEX i_student_roll_no2 ON student_table (Roll_No, Student_Id);

This would make this particular query even faster, as the execution plan wouldn't need to hit the disk again to fetch the Student_Id from the table. The information would already be contained in the index. Use covering indexes sparingly, though, as they:

Use up more space, specifically for a medium size table like yours
Work only well as long as your queries are limited to exactly the columns that are really covered, which is unlikely to stay this way in your case.

How to recognise this using SQL Server Management Studio?

There's actually a very nice feature in SQL Server Management Studio. When you turn on execution plans (which you should), you'll get this additional info about your query:

SQL Server Management Studio execution plans

Right click on that info and choose "Missing Index Details..." to get information similar to this one:

/*
Missing Index Details from SQLQuery1.sql - 
  LUKAS-ENVY\SQLEXPRESS.test (LUKAS-ENVY\Lukas (52))
The Query Processor estimates that implementing the 
  following index could improve the query cost by 87.5035%.
*/

/*
USE [test]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[student_table] ([Roll_No])
INCLUDE ([Student_Id])
GO
*/

Why the difference between SQL Server Mgmt Studio and Hibernate?

Your original question has not yet been answered. Perhaps, the additional index fixes it, but perhaps you didn't supply all the information. You could have:

Bind variable peeking issues
N+1 problems in Hibernate (with that many rows)

I tried CREATE INDEX i_student_roll_no ON student_table (Roll_No) but no use. I have update my execution plan in question — Jaya Ananthram, Apr 28 '15 at 08:11
@JayaAnanthram: What do you mean by "no use"? How did it affect both query execution speeds? Also: Are you sure that Hibernate doesn't also execute other queries? — Lukas Eder, Apr 28 '15 at 08:21
Still CPU usage is 100%. And I am sure no other queries is executing when the above query is executed. And one more point i will be having around 100k roll numbers, from that i will frame query by passing first 250 roll numbers and then next 250 roll nos and so on... — Jaya Ananthram, Apr 28 '15 at 09:15
Hmm, I'm not sure if I understand your 250-steps. I suspect there's still some info missing in your question, then. The index bit will certainly be useful nonetheless, but it doesn't answer your actual question — Lukas Eder, Apr 28 '15 at 09:18
I added all the information. The only thing is just I renamed variable name and table name. — Jaya Ananthram, Apr 28 '15 at 09:36
If searching a non indexed field with IN clause will leads to more CPU usage(100%)? It will affect performance only right? If non indexed field searches 10ms then indexed field should search < 10ms. But whether CPU usage will be concern for a indexed and non indexed field? — Jaya Ananthram, Apr 29 '15 at 05:22
I'm actually not sure what the reasons for this behaviour are. I suspect that your table (or the unique index) has been loaded into some buffer cache and then is scanned in memory, which generates CPU load. But I don't see why there should be a difference between your SQL Server Management Studio usage, and the Hibernate verions — Lukas Eder, Apr 29 '15 at 06:15

Azuaron · Answer 5 · 2015-05-05T15:49:05.600

By a strange coincidence, I ran into this problem just this week, and it's not the Hibernate N+1 problem many people here are referring to. I use Amazon Redshift, with a Postgres Java driver.

For reasons I won't go into here, I was using a 7,000 parameter long WHERE IN () statement which, when run against the database, results in a 10-second query execution (it's a complex query on a large table, and we haven't setup indexing yet--still in development, we haven't started tuning). When run through Hibernate, there's a 120-second query execution time.

I discovered that if you get the actual SQL string out of Hibernate, replace (?,?...?) with a String of the actual values, and run that (still through Hibernate), suddenly everything comes back in 10 seconds again.

Digging around in the Hibernate internals, it turns out that they do a non-trivial amount of processing per parameter, resulting in an initial CPU spike and bloated execution times for statements with a large number of parameters.

In addition, once the query's eventually sent to the database, the database server's CPU spikes to 100% for the duration when using parameters, but not without parameters. I haven't checked exact timings for how much of all this processing is happening on which side of the fence, but it looks like using that many parameters is not viable either on the Hibernate side or the database side.

The solution? Use fewer parameters. Or find a database that supports large parameter sets while remaining performant.

We're probably going to switch from Hibernate to jOOQ, since jOOQ allows you to define your own custom SQL fragments that work with the official DSL. Then, we'll build the IN() clause manually without parameters. We can do this since our IN variables are internal IDs, so SQL injection isn't a possibility, but if SQL injection is a possibility, make sure to sanitize your inputs.

Just curious, are you using MS SQL Server? As far as I know the [maximum number of parameters in SQL Server is 2100](https://msdn.microsoft.com/en-us/library/ms143432.aspx), so it is interesting to know how hibernate manages to pass 7000 parameters. — Vladimir Baranov, May 03 '15 at 09:16
In your case, is there any difference in CPU usage when you tried to ran through hibernate and against the database (SQL server management studio)?? — Jaya Ananthram, May 04 '15 at 04:39
Amazon Redshift, actually. I don't know if there's a limit, but we haven't hit it yet. I think one of my coworkers tried with 100,000 values and it still worked. I haven't looked at the CPU usage, but I can check when I get in today. — Azuaron, May 04 '15 at 12:22
CPU spike means? Consistently 100% throughout the query time? Or some up and downs between X% and Y%? — Jaya Ananthram, May 05 '15 at 04:20
For the CPU spike, it was 100% during the start of the query, then settled down to around 60% for the duration. But, turns out there was more going on than I initially thought; I'm updating my answer with the new information (won't fit in a comment). — Azuaron, May 05 '15 at 15:30

score 1 · Answer 6 · answered Apr 28 '15 at 11:21

1

It is look like you got not all records when execute query not from hibernate, but in your code all records from query execution placed into collection.

answered Apr 28 '15 at 11:21

Alexander Fedyukov

964
8
18

rollNoLists contains only 250 values, for next call it will contains next 250 values and so on. – Jaya Ananthram Apr 28 '15 at 11:46
Hi Jaya, Are you saying that your result set of the query is 250 records? Alexander makes the point that visual tools like SSMS will limit the numbers of rows fetched from the result set to a small number. For example when your result set is 10,000 records your UI might show you only 250 records. Which cuts down on the execution time since the other 9,750 records don't need to be pushed over to the client application. Hibernate on the other hand loads the whole result set. – Peter Schuetze May 06 '15 at 15:07
@PeterSchuetze, while a valid comment, I've never seen that SSMS doesn't display all rows. As opposed to SQLyog, for example. – Vladimir Baranov May 06 '15 at 23:35

score 1 · Answer 7 · answered May 05 '15 at 03:28

1

It's very easy to determine the issue by running SQL Profiler. You will see exactly what SQL statements are being executed against the database in both scenarios.

http://www.codeproject.com/Articles/21371/SQL-Server-Profiler-Step-by-Step

answered May 05 '15 at 03:28

itsben

1,017
1
6
11

Just be aware that running SQL Profiler would affect the performance of the whole server: http://dba.stackexchange.com/questions/818/using-sql-profiler-on-a-database-thats-in-production http://dba.stackexchange.com/questions/17474/does-sql-profiler-affect-server-performance – Vladimir Baranov May 05 '15 at 05:24
To be clear, SQL Profiler will display each SQL statement that is being executed on the server. You will need to filter those results to find the statements that you are interested in. I think it goes without saying that SQL Profiler will have some impact on the server while it is running. It is the equivalent of enabling logging for an application. With that in mind, you should use it for development, debugging and tuning. Then just close or exit the SQL Profiler application when you are done. – itsben May 05 '15 at 23:20

score 1 · Answer 8 · edited May 23 '17 at 12:17

I just point this part of LBushkin answer for you from this post

Second, when using either IN or OR with a variable number of arguments, you are causing the database to have to re-parse the query and rebuild an execution plan each time the arguments change. Building the execution plan for a query can be an expensive step. Most databases cache the execution plans for the queries they run using the EXACT query text as a key. If you execute a similar query but with different argument values in the predicate - you will most likely cause the database to spend a significant amount of time parsing and building execution plans. This is why bind variables are strongly recommended as a way to ensure optimal query performance.

So you can try binding variables to prevent running the execution plan each time

Bind Variables Usage (Parameterized Queries in SQL Server)

If it is a solution whether I need to try in this way? `select Student_Id from student_table where Roll_No in (:v1, :v2, :v3, :v4, :v5, ...... ,:v250)` and binding 250 values?? — Jaya Ananthram, May 05 '15 at 05:02

Hibernate SQL In clause making CPU usage to 100%

8 Answers8

An additional problem (probably not the actual problem)

How to recognise this using SQL Server Management Studio?

Why the difference between SQL Server Mgmt Studio and Hibernate?