10

I am dealing with what is apparently a performance issue while retrieving a relatively large ResultSet from a remote Microsoft SQL Server 2012 to a Java client that uses Microsoft JDBC Driver 4.0.

When I run the corresponding query in Microsoft SQL Server Management Studio on the remote server, it returns approx. 220k rows almost instantaneously. When I issue the same query from the client, it stalls. The same test also worked fine on the client against an earlier version of the database, where only approx. 400 rows qualified.

I tried to tackle this by appending `;responseBuffering=adaptive` to the URL passed to DriverManager.getConnection(). After the connection is established, I see this property (among several others) in the result from connection.getMetaData().getURL(), but connection.getClientInfo("responseBuffering") returns null, and what is more the client is still stalling.
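For reference, here is a minimal sketch of how the property gets appended to the URL (server and database names are placeholders, not my actual setup); note the value must not be quoted. With a live connection, the effective setting is exposed by the driver-specific SQLServerStatement class rather than by getClientInfo():

```java
// Hypothetical server/database names; a sketch, not the actual setup.
public class UrlSketch {

    static String buildUrl(String host, String db) {
        // Properties are separated by semicolons; the value itself is unquoted.
        return "jdbc:sqlserver://" + host
                + ";databaseName=" + db
                + ";responseBuffering=adaptive";
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("myserver", "mydb"));
        // With a real connection, read back the effective setting via the
        // driver-specific statement class (not via getClientInfo):
        //   SQLServerStatement s = (SQLServerStatement) conn.createStatement();
        //   System.out.println(s.getResponseBuffering()); // "adaptive" or "full"
    }
}
```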

What could be going wrong here, and how can I instruct a Microsoft SQL Server (not just suggest to it, but programmatically from Java) that it must return rows in smaller chunks rather than all at once, or improve JDBC query times by some other means?

Two further observations that seem somewhat strange and that perhaps point to a different root cause entirely:

  • When the client stalls it still shows only relatively light CPU load, unlike what I would expect from heavy garbage collection
  • "responseBuffering=adaptive" should be the normal default by now

UPDATE I've checked and found that switching from PreparedStatement to Statement does not improve things in my case (it apparently can help in other cases).

UPDATE Here is my current query:

select 
    PARENT.IDENTIFIER    as PARENT_IDENTIFIER,
    PARENT.CLASS         as PARENT_CLASS,
    CHILD.TYPE           as CHILD_TYPE,
    CHILD.IDENTIFIER     as CHILD_IDENTIFIER,
    PROPERTY.IDENTIFIER  as PROPERTY_IDENTIFIER,
    PROPERTY.DESCRIPTION as PROPERTY_DESCRIPTION,
    PROPERTY.TYPE        as PROPERTY_TYPE,
    PROPERTY.PP          as PROPERTY_PP,
    PROPERTY.STATUS      as PROPERTY_STATUS,
    PROPERTY.TARGET      as PROPERTY_TARGET -- a date
from
    OBJECTS as CHILD
    left outer join RELATIONS              on RELATIONS.CHILD = CHILD.IDENTIFIER
    left outer join OBJECTS    as PARENT   on RELATIONS.PARENT = PARENT.IDENTIFIER
    inner join      PROPERTIES as PROPERTY on PROPERTY.OBJECT = CHILD.IDENTIFIER
where
    PROPERTY.TARGET is not null
order by
    case when PARENT.IDENTIFIER is null then 1 else 0 end,
    PARENT.IDENTIFIER,
    CHILD.IDENTIFIER,
    PROPERTY.TARGET,
    PROPERTY.IDENTIFIER
Drux
    You can use a `SQLServerStatement` to get the actual `responseBuffering` status. See sample code [here](http://pastebin.com/zbRPmA2Y). FWIW, that code tells me that 'adaptive' *is* the default with sqljdbc4.jar. – Gord Thompson Oct 24 '14 at 12:37
  • @GordThompson I'm by now also convinced that *adaptive* is on by default. Which makes the very slow performance (on the order of minutes for the remote JDBC client vs. on the order of a few seconds in local Microsoft SQL Server Management Studio) all the more mysterious to me. Why such a huge delta? – Drux Oct 24 '14 at 17:23
  • 2
    This wonderful article ([Slow in the Application, Fast in SSMS? Understanding Performance Mysteries](http://www.sommarskog.se/query-plan-mysteries.html)) explains why. – Bogdan Sahlean Oct 26 '14 at 08:56
  • @BogdanSahlean thx, I've posted a [follow-up](http://stackoverflow.com/questions/26571488/can-jdbc-client-execute-set-arithabort-on-on-microsoft-sql-server-2012) question here. – Drux Oct 26 '14 at 09:17
  • @Drux: Please add the following information to this question: (1) the source code of the T-SQL batch sent to SQL Server from the JDBC client app and (2) the actual execution plan from SSMS (use Ctrl + M and F5) as **XML**. – Bogdan Sahlean Oct 26 '14 at 11:12
  • @BogdanSahlean I've added the SQL that is sent to the server to this question. Could you please provide an alternative description of how to extract the actual execution plan (without keyboard shortcuts). – Drux Oct 27 '14 at 12:40
  • 2
    [Ctrl-M] is the shortcut for `Query > Include Actual Execution Plan`. [F5] runs the query (`Query > Execute`). In the lower pane of the query window, in addition to the "Results" and "Messages" tabs, you should also see an "Execution Plan" tab. Open it, then right-click inside the tab body and choose "Show Execution Plan XML...". That will display the Execution Plan as XML. – Gord Thompson Oct 27 '14 at 14:17
  • Also, please provide the definitions of these tables. – RBarryYoung Oct 29 '14 at 16:06
  • Oh, and exactly how long does it take from the client? "*minutes*" is far too vague to use in any analysis of the problem. – RBarryYoung Oct 29 '14 at 16:07
  • @Drux can you run the query in SSMS on the client? If so what is the performance like? – Steve Ford Oct 30 '14 at 09:02

4 Answers

4

Adaptive buffering is a good lead. I would also recommend checking the connection's SET options via SQL Server Profiler.

When you start a trace, make sure ExistingConnections is selected. Compare a SPID from a JDBC connection with one from an SSMS connection. ARITHABORT comes to mind as one setting that I have seen cause a difference in performance between SSMS and the JDBC driver. Microsoft briefly mentions it here: http://msdn.microsoft.com/en-us/library/ms190306.aspx. Stack Exchange information here: https://dba.stackexchange.com/questions/9840/why-would-set-arithabort-on-dramatically-speed-up-a-query

On Oracle, I have seen huge impacts by playing with the setFetchSize method on the Statement / PreparedStatement object. Apparently, the SQL Server driver does not support that method. However, there is an internal method in the driver for it. See Set a default row prefetch in SQL Server using JDBC driver for details.

Also, what are you doing in your while (rs.next()) loop? Try doing nothing other than reading a column, like rs.getInt(1), and see what happens. If it flies, that suggests the bottleneck is in your processing of the result set. If it is still slow, then the problem must be in the driver or the database.
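A sketch of that stripped-down drain loop (class and method names here are illustrative); timing it against your full processing loop separates the driver/network cost from the cost of your own per-row work:

```java
import java.sql.ResultSet;
import java.sql.SQLException;

// Drain a result set doing minimal work per row: read one column and count.
// If this is fast while the real loop is slow, the bottleneck is the per-row
// processing; if this is slow too, the problem is the driver or the database.
public class Drain {

    static long countRows(ResultSet rs) throws SQLException {
        long n = 0;
        while (rs.next()) {
            rs.getInt(1); // touch a single column only
            n++;
        }
        return n;
    }
}
```

Wrap the call in System.nanoTime() before and after to get the wall-clock cost of the bare fetch.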

You could use SQL Server Profiler to compare the executions as they come in via JDBC and as you run it via SSMS. Compare the CPU, reads, writes and duration. If they are different, then the execution plan is probably different, which points me back to the first thing I mentioned: the SET options.

Brandon
  • +1 FYI, the problem also occurs in the form `while (rs.next()) n++;` Can you please add a link to documentation about `SET` options, such as `ARITHABORT`. – Drux Oct 24 '14 at 18:23
  • 1
    Yes, done. Since `while (rs.next())` shows a problem, what if you don't even do `rs.next()`? Just execute the statement and stop there. What happens? Just trying to take code away until the problem goes away, hoping that will reveal the cause. – Brandon Oct 24 '14 at 18:30
  • Without the loop it is very fast. With `statement.execute("SET ARITHABORT ON")` (returns `false`, BTW) inserted immediately after creating the connection and before the loop, it is still very (minutes) slow. – Drux Oct 24 '14 at 19:04
  • Still not completely clear what is going on, but yours was the best answer in this situation. Thx. – Drux Oct 30 '14 at 17:17
2

I'm simply going to toss out this suggestion, and leave it for you to test.

The JDBC driver may well be FETCHING all of the rows before it returns, whereas the other system is simply returning the open cursor.

I have seen this behavior on other databases with JDBC, but have no direct experience with SQL Server.

In the examples where I have seen it, setting auto commit to false for the connection prevents it from loading the entire result set. There are other settings to make it load only portions at a time, etc.
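A hedged sketch of that configuration (whether the Microsoft driver honors the fetch-size hint is exactly what is in question here; on drivers that do honor it, this combination is what triggers chunked fetching):

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch: request chunked retrieval. setFetchSize is only a hint to the
// driver; autoCommit=false is what some drivers additionally require
// before they will stream rows instead of buffering the whole result set.
public class ChunkedFetch {

    static Statement prepareChunked(Connection conn, int fetchSize) throws SQLException {
        conn.setAutoCommit(false);    // keep the transaction (and cursor) open
        Statement stmt = conn.createStatement();
        stmt.setFetchSize(fetchSize); // hint: rows per round trip
        return stmt;
    }
}
```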

But that could well be the underlying issue you are facing.

Will Hartung
  • 1
    +1 but unfortunately setting auto commit to false does not seem to make a difference in this case. – Drux Oct 24 '14 at 19:09
2

We had a similar problem that turned out to be due to execution-plan caching. We consulted a very good read about query plans.

SQL Server caches execution plans; you can inspect the cache using:

select * from sys.dm_exec_cached_plans

What worked for us was to ignore the cached execution plan for the slow queries.

What probably happens is that the optimization stage takes the query's parameter values into account, since in some cases a different execution plan makes more sense for different values.

If the execution plan has been cached and there is a cache hit (e.g. via a prepared statement, where the lookup ignores parameter values), the cached plan may be sub-optimal for the same query with different parameters.

To verify this, you can rerun some queries and see whether you get a different execution plan for the same query with different parameters.

If that turns out to be the case, there are several things you can do:

  • For an immediate result, add a recompile hint, OPTION (RECOMPILE), to your query. This will recompile the query every time, but may already be a lot faster than what you currently have. From the documentation:

RECOMPILE - Instructs the SQL Server Database Engine to discard the plan generated for the query after it executes, forcing the query optimizer to recompile a query plan the next time the same query is executed. Without specifying RECOMPILE, the Database Engine caches query plans and reuses them. When compiling query plans, the RECOMPILE query hint uses the current values of any local variables in the query and, if the query is inside a stored procedure, the current values passed to any parameters.

  • Invalidate the cache (see Recompiling Execution Plans). The ways to invalidate the cache are listed below (note: some of these are not supported in Azure):
  1. Changes made to a table or view referenced by the query (ALTER TABLE and ALTER VIEW).

  2. Changes made to a single procedure, which would drop all plans for that procedure from the cache (ALTER PROCEDURE).

  3. Changes to any indexes used by the execution plan.

  4. Updates on statistics used by the execution plan, generated either explicitly from a statement, such as UPDATE STATISTICS, or generated automatically.

  5. Dropping an index used by the execution plan.

  6. An explicit call to sp_recompile.

  7. Large numbers of changes to keys (generated by INSERT or DELETE statements from other users that modify a table referenced by the query).

  8. For tables with triggers, if the number of rows in the inserted or deleted tables grows significantly.

  9. Executing a stored procedure using the WITH RECOMPILE option.

  • Optimize the plan manually using OPTIMIZE FOR hints (did not try it).

  • Use separate queries after classifying parameters.
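If this turns out to be the cause, the first option above can be applied from the JDBC client without touching the server, simply by appending the hint to the query text (the query below is a shortened placeholder, not the full query from the question):

```java
// OPTION (RECOMPILE) goes at the very end of the statement, after ORDER BY.
public class RecompileHint {

    static String withRecompile(String sql) {
        return sql + " OPTION (RECOMPILE)";
    }

    public static void main(String[] args) {
        // Placeholder query, not the full query from the question.
        String sql = "select * from PROPERTIES where TARGET is not null";
        System.out.println(withRecompile(sql));
    }
}
```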

Reut Sharabani
1

Perhaps this link from the Microsoft documentation can help you solve your issue: http://msdn.microsoft.com/en-us/library/bb879937(v=sql.110).aspx

Especially the "Guidelines for using adaptive buffering" part:

There are some cases where using selectMethod=cursor instead of responseBuffering=adaptive would be more beneficial, such as:

If your application processes a forward-only, read-only result set slowly, such as reading each row after some user input, using selectMethod=cursor instead of responseBuffering=adaptive might help reduce resource usage by SQL Server.

If your application processes two or more forward-only, read-only result sets at the same time on the same connection, using selectMethod=cursor instead of responseBuffering=adaptive might help reduce the memory required by the driver while processing these result sets.

In both cases, you need to consider the overhead of creating, reading, and closing the server cursors.
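A sketch of supplying selectMethod=cursor as a driver property instead of embedding it in the URL (user, password, and server names are placeholders):

```java
import java.util.Properties;

// Sketch: driver properties for cursor-based fetching. With
// selectMethod=cursor the driver uses a server cursor and fetches rows
// in blocks instead of reading the whole result set up front.
public class CursorProps {

    static Properties cursorProperties(String user, String password) {
        Properties p = new Properties();
        p.setProperty("user", user);
        p.setProperty("password", password);
        p.setProperty("selectMethod", "cursor");
        return p;
    }

    // Usage (requires the SQL Server driver on the classpath):
    //   Connection c = DriverManager.getConnection(
    //       "jdbc:sqlserver://myserver;databaseName=mydb",
    //       cursorProperties(user, password));
}
```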

stacky
  • Thx, but this document was already linked to in the question, so unfortunately no apparent help there. – Drux Oct 24 '14 at 17:20