6

I have a query which generates a fairly large XML document (~30k) as a query column for each record of a large table, of the form...

SELECT recordKey, lastUpdatedDate, ( SELECT ... FOR XML PATH( 'elemName' ), TYPE )
FROM largeTable
ORDER BY lastUpdateDate

If I run this query from SQLServer management studio, the query returns almost instantly, showing the first rows, as I would expect, and continues to run in the background.

However, when I run this query from the Camel JDBC component in StreamList mode, it appears to cache the entire resultset at the point of querying, which means I run out of memory.

I've checked the JDBC driver properties, and explicitly set the responseBuffering property to adaptive, and have also tried setting the selectMethod to cursor, neither of which appear to make any difference to my query.

Is this a characteristic of querying XML with JDBC, or are there some parameters I need to set differently?

Screwtape
  • 1,337
  • 2
  • 12
  • 27
  • It should be related with camel jdbc component and not with underlying jdbc driver itself. Share your camel component configuration and version of camel, you are using. – skadya Dec 27 '18 at 13:21
  • I had initially thought it was a camel issue (2.17.6), so created the following question: https://stackoverflow.com/questions/53740386/camel-jdbc-streamlist-query-appears-to-load-whole-resultset-before-splitting – Screwtape Dec 27 '18 at 13:24
  • However, I couldn't see anything in the camel code (current) that would result in this behaviour, so began to assume it must be something in the JDBC part of the process, and hence this question. – Screwtape Dec 27 '18 at 13:26
  • SQL outputType `StreamList ` was added in version 2.18.0 I believe. – skadya Dec 29 '18 at 05:02
  • As I said in the question - I'm using the JDBC component rather than SQL, because StreamList was supported in 2.14 (at least in theory). I've just tried running a test in camel 2.20, and that doesn't work either. – Screwtape Dec 29 '18 at 11:47

2 Answers2

1

However, when I run this query from the Camel JDBC component in StreamList mode, it appears to cache the entire resultset at the point of querying, which means I run out of memory.

camel-sql introduce the 'StreamList' output type in v2.18.x. Since you are using v2.17.6, your configuration may be falling back to 'SelectList' (default value). This causes the load of whole result set as list in memory. Having xml type in your query/result set doesn not have any influence on this behavior.

You may find this code at: org.apache.camel.component.sql.SqlConsumer.poll()

I suggest you to upgrade camel-sql version to v18.x ( or latest)

Hope this helps.

skadya
  • 4,330
  • 19
  • 27
  • I'm accepting this answer as it has lead me to discover that it is not the SQLServer JDBC that is causing the problem as when using the Camel SQL v2.20 component, the StreamList mode works, but it doesn't in the Camel JDBC component in either v2.17 or v2.20 with my data. – Screwtape Dec 31 '18 at 11:41
  • See also: https://stackoverflow.com/questions/53740386/camel-jdbc-streamlist-query-appears-to-load-whole-resultset-before-splitting – Screwtape Dec 31 '18 at 11:42
0

Not sure if it is possible or desirable in your application but in any case... to load all contents for a table is something that we have to prevent when possible. So, I will propose you an alternative by obtaining pages:

DECLARE @RowsPerPage    AS INT = 5;
DECLARE @CurrentPage    AS INT = 3;
SELECT recordKey, lastUpdatedDate, ( SELECT ... FOR XML PATH( 'elemName' ), TYPE )
FROM largeTable
ORDER BY lastUpdateDate
    OFFSET (@CurrentPage-1)* @RowsPerPage ROWS
    FETCH NEXT @RowsPerPage ROWS ONLY;

Just change the parameters to set your desired Rows Per Page and Current Page. I hope it helps.

Angel M.
  • 1,360
  • 8
  • 17
  • Hi! Thanks for your suggestion. Unfortunately this workaround doesn’t solve the problem, and cannot be relied upon if the table is changing, which in my case it always will be, as it it primarily changes that I am trying to obtain, querying records changed after a certain date. – Screwtape Dec 21 '18 at 07:48
  • Just another idea... you may try the clause WITH (NOLOCK) SELECT recordKey, lastUpdatedDate, ( SELECT ... FOR XML PATH( 'elemName' ), TYPE ) FROM largeTable **WITH (NOLOCK)** ORDER BY lastUpdateDate – Angel M. Dec 21 '18 at 07:54
  • 1
    First `NOLOCK` is not a turbobutton. Reading uncommitted data could lead to serious problems. Second [No Offset](https://use-the-index-luke.com/no-offset) – Lukasz Szozda Dec 26 '18 at 12:31