1

I am looking for a strategy to batch all my queries (with IN clause) to overcome the restrictions by databases on IN clause (See here).

I usually get list of size 100000 to 305000. So, this has become very important to tackle.

I have tried two strategies so far.

Strategy 1:

  • Create an entity and hence a table with one column to hold such values (can we create temp tables on the fly with JPA 2.0 vendor-independent?) and use the data from the temp table as a subquery to the original query before eventually cleaning up the temp table.

  • Advantage: Very performant queries. Really quick, I must admit for the numbers I have mentioned, it was mostly under a minute.

  • Possible drawback: Use of temp table which is actually a permanent one in my case thus far.

Strategy 2:

  • Calculate the batch size for the given input list and for each batch execute the query and accumulate the result.

  • Advantage: No temp tables. Easy for any threads within the same transaction.

  • Disadvantage: A big disadvantage is amount of time it takes to execute all the batches. For the mentioned numbers, this is at an unacceptable level at the moment. Takes anything between 5 to 15 mins!

I would appreciate any feedback, suggestions or improvements from all you JPA gurus.

Thanks.

Community
  • 1
  • 1
JProgrammer
  • 321
  • 4
  • 15

1 Answers1

1

I only tested up to 50,000 integers but I have some pretty good performance data around splitting large lists using various methods, with CLR and a numbers table leading the pack at the higher end:

Not sure if you are using integers or strings but the results should be roughly equivalent.

As an aside, I'll confess I have no idea what JPA 2.0 is, but I assume you can control the format of the lists that it sends to SQL Server.

Aaron Bertrand
  • 272,866
  • 37
  • 466
  • 490
  • Many thanks for posting Aaron. Your solution looks good. I need to translate the pure SQL stuff into its JPA equivalent :) – JProgrammer Feb 21 '12 at 21:26