0

My question is very similar to this old SO question and no satisfactory answer is posted there.

There is a DB table in DB2 for which I am trying to do parallel record reads by two or more separate Java threads provided those threads should read different sets of data i.e. if THREAD-1 has read first 1000 records , THREADS-2 should not select those records but different 1000 ones if available.

Since threads will be reading different rows, no clash is expected at read time. Connection Object will also not be shared among threads - they will use their own DB Connections.

Database is DB2 and this table is already range partitioned on one of columns.

Number of threads will not be fixed but configurable so we can run with more threads on more powerful machines.

What could be strategies for limiting reader threads to a particular area of table and not to step on each others shoes i.e. how to logically divide table so threads keep reading on different areas? Can Java process utilize DB2 range partitions?

One solution is to handover different ranges of primary keys to these threads but for that I will have to read whole table at startup and that I am trying to avoid.

Community
  • 1
  • 1
Sabir Khan
  • 9,826
  • 7
  • 45
  • 98

1 Answers1

1

One way could be to select ony a portion of the data. Assuming your key is a number (or a number is part of your key) you could use a modulo function (mod) in the WHERE condition - this would be flexible enough for a configurable number of threads.

SELECT... FROM... WHERE mod(key, 3) = 0 

for the first thread of the three and mod(key, 3) = 1 for the second and so on.

If you really focus on partitions you have to query the DB2 system catalog first to retrieve the partition information for your table (SYSCAT.DATAPARTITIONS)

MichaelTiefenbacher
  • 3,805
  • 2
  • 11
  • 17
  • Thanks, a number column is one of primary key columns. Primary key is made up of three columns - two numbers and one varchar, varchar column is the one on which table is partitioned on. – Sabir Khan Mar 12 '16 at 14:16
  • I did experiment with various approaches and it looks that only restricting read data through WHERE clause for a thread is only viable option. – Sabir Khan Mar 17 '16 at 04:50
  • The other option to add a new column to table to indicate progress when a thread has picked up data is not an option in my case. This option will not have good performance either since updating rows will be additional burden. – Sabir Khan Mar 17 '16 at 05:06