In a JSR-352 batch I want to use partitioning. I can define the number of partitions via configuration or implement a PartitionMapper to do that.

Then, there are the JobContext and StepContext injectables to provide context information to my processing. However, there is no PartitionContext or the like which maintains and provides details about the partition I'm running in.

Hence the question:

How do I tell each partitioned instance of a chunk which partition it is running in so that its ItemReader can read only those items which belong to that particular partition?

If I don't do that, each partition would perform the same work on the same data instead of splitting up the input data set into n distinct partitions.

I know I can store some ID in the partition plan's properties which I can then use to set another property in the step's configuration like <property name="partitionId" value="#{partitionPlan['partitionId']}" />. But this seems overly complicated and fragile because I'd have to know the name of the property from the partition plan and must remember to always set another property to this value for each step.

Isn't there another, clean, standard way to provide partition information to steps?

Or, how else should I be splitting work by partitions and assign it to different ItemReader instances in the same partitioned chunk?

Update:

It appears that jberet has the org.jberet.cdi.PartitionScoped CDI scope, but it's not part of the JSR standard.

Scott Kurz
JimmyB
  • Though it would have been reasonable to include this as a standard part of the 1.0 API spec, it wasn't, so unfortunately you are left to construct your own convention/solution. You're not missing anything. The Batch API project is not active at the moment, but I believe this issue has been noted multiple times, and hopefully it will be a strong candidate when the project resumes. – Scott Kurz Sep 13 '18 at 13:37
  • I think (hope) you'll agree that by editing and rephrasing the question title a bit, I better captured the key question here. – Scott Kurz Sep 13 '18 at 13:39
  • @ScottKurz Yes, thanks for the improvement :) – JimmyB Sep 13 '18 at 14:19
  • @ScottKurz I can't quite understand why this feature is not included anywhere in the standard. I believe that a) it's a feature which everybody using partitions will need and b) telling workers the total number of workers and their index is pretty much standard in parallel computing. – JimmyB Sep 13 '18 at 14:23
  • It's too late now. Not EVERY app needs it though I'd support adding something like this. – Scott Kurz Sep 13 '18 at 15:45

1 Answer

When defining a partition with either a partition plan (XML) or a partition mapper (programmatic), include this information as partition properties, and then reference those partition properties within the item reader/processor/writer properties.

This is the standard way to tell item reader and other batch artifacts what resource to handle, where to begin, and where to end. This is not much different from non-partitioned chunk configuration, where you also need to configure the source and range of input data with batch properties.
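
As a rough illustration of the approach above, here is a minimal sketch of how the per-partition properties could be computed. It uses only java.util.Properties so that it runs on its own. The class name PartitionRanges, the method split, and the property names partitionId, start and end are assumptions made for this example; they are not defined by JSR-352 or jberet.

```java
import java.util.Properties;

// Hypothetical helper: builds one Properties object per partition,
// each describing a distinct, non-overlapping slice of the input.
final class PartitionRanges {

    // Property names are an assumed convention for this sketch only.
    static final String PARTITION_ID = "partitionId";
    static final String START = "start"; // inclusive index of the first item
    static final String END = "end";     // exclusive index after the last item

    private PartitionRanges() {
    }

    static Properties[] split(int totalItems, int partitions) {
        if (partitions <= 0) {
            throw new IllegalArgumentException("partitions must be positive");
        }
        if (totalItems < 0) {
            throw new IllegalArgumentException("totalItems must not be negative");
        }
        Properties[] result = new Properties[partitions];
        int base = totalItems / partitions;      // minimum items per partition
        int remainder = totalItems % partitions; // the first 'remainder' partitions get one extra item
        int start = 0;
        for (int i = 0; i < partitions; i++) {
            int size = base + (i < remainder ? 1 : 0);
            Properties p = new Properties();
            p.setProperty(PARTITION_ID, String.valueOf(i));
            p.setProperty(START, String.valueOf(start));
            p.setProperty(END, String.valueOf(start + size));
            result[i] = p;
            start += size;
        }
        return result;
    }
}
```

In a PartitionMapper, an array like this could be handed to PartitionPlanImpl via setPartitions(n) and setPartitionProperties(...). The step's reader properties would then reference the values with expressions such as #{partitionPlan['start']}, and the ItemReader would receive them through @Inject @BatchProperty fields. Every step still has to repeat that wiring, which is the convention the question describes.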

For an example, please see org.jberet.test.chunkPartitionFailComplete.xml from one of the jberet test apps.

cheng