As a newbie to the Batch Processing API (JSR-352), I have some difficulties modeling the following (simplified) scenario:

  1. Suppose we have a Batchlet that produces a dynamic set of files in a first step.
  2. In a second step, all these files must be processed individually in chunks (via ItemReader, ItemProcessor and ItemWriter) resulting in a new set of files.
  3. In a third step these new files need to be packaged in one large archive.

I couldn't find a way to define the second step because the specification doesn't seem to provide a loop construct (and in my understanding partition, split and flow only work for a set with a known fixed size).

What could a job XML definition look like? Do I have to give up on the idea of chunking in the second step, or do I have to divide the task into multiple jobs? Is there another option?

Jens Piegsa

1 Answer

You can use a PartitionMapper to programmatically define a dynamic number of partitions for a partitioned step.

The mapper needs to create a PartitionPlan object which sets the number of partitions and provides partition-specific properties for each.

Your mapper's mapPartitions() method will look something like this outline:

public PartitionPlan mapPartitions() throws Exception {

    int numPartitions = // calculate number of partitions, however you want

    // create an array of Properties objects, one for each partition
    Properties[] props = new Properties[numPartitions];

    for (int i = 0; i < numPartitions; i++) {
        // create a Properties object for this partition
        props[i] = new Properties();

        props[i].setProperty("abc", ...);
        props[i].setProperty("xyz", ...);
    }

    // use the built-in PartitionPlanImpl from the spec or your own impl
    PartitionPlan partitionPlan = new PartitionPlanImpl(); 
    partitionPlan.setPartitions(numPartitions);

    // set the Properties[] onto your plan
    partitionPlan.setPartitionProperties(props);

    return partitionPlan;
}
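
For the scenario in the question, the "calculate number of partitions" step could be driven by the files the first step produced. The following is only a minimal sketch under assumptions: the class `FilePartitionProperties`, its method `forDirectory`, and the property key `fileName` are hypothetical names, and it assumes the first step writes all of its files into a single directory known to the mapper.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;
import java.util.stream.Stream;

public class FilePartitionProperties {

    // Hypothetical helper: builds one Properties object per regular file in dir.
    // The key "fileName" is an assumption chosen for this sketch, not a spec-defined name.
    public static Properties[] forDirectory(Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        // Files.list returns a lazy stream that must be closed after use
        try (Stream<Path> entries = Files.list(dir)) {
            entries.filter(Files::isRegularFile)
                   .forEach(p -> names.add(p.toAbsolutePath().toString()));
        }
        // sort so that the partition order does not depend on directory listing order
        Collections.sort(names);

        Properties[] props = new Properties[names.size()];
        for (int i = 0; i < props.length; i++) {
            props[i] = new Properties();
            props[i].setProperty("fileName", names.get(i));
        }
        return props;
    }
}
```

Inside mapPartitions(), the returned array would be passed to setPartitionProperties() and its length to setPartitions(), so the number of partitions follows the number of files.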

You can then reference the partition-specific property values through property substitution, the same way you reference statically defined partition properties:

    <batchlet ref="myBatchlet">
        <properties>
            <property name="propABC" value="#{partitionPlan['abc']}" />
            <property name="propXYZ" value="#{partitionPlan['xyz']}" />
        </properties>
    </batchlet>
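
The same substitution should also work when the partitioned step is a chunk rather than a batchlet. As a rough sketch of what the second step in the question might look like: the step ids, the artifact names (`fileItemReader`, `fileItemProcessor`, `fileItemWriter`, `filePartitionMapper`), the property name `fileName`, and the `item-count` value are all assumptions made for illustration.

```xml
<step id="processFiles" next="packageArchive">
    <chunk item-count="100">
        <reader ref="fileItemReader">
            <properties>
                <!-- each partition sees its own value for 'fileName' -->
                <property name="fileName" value="#{partitionPlan['fileName']}" />
            </properties>
        </reader>
        <processor ref="fileItemProcessor" />
        <writer ref="fileItemWriter">
            <properties>
                <property name="fileName" value="#{partitionPlan['fileName']}" />
            </properties>
        </writer>
    </chunk>
    <partition>
        <!-- the mapper decides the number of partitions at runtime -->
        <mapper ref="filePartitionMapper" />
    </partition>
</step>
```

With this layout, the reader, processor, and writer run once per partition, and each instance works on the single file named in its partition properties.
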
Jens Piegsa
Scott Kurz
  • Great, I'll try to put this together and will report back. – Jens Piegsa Oct 17 '19 at 13:35
  • Ok, after a little experiment I am now ready to see how this might work for us. My PartitionMapper implementation returns one property per partition, which contains the respective filename. Instead of the shown batchlet, I use a chunk. Its ItemReader, ItemProcessor, and ItemWriter are called in multiple iterations (for each partition). They can read the filename property and act independently. -- Thanks a lot. – Jens Piegsa Oct 17 '19 at 17:08
  • Glad to help. I'm thinking of editing the question description to make it more clearly associated with "how to run a dynamically-calculated number of partitions?" – Scott Kurz Oct 17 '19 at 18:38
  • For future reference, this would make a lot of sense. – Jens Piegsa Oct 17 '19 at 19:18
  • Suppose I have two variables like numPartitions, or two array lists to be partitioned. Can I do this in the same mapPartitions() method? – not-a-bug May 05 '21 at 06:15
  • Shubhro, I'm not understanding your question. Can you ask a new question with your code example? – Scott Kurz May 05 '21 at 11:46