
I have a batch implemented with JSR-352 (using jberet on wildfly).

I have a chunk with item-count 15, and java.lang.Exception is configured as both a retryable and a skippable exception.

When there are many exceptions, most of the items will be processed multiple times. In this extreme case all items would throw an exception in the writer:

  • First 15 items are read
  • Exception occurs on first item
  • Chunk is rolled back and configured with item-count = 1
  • First item is read
  • Exception occurs again, item is skipped
  • Proceed with the other 14 items, exception may occur on every item, every item is skipped
  • After the first 15 items the chunk is back with item-count = 15
  • Items 16-30 are read
  • Exception occurs again
  • Reader is rolled back to latest checkpoint

At this point there is still no checkpoint, because no item has been processed successfully yet. Hence the reader starts with the first item again, and all 30 items are processed with item-count = 1, and so on.
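The role of the checkpoint in the last steps of the trace can be sketched in plain Java. The class below is a hypothetical stand-in, not JBeret code: it only borrows the method names `open`, `readItem`, and `checkpointInfo` from the JSR-352 `ItemReader` interface and keeps an integer position as its checkpoint. It assumes the runtime passes `null` to `open` when no checkpoint has been persisted yet, which is why a rollback without a checkpoint lands on the first item.

```java
import java.io.Serializable;
import java.util.List;

// Hypothetical reader that mirrors the ItemReader method names
// without depending on the javax.batch API.
class PositionReader {
    private final List<String> items;
    private int position;

    PositionReader(List<String> items) {
        this.items = items;
    }

    // Assumption: a null checkpoint means "nothing persisted yet".
    void open(Serializable checkpoint) {
        position = (checkpoint == null) ? 0 : (Integer) checkpoint;
    }

    // Returns the next item, or null when the input is exhausted.
    String readItem() {
        return position < items.size() ? items.get(position++) : null;
    }

    // The position the reader would resume from after a rollback.
    Serializable checkpointInfo() {
        return position;
    }

    public static void main(String[] args) {
        PositionReader reader = new PositionReader(List.of("a", "b", "c", "d"));
        reader.open(null);
        reader.readItem();
        reader.readItem();
        // Rollback with no persisted checkpoint: back to the first item.
        reader.open(null);
        System.out.println(reader.readItem()); // prints a
    }
}
```

If a checkpoint had been taken after the two reads, reopening with it would resume at the third item instead.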

If there are many such failures the batch would process all items again and again.

I think a checkpoint also needs to be set for skipped items, because a skipped item should not be processed again.
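To put numbers on the trace, here is a small simulation in plain Java. It is a simplified model of the behaviour described in this question, not JBeret's actual algorithm: every write is assumed to fail, a failed chunk rolls back to the last checkpoint, and the items up to the end of the failed chunk are then re-read one at a time and skipped. The method name `readsPerItem` and the flag `checkpointOnSkip` are made up for this sketch.

```java
import java.util.Arrays;

// Simplified model: counts how often each item is read when every write fails.
class SkipCheckpointSimulation {

    static int[] readsPerItem(int totalItems, int itemCount, boolean checkpointOnSkip) {
        int[] reads = new int[totalItems];
        int checkpoint = 0; // last persisted reader position
        int position = 0;   // current reader position

        while (position < totalItems) {
            // Normal chunk: read up to itemCount items, then the write fails.
            int chunkEnd = Math.min(position + itemCount, totalItems);
            for (int i = position; i < chunkEnd; i++) {
                reads[i]++;
            }
            // Rollback to the last checkpoint, then one-item chunks
            // up to the end of the failed chunk; each item is skipped.
            for (int i = checkpoint; i < chunkEnd; i++) {
                reads[i]++;
                if (checkpointOnSkip) {
                    checkpoint = i + 1; // skipped item is never read again
                }
            }
            position = chunkEnd; // back to the configured item-count
        }
        return reads;
    }

    static int total(int[] counts) {
        return Arrays.stream(counts).sum();
    }

    public static void main(String[] args) {
        System.out.println(total(readsPerItem(30, 15, false))); // prints 75
        System.out.println(total(readsPerItem(30, 15, true)));  // prints 60
    }
}
```

In this model, each item is read at most twice when a checkpoint follows every skip, whereas without it the first items are read once more for every later chunk that fails.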

I think this is a bug in the specification, so I already opened an issue there: https://github.com/WASdev/standards.jsr352.batch-spec/issues/15 Or am I wrong, and have I misunderstood the implementation?

How is this implemented in Spring Batch?

Michael Minella
cornz

1 Answer


I think the specification is clear enough, which suggests this could be a JBeret bug (assuming it's not an application issue).

In the spec (an unofficial version here), the section:

8.2.1.4.3 Retry and Skip the Same Exception

says that during a retry with rollback, the items are processed one at a time (in one-item chunks), and that skip takes precedence during retry.
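As a rough illustration of that precedence rule, the decision can be written as a small function. This is only a sketch with made-up names (`decide`, `Action`), and it ignores retry and skip limits. The branch for normal processing, where retry is checked first, is my reading of the same section and matches the trace in the question; treat it as an assumption.

```java
// Sketch of the precedence between retry and skip; limits are ignored.
class RetrySkipPrecedence {

    enum Action { RETRY_WITH_ROLLBACK, SKIP, FAIL }

    static Action decide(boolean retryable, boolean skippable, boolean alreadyRetrying) {
        if (alreadyRetrying) {
            // During the one-item retry chunks, skip is checked first.
            if (skippable) return Action.SKIP;
            if (retryable) return Action.RETRY_WITH_ROLLBACK;
        } else {
            // Assumption: during normal processing, retry is checked first.
            if (retryable) return Action.RETRY_WITH_ROLLBACK;
            if (skippable) return Action.SKIP;
        }
        return Action.FAIL;
    }

    public static void main(String[] args) {
        // java.lang.Exception configured as both retryable and skippable:
        System.out.println(decide(true, true, false)); // prints RETRY_WITH_ROLLBACK
        System.out.println(decide(true, true, true));  // prints SKIP
    }
}
```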

So if a skippable exception occurs during retry, that item would just be skipped, and an updated checkpoint should be persisted. This is how WebSphere Liberty Batch, the JSR 352 implementation I work on, does it.

So I'd suggest producing a small project that reproduces the problem and opening a JBeret issue if it still looks like one. At this point, I don't see a spec issue.

Scott Kurz
  • I totally agree that the item is first retried in a one-item chunk and that during this retry the skip takes precedence. But I cannot find the part of the spec that says that a skip needs to update the checkpoint. The process in the official spec, `11.9 Chunk with RetryListener` page 124, also says in step q. that it resumes from S1 without executing lines r and s, which seem to be the checkpoint update. But I am quite new to the spec and may be missing something. At least it is good to hear that it is no issue with BatchEE, and I will create a small test case. – cornz Nov 16 '18 at 21:30
  • True, you won't see anything about skip needing to update the checkpoint during retry. Actually the spec doesn't explicitly say that a new checkpoint is taken after any skip. It's assumed to be "obvious" that to "skip" means not to rollback, retry, or fail execution. Since a checkpoint is taken at the end of a normal chunk, then, well, a checkpoint should be taken at the end of a chunk involving a skipped item. This also applies for the special single-item chunk used in a retry after rollback. So, I'm not saying the spec couldn't have been clearer. Just reconstructing the reasoning. – Scott Kurz Nov 16 '18 at 22:00
  • Just for the record: I mentioned Liberty Batch not BatchEE. – Scott Kurz Nov 17 '18 at 00:06
  • Ok, I thought WLP uses BatchEE. At least it is a bug in JBeret; I opened the issue https://github.com/jberet/jsr352/issues/116 – cornz Nov 17 '18 at 22:32
  • Thanks for reporting and investigating this issue with JBeret. We'll look into it. – cheng Nov 19 '18 at 03:25
  • Looking over the JBeret issue, let me just note that, while the checkpointing behavior is specified clearly enough, the metric counts may not be consistent across implementations. So be careful about putting too much weight on them in your testing. – Scott Kurz Nov 19 '18 at 14:44