
My step is supposed to write a ton of items to a DB table with a unique index on a few columns, so some items will produce DataIntegrityViolationException. I want to make my step fault-tolerant to that without setting the chunk size to 1. The following configuration unfortunately does not work as expected and just skips the whole chunk when the exception occurs; I probably misread something here:

stepBuilderFactory.get("someStep")
    .<InputDto, Entity>chunk(100)
    .reader(reader())
    .processor(processor())
    .writer(repositoryItemWriter(repository()))
    .faultTolerant()
    .skipLimit(Integer.MAX_VALUE)
    .skip(DataIntegrityViolationException.class)
    .noRollback(DataIntegrityViolationException.class)
    .processorNonTransactional()
    .build();

Also, as mentioned here, the behaviour of exception skipping within a chunk sounds a bit expensive, doesn't it? What is the most efficient way to deal with it then? A select for a uniqueness check before each insert doesn't look great either.

Tovarisch

2 Answers


Instead of using writer(repositoryItemWriter(repository())), you could use your own writer, with the goal of handling DataIntegrityViolationException on an item-by-item basis. This way, you can manage the exceptions yourself and decide how to handle each one.

That seems preferable to the default behaviour, where, when DataIntegrityViolationException occurs, the entire chunk is rolled back: Spring Batch wraps the chunk in a transaction, and if any item in the chunk fails, the whole transaction is rolled back.

Since the RepositoryItemWriter does not actually write the items immediately, but collects them and writes them in a batch at the end of the chunk, a catch block around the writer call is never reached.

You can define your own ItemWriter: try and extend JpaItemWriter or JdbcBatchItemWriter (depending on whether you are using JPA or JDBC) and override the write method to handle the DataIntegrityViolationException for each item.

public class CustomJpaItemWriter extends JpaItemWriter<Entity> {

    @Override
    public void write(List<? extends Entity> items) {
        EntityManager entityManager = getEntityManagerFactory().createEntityManager();
        try {
            for (Entity item : items) {
                // One transaction per item, so a failing insert does not
                // poison the rest of the chunk
                entityManager.getTransaction().begin();
                try {
                    entityManager.persist(item);
                    entityManager.flush();
                    entityManager.getTransaction().commit();
                } catch (PersistenceException e) {
                    // A raw EntityManager throws the JPA PersistenceException;
                    // Spring's exception translation (which would produce
                    // DataIntegrityViolationException) does not apply here
                    entityManager.getTransaction().rollback();
                    // Handle or log the exception
                } finally {
                    entityManager.clear();
                }
            }
        } finally {
            entityManager.close();
        }
    }
}

Then define your own writer bean:

// Define your writer
@Bean
public ItemWriter<Entity> customJpaItemWriter() {
    CustomJpaItemWriter writer = new CustomJpaItemWriter();
    writer.setEntityManagerFactory(entityManagerFactory);
    return writer;
}

And use it:

// Include the writer in your step
@Bean
public Step someStep() {
    return stepBuilderFactory.get("someStep")
        .<InputDto, Entity>chunk(100)
        .reader(reader())
        .processor(processor())
        .writer(customJpaItemWriter())
        .build();
}
VonC
  • the problem is that the catch block is never visited, because the actual write to the database happens later in the transaction, so DataIntegrityViolationException still occurs and the behaviour is the same; thanks for the answer though – Tovarisch May 22 '23 at 08:48
  • @Tovarisch OK. I have rewritten the answer to address that point. – VonC May 22 '23 at 08:55

some items will produce DataIntegrityViolationException. I want to make my step faultTolerant to that

DataIntegrityViolationException is not a fault you want to tolerate. This is a fault that you want your job to fail at. A transient error could be tolerated, a temporary network issue could be tolerated, but an error that is related to data integrity or consistency should not be tolerated.

The following configuration unfortunately does not work as expected and just skips the whole chunk when the exception occurs [..] Also, as mentioned here, the behaviour of exception skipping within a chunk sounds a bit expensive, doesn't it?

The exception you are getting happens at the commit time of the transaction and Spring Batch cannot know which item(s) caused the issue. Hence it will scan the chunk item by item to determine the faulty item and skip it. And yes, that has a cost. The answer you refer to explains the mechanism in detail.

What is the most efficient way to deal with it then? A select for a uniqueness check before each insert doesn't look great either.

In my opinion, skipping the DataIntegrityViolationException is working around the problem rather than fixing it. I would add a processor or a listener (ItemWriteListener#beforeWrite) that checks data integrity constraints and rejects faulty items before they are written. Data validation is a typical use case for an item processor.
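A minimal sketch of the processor idea, kept as plain Java so it is self-contained: in a real job this logic would live in the process method of an ItemProcessor<Entity, Entity>, where returning null tells Spring Batch to filter the item out before it reaches the writer. The businessKey here is a hypothetical stand-in for the columns of your unique index, and this only catches duplicates within the current run, not rows already in the table.

```java
import java.util.HashSet;
import java.util.Set;

public class DedupProcessor {

    // Keys already seen in this run; in Spring Batch this would be
    // state inside your ItemProcessor bean (step-scoped if restartable)
    private final Set<String> seenKeys = new HashSet<>();

    // In a real ItemProcessor<Entity, Entity> you would derive the key
    // from the entity's unique-index columns
    public String process(String businessKey) {
        if (!seenKeys.add(businessKey)) {
            return null; // duplicate within this run: filter it out
        }
        return businessKey; // first occurrence: pass it through
    }
}
```

To also reject items that already exist in the table, the processor would additionally need a lookup (or you move the problem to the database, as in the comment below the answer).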

Mahmoud Ben Hassine
  • Thank you for the answer! I decided to switch to "insert on conflict do nothing" (PostgreSQL); the only drawback here is that write_count may differ from the actual number of written records if I rely only on Spring Batch's default mechanisms. – Tovarisch May 25 '23 at 08:11
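The approach from that comment can be sketched with plain JDBC, which also addresses the write_count concern by summing the per-statement update counts returned by the batch (a skipped conflict reports 0). Table and column names below are hypothetical:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class OnConflictWriter {

    // PostgreSQL: conflicting rows are silently skipped instead of
    // raising a unique-constraint violation
    static final String INSERT_SQL =
        "INSERT INTO items (key_a, key_b, payload) VALUES (?, ?, ?) "
        + "ON CONFLICT (key_a, key_b) DO NOTHING";

    // Returns the number of rows actually inserted; duplicates count as 0
    public static int writeChunk(Connection conn, List<String[]> rows)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            for (String[] r : rows) {
                ps.setString(1, r[0]);
                ps.setString(2, r[1]);
                ps.setString(3, r[2]);
                ps.addBatch();
            }
            int inserted = 0;
            for (int count : ps.executeBatch()) {
                if (count > 0) {
                    inserted += count; // 0 means the row hit a conflict
                }
            }
            return inserted;
        }
    }
}
```

In a Spring Batch job you would put the same SQL in a JdbcBatchItemWriter; the caveat from the comment stands, since the framework's write count then includes the silently skipped rows.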