
I have a Spring Boot Batch job with two primary steps: the first reads a bunch of rows from a spreadsheet, and the second writes them to a database. Right now, it's set up to write to the database serially.

public CompositeItemWriter<SoftLayerData> compositeSoftlayerDataWriter(
    JpaItemWriter<SoftLayerData> softlayerDataWriter) {
  CompositeItemWriter<SoftLayerData> compositeWriter = new CompositeItemWriter<>();
  compositeWriter.setDelegates(asList(softlayerDataWriter));
  return compositeWriter;
}

The problem is that the volume is large. Since there's no need to maintain any order, I'd like to have multiple writers. I tried this:

final int writerCount = 10;
List<ItemWriter<? super SoftLayerData>> writers = new ArrayList<>(writerCount);
for (int counter = 0; counter < writerCount; counter++) {
  writers.add(new JpaItemWriter<SoftLayerData>());
}
CompositeItemWriter<SoftLayerData> result = new CompositeItemWriter<>();
result.setDelegates(writers);
return result;

But I'm getting an IllegalArgumentException: No EntityManagerFactory specified.

I like the approach, but I suspect there's some complex Spring Boot way that I have to follow. What's the best approach to using multiple writers?

halfer
Woodsman
  • Is there a need for two steps for that? Where is step 1 writing items to, and where is step 2 reading items from? Why not use a single chunk-oriented step instead of two steps? – Mahmoud Ben Hassine Sep 13 '21 at 12:59

1 Answer


The CompositeItemWriter calls its delegate writers in sequence, not in parallel. So creating 10 JpaItemWriters as delegates of the composite writer won't make your step multi-threaded.
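To make the sequential behaviour concrete, here is a plain-Java model of the delegation loop. It is a sketch, not the Spring Batch API: `ChunkWriter`, `writeComposite` and `demo` are hypothetical names. It shows two things: every delegate runs on the caller's thread, one after another, and every delegate receives the same chunk. The second point means ten `JpaItemWriter` delegates would each be handed every item, so the work is repeated ten times rather than split ten ways.

```java
import java.util.ArrayList;
import java.util.List;

public class CompositeModel {

    // Hypothetical stand-in for Spring Batch's ItemWriter: receives a whole chunk at once.
    interface ChunkWriter<T> {
        void write(List<? extends T> chunk);
    }

    // Models the composite's delegation loop: each delegate receives the SAME chunk,
    // one after another, on the caller's thread. Nothing here runs in parallel.
    static <T> void writeComposite(List<ChunkWriter<T>> delegates, List<? extends T> chunk) {
        for (ChunkWriter<T> delegate : delegates) {
            delegate.write(chunk);
        }
    }

    // Records "delegateIndex:threadName:chunkSize" for every delegate call.
    static List<String> demo(int writerCount, List<String> chunk) {
        List<String> log = new ArrayList<>();
        List<ChunkWriter<String>> delegates = new ArrayList<>();
        for (int i = 0; i < writerCount; i++) {
            final int index = i;
            delegates.add(c -> log.add(index + ":" + Thread.currentThread().getName() + ":" + c.size()));
        }
        writeComposite(delegates, chunk);
        return log;
    }

    public static void main(String[] args) {
        // Three delegates, one two-item chunk: three log lines, all on the same thread.
        demo(3, List.of("a", "b")).forEach(System.out::println);
    }
}
```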

If you want the step to become multi-threaded, you need to add a TaskExecutor to it, something like:

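import org.springframework.batch.core.Step;
import org.springframework.context.annotation.Bean;
import org.springframework.core.task.SimpleAsyncTaskExecutor;
import org.springframework.core.task.TaskExecutor;

@Bean
public TaskExecutor taskExecutor() {
    // Starts a new thread for every task; threads are not reused
    return new SimpleAsyncTaskExecutor("spring_batch");
}

@Bean
public Step sampleStep(TaskExecutor taskExecutor) {
    // Chunks of 10 items (read, then write) are executed concurrently by threads from taskExecutor
    return this.stepBuilderFactory.get("sampleStep")
                .<String, String>chunk(10)
                .reader(itemReader())
                .writer(itemWriter())
                .taskExecutor(taskExecutor)
                .build();
}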

Please refer to the "Multi-threaded Step" section of the Spring Batch reference documentation.
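Because `SimpleAsyncTaskExecutor` starts a new thread for every task, a bounded `ThreadPoolTaskExecutor` is a common way to fix the number of threads a multi-threaded step uses. The configuration below is a sketch assuming Spring Batch 4.x (where `StepBuilderFactory` and `throttleLimit` are available) and that `ItemReader`/`ItemWriter` beans already exist; the class name, bean names, pool size of 10 and thread name prefix are hypothetical. In 4.x the step also has a throttle limit that defaults to 4 concurrent chunks, so it is raised here to match the pool size.

```java
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.TaskExecutor;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

@Configuration
public class ThreadedStepConfig {

    // Hypothetical degree of parallelism ("10 at once" from the question)
    private static final int WRITER_THREADS = 10;

    @Bean
    public TaskExecutor batchTaskExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(WRITER_THREADS);
        executor.setMaxPoolSize(WRITER_THREADS);
        executor.setThreadNamePrefix("spring_batch-");
        // No explicit initialize(): Spring calls afterPropertiesSet() on the bean.
        return executor;
    }

    @Bean
    public Step sampleStep(StepBuilderFactory stepBuilderFactory,
                           ItemReader<String> itemReader,
                           ItemWriter<String> itemWriter,
                           TaskExecutor batchTaskExecutor) {
        return stepBuilderFactory.get("sampleStep")
                .<String, String>chunk(10)
                .reader(itemReader)
                .writer(itemWriter)
                .taskExecutor(batchTaskExecutor)
                // Spring Batch 4.x allows only 4 concurrent chunks by default
                .throttleLimit(WRITER_THREADS)
                .build();
    }
}
```

In a multi-threaded step the reader is also called from several threads, so a reader that is not thread-safe is typically wrapped, for example in a `SynchronizedItemStreamReader`.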

As for the exception: you are creating each JpaItemWriter with the new operator, so Spring never calls its afterPropertiesSet method to check mandatory properties, and the writer ends up with no EntityManagerFactory. You need to set an EntityManagerFactory on each writer you create this way.
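A minimal sketch of the fix, assuming Spring Batch 4.x with Spring Boot 2.x (hence `javax.persistence`) and that `SoftLayerData` is the JPA entity from the question; the class name `WriterConfig` is hypothetical. Declaring one writer as a `@Bean` built with `JpaItemWriterBuilder` lets Spring inject the Boot-configured `EntityManagerFactory` and run the property checks. A single writer is enough, because parallelism in a multi-threaded step comes from the step's `TaskExecutor`, not from extra delegates.

```java
import javax.persistence.EntityManagerFactory;

import org.springframework.batch.item.database.JpaItemWriter;
import org.springframework.batch.item.database.builder.JpaItemWriterBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class WriterConfig {

    // Spring Boot auto-configures the EntityManagerFactory and injects it here;
    // the builder rejects a null factory, and Spring calls afterPropertiesSet() on the bean.
    @Bean
    public JpaItemWriter<SoftLayerData> softlayerDataWriter(EntityManagerFactory entityManagerFactory) {
        return new JpaItemWriterBuilder<SoftLayerData>()
                .entityManagerFactory(entityManagerFactory)
                .build();
    }
}
```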

Mahmoud Ben Hassine
  • My concern is that I want the writing portion of a single step to run in parallel, not just two whole steps running in parallel. Let's say 10 writes at once. I don't just want to build a list of 10 items and then write them. Does your snippet above run the writers in parallel? – Woodsman Sep 13 '21 at 12:47
  • The sample in the answer is a single step, not two steps, and it will be executed in parallel by different threads. – Mahmoud Ben Hassine Sep 16 '21 at 09:03
  • I refactored my previous batch to receive items from its previous step, because the initial reader is difficult to split. How many threads will your solution above use? Do I need that complex partitioning code to do this? – Woodsman Sep 19 '21 at 23:42
  • How do I control, or even know, how many threads the Multi-Threaded-Step creates? PS. I saw your video on YouTube earlier today. I can't multi-thread the reader, but I can multi-thread the writer. How do I pass items from one step to another? I'm looking for a simple, in-memory solution. Can I pay VMware to answer my questions? – Woodsman Sep 20 '21 at 01:28
  • Mahmoud, thanks for your help; I accepted your answer because it largely gives me what I wanted. I watched your video and thought it gives each chunk its own thread, so the number of threads is, roughly speaking, the total item count divided by the chunk size. However, I did a test run with 10 items, and it spawned 4 threads. – Woodsman Sep 20 '21 at 03:36
  • Glad it helped. `how many threads the Multi-Threaded-Step creates?`: The number of threads depends on the TaskExecutor you pass to the step. The sample in my answer uses the simplest TaskExecutor implementation which is SimpleAsyncTaskExecutor. This implementation does not reuse threads and creates a thread for each task. You can use a ThreadPoolTaskExecutor and set the number of threads you want to use to execute the step. – Mahmoud Ben Hassine Sep 20 '21 at 06:13
  • `How do I pass items from one step to another?`: You can use the execution context for that; please refer to https://docs.spring.io/spring-batch/docs/4.3.x/reference/html/common-patterns.html#passingDataToFutureSteps and https://stackoverflow.com/questions/2292667. I hope this helps. – Mahmoud Ben Hassine Sep 20 '21 at 06:15
  • Mahmoud, I overrode the JPA writer used in the multi-threaded test solely to print the thread ID, to make sure it was using multiple threads. With SimpleAsyncTaskExecutor, I got multiple threads, even within that single step. – Woodsman Sep 20 '21 at 15:24
  • Re. passing items: some previous SO posts suggested that there is a limit on how much the context can store. For my purpose, I don't need restartability and prefer speed, so I didn't want to store items in the context. I notice Spring logs things in one or more batch tables, including the context, so using the context requires an I/O write to the database. – Woodsman Sep 20 '21 at 15:32
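For the execution-context route from the documentation linked in the comments, here is a sketch assuming Spring Batch 4.x. Step 1 puts a value in its own `StepExecution` context, an `ExecutionContextPromotionListener` registered on step 1 copies the listed keys into the `JobExecution` context when step 1 completes, and step 2 reads them from there. The key `softLayerRowCount`, the class name and the `recordRowCount` helper are hypothetical. With a database-backed job repository the context is serialized to the batch tables, so this fits small values such as counts or file names rather than the full list of items.

```java
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.listener.ExecutionContextPromotionListener;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class PromotionConfig {

    // Hypothetical key shared between step 1 and step 2
    static final String ROW_COUNT_KEY = "softLayerRowCount";

    // Register on step 1 with .listener(promotionListener()). When step 1 completes,
    // the listed keys are copied from its StepExecution context to the JobExecution context.
    @Bean
    public ExecutionContextPromotionListener promotionListener() {
        ExecutionContextPromotionListener listener = new ExecutionContextPromotionListener();
        listener.setKeys(new String[] {ROW_COUNT_KEY});
        return listener;
    }

    // Hypothetical helper, called from step 1 (e.g. in a StepExecutionListener) to stash a value.
    static void recordRowCount(StepExecution stepExecution, int rowCount) {
        stepExecution.getExecutionContext().putInt(ROW_COUNT_KEY, rowCount);
    }
}
```

In step 2, a `@StepScope` bean can then receive the value through `@Value("#{jobExecutionContext['softLayerRowCount']}")`.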