I am trying to improve the performance of the job listed below. As is, without threading, it runs successfully, but it runs very slowly. I would like to thread Step2, where 95% of the work happens: reading, filtering and transforming the input data from very large heterogeneous files. The job:
• Step1 gets some job parameters that are passed into Step2.
• Step2 reads in X number of files. Each file is heterogeneous, i.e., contains several different record formats. The records are filtered, transformed and sent to a single output file.
Does Spring Batch have a built-in way to thread Step2 in this scenario? For example, can I add some type of executor to Step2? I’ve tried SimpleAsyncTaskExecutor and ThreadPoolTaskExecutor. Neither works. Adding SimpleAsyncTaskExecutor throws an exception. (See "can we process the multiple files sequentially using spring Batch while multiple threads used to process individual files data..?")
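To make the goal concrete, the desired behavior can be sketched in plain Java, without Spring Batch: each input file is filtered and transformed on its own thread, and a single thread writes the combined result to one output file. This is only an illustration under stated assumptions. The class name ParallelFileSketch, the "A"-prefix filter and the upper-casing transform are hypothetical placeholders, not the real record logic, and the sketch holds all output in memory, which would not suit very large files.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ParallelFileSketch {

    // Hypothetical stand-in for the filter + transform work:
    // keep only records starting with "A" and upper-case them.
    static List<String> filterAndTransform(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.filter(line -> line.startsWith("A"))
                        .map(String::toUpperCase)
                        .collect(Collectors.toList());
        }
    }

    // One task per input file. Only the calling thread touches the output
    // file, so there is exactly one writer.
    static void run(List<Path> inputs, Path output, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> pending = new ArrayList<>();
            for (Path input : inputs) {
                Callable<List<String>> task = () -> filterAndTransform(input);
                pending.add(pool.submit(task));
            }
            List<String> combined = new ArrayList<>();
            for (Future<List<String>> result : pending) {
                // Collect in submission order so the output order is stable.
                combined.addAll(result.get());
            }
            Files.write(output, combined, StandardCharsets.UTF_8);
        } finally {
            pool.shutdown();
        }
    }

    // Usage: java ParallelFileSketch <output> <input1> [<input2> ...]
    public static void main(String[] args) throws Exception {
        List<Path> inputs = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            inputs.add(Paths.get(args[i]));
        }
        run(inputs, Paths.get(args[0]), 4);
    }
}
```

The point of the sketch is the shape of the concurrency: the readers do not share state, and the writer is single-threaded. The question is whether Step2 can be given the same shape with built-in Spring Batch components.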
Here is the batch configuration:
    @Bean
    public Job job() {
        return jobBuilderFactory.get("MyJob")
                .start(step1())
                .next(step2())
                .build();
    }

    @Bean
    public Step step1() {
        return stepBuilderFactory.get("Step1GetJobParams")
                .tasklet(MyParamsTasklet)
                .build();
    }

    @Bean
    public Step step2() {
        return stepBuilderFactory.get("Step2")
                .<InputDO, OutputDO>chunk(1000)
                .reader(myMultiResourceReader())
                .processor(myStep2ItemProcessor)
                .writer(myStep2FileWriter())
                .taskExecutor(???) // line #23: which executor, if any, belongs here?
                .build();
    }

    @Bean
    public MultiResourceItemReader<InputDO> myMultiResourceReader() {
        MultiResourceItemReader<InputDO> multiResourceItemReader = new MultiResourceItemReader<InputDO>();
        multiResourceItemReader.setResources(resourceManager.getResources());
        multiResourceItemReader.setDelegate(myStep2FileReader());
        multiResourceItemReader.setSaveState(false);
        return multiResourceItemReader;
    }

    @Bean
    public FlatFileItemReader<InputDO> myStep2FileReader() {
        return new FlatFileItemReaderBuilder<InputDO>()
                .name("MyStep2FileReader")
                .lineMapper(myCompositeLineMapper())
                .build();
    }

    @Bean
    public PatternMatchingCompositeLineMapper<InputDO> myCompositeLineMapper() {
        PatternMatchingCompositeLineMapper<InputDO> lineMapper = new PatternMatchingCompositeLineMapper<InputDO>();

        Map<String, LineTokenizer> tokenizers = new HashMap<String, LineTokenizer>();
        tokenizers.put("A", InputDOTokenizer.getInputDOTokenizer());
        tokenizers.put("*", InputDOFillerTokenizer.getInputDOFillerTokenizer());
        lineMapper.setTokenizers(tokenizers);

        Map<String, FieldSetMapper<InputDO>> mappers = new HashMap<String, FieldSetMapper<InputDO>>();
        mappers.put("A", new InputDOFieldSetMapper());
        mappers.put("*", new InputDOFillerFieldSetMapper());
        lineMapper.setFieldSetMappers(mappers);

        return lineMapper;
    }

    @Bean
    public FlatFileItemWriter<OutputDO> myOutputDOFileWriter() {
        return new FlatFileItemWriterBuilder<OutputDO>()
                .name("MyOutputDOFileWriter")
                .resource(resourceManager.getFileSystemResource("myOutputDOFileName"))
                .lineAggregator(new DelimitedLineAggregator<OutputDO>() {
                    {
                        setDelimiter("");
                        setFieldExtractor(outputDOFieldExtractor.getOutputDOFieldExtractor());
                    }
                })
                .lineSeparator("\r\n")
                .build();
    }

Any/all guidance is much appreciated!