Threading a variable number of heterogeneous input files, process the input and output to a single file

Question

I am trying to improve the performance of the job listed below. As is, without threading, it runs successfully, but it runs very slowly. I would like to thread Step2, where 95% of the work happens: reading, filtering, and transforming input data from very large heterogeneous files. The job:

• Step1 gets some job parameters that are passed into Step2.

• Step2 reads X number of files. Each file is heterogeneous, i.e., it contains several different record formats. The records are filtered, transformed, and sent to a single output file.

Does Spring Batch have a built-in way to thread Step2 in this scenario? For example, can I add some type of executor to Step2? I’ve tried SimpleAsyncTaskExecutor and ThreadPoolTaskExecutor. Neither works. Adding SimpleAsyncTaskExecutor throws an exception. (See "can we process the multiple files sequentially using spring Batch while multiple threads used to process individual files data..?")

Here is the batch configuration:

    public Job job() {
        return jobBuilderFactory.get("MyJob")
                .start(step1())
                .next(step2())
                .build();
    }

    @Bean
    public Step step1() {
        return stepBuilderFactory.get("Step1GetJobParams")
                .tasklet(MyParamsTasklet)
                .build();
    }

    @Bean
    public Step step2() {
        return stepBuilderFactory.get("Step2")
                .<InputDO, OutputDO>chunk(1000)
                .reader(myMultiResourceReader())
                .processor(myStep2ItemProcessor)
                .writer(myStep2FileWriter())
                .taskExecutor(???) // which TaskExecutor goes here?
                .build();
    }

    @Bean
    public MultiResourceItemReader<InputDO> myMultiResourceReader() {
        MultiResourceItemReader<InputDO> multiResourceItemReader = new MultiResourceItemReader<InputDO>();
        multiResourceItemReader.setResources(resourceManager.getResources());
        multiResourceItemReader.setDelegate(myStep2FileReader());
        multiResourceItemReader.setSaveState(false);
        return multiResourceItemReader;
    }

    @Bean
    public FlatFileItemReader<InputDO> myStep2FileReader() {
        return new FlatFileItemReaderBuilder<InputDO>()
                .name("MyStep2FileReader")
                .lineMapper(myCompositeLineMapper())
                .build();
    }

    @Bean
    public PatternMatchingCompositeLineMapper<InputDO> myCompositeLineMapper() {
        PatternMatchingCompositeLineMapper<InputDO> lineMapper = new PatternMatchingCompositeLineMapper<InputDO>();
        // Records starting with "A" get their own tokenizer; everything else is treated as filler.
        Map<String, LineTokenizer> tokenizers = new HashMap<String, LineTokenizer>();
        tokenizers.put("A", InputDOTokenizer.getInputDOTokenizer());
        tokenizers.put("*", InputDOFillerTokenizer.getInputDOFillerTokenizer());
        lineMapper.setTokenizers(tokenizers);
        Map<String, FieldSetMapper<InputDO>> mappers = new HashMap<String, FieldSetMapper<InputDO>>();
        mappers.put("A", new InputDOFieldSetMapper());
        mappers.put("*", new InputDOFillerFieldSetMapper());
        lineMapper.setFieldSetMappers(mappers);
        return lineMapper;
    }

    @Bean
    public FlatFileItemWriter<OutputDO> myOutputDOFileWriter() {
        return new FlatFileItemWriterBuilder<OutputDO>()
                .name("MyOutputDOFileWriter")
                .resource(resourceManager.getFileSystemResource("myOutputDOFileName"))
                .lineAggregator(new DelimitedLineAggregator<OutputDO>() {
                    {
                        setDelimiter("");
                        setFieldExtractor(outputDOFieldExtractor.getOutputDOFieldExtractor());
                    }
                })
                .lineSeparator("\r\n")
                .build();
    }

Any/all guidance is much appreciated!

Welcome to SO. Can you reduce the size of your question? See: https://stackoverflow.com/help/minimal-reproducible-example — Gray, Feb 11 '22 at 21:48

score 0 · Answer 1 · answered Feb 10 '22 at 03:18


I think what you want is a multi-threaded step, which should address the slow reads. More details are available in the official Spring Batch documentation, in the section titled "Multi-threaded Step".
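To make the idea concrete, here is a rough sketch of the underlying pattern in plain `java.util.concurrent`. It is not Spring Batch code. One worker per input file reads, filters, and transforms; all workers append chunks to a single shared output under a lock. The class name `ParallelFileMerge`, the "keep lines starting with A" filter, and the upper-casing transform are all made-up stand-ins for your own tokenizers and processor. In Spring Batch itself the same concerns show up because `FlatFileItemReader` and `MultiResourceItemReader` are not thread-safe, so adding a task executor to the step usually also means synchronizing the reader (for example with `SynchronizedItemStreamReader`) or giving each thread its own reader.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical illustration: one worker per input file, one shared output file. */
class ParallelFileMerge {

    /** Appends one chunk to the shared writer; the lock keeps chunks from interleaving. */
    static void writeChunk(BufferedWriter out, List<String> chunk) throws IOException {
        synchronized (out) {
            for (String record : chunk) {
                out.write(record);
                out.write("\r\n");
            }
        }
    }

    /** Reads one file, filters and transforms its records, and writes them in chunks. */
    static int processFile(Path input, BufferedWriter out, int chunkSize) {
        int written = 0;
        List<String> chunk = new ArrayList<>(chunkSize);
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (!line.startsWith("A")) {
                    continue;                              // filter (stand-in rule)
                }
                chunk.add(line.toUpperCase(Locale.ROOT));  // transform (stand-in rule)
                if (chunk.size() == chunkSize) {
                    writeChunk(out, chunk);
                    written += chunk.size();
                    chunk.clear();
                }
            }
            if (!chunk.isEmpty()) {                        // flush the final partial chunk
                writeChunk(out, chunk);
                written += chunk.size();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return written;
    }

    /** Processes every input on a fixed pool and returns the number of records written. */
    static int run(List<Path> inputs, Path output, int threads, int chunkSize) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            List<Future<Integer>> results = new ArrayList<>();
            for (Path input : inputs) {
                results.add(pool.submit(() -> processFile(input, out, chunkSize)));
            }
            int written = 0;
            for (Future<Integer> result : results) {
                written += result.get();                   // rethrows any worker failure
            }
            return written;
        } finally {
            pool.shutdown();
        }
    }
}
```

A call such as `ParallelFileMerge.run(inputs, output, 4, 1000)` would process the files on four threads with a chunk size of 1000, matching the `chunk(1000)` in your step. Note that the order of records in the output is no longer deterministic across files, which is also true of a multi-threaded step.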

Hope this helps.


神韵499
