
I tried to process an Excel file using PoiItemReader in Spring Batch. The program runs successfully when the Excel file is small or of normal size. However, when I try to process a larger file (bigger than 12 MB), the file is not read at all.

I have the following questions:

  1. What is the file size limit for PoiItemReader?
  2. Would using MultiResourcePartitioner help in this scenario?

Thank you very much.

Here is my code:

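```java
import java.io.FileNotFoundException;
import java.util.Map;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemWriter;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;
// PoiItemReader and RowMapper come from the spring-batch-excel extension; their package
// depends on the extension version, so those imports are left to the project.
// MetaData, MetaDataRowMapper, MapProcessor, AWSwriter and AttributeValue are project/SDK types.

@Configuration
@EnableBatchProcessing
public class BatchConfiguration {

    @Autowired
    public JobBuilderFactory jobBuilderFactory;
    @Autowired
    public StepBuilderFactory stepBuilderFactory;

    @Bean
    public Job csvProcessJob() throws FileNotFoundException {
        return jobBuilderFactory.get("csvProcessJob")
                .incrementer(new RunIdIncrementer())
                .flow(csvProcessStep())
                .end()
                .build();
    }

    @Bean
    public Step csvProcessStep() throws FileNotFoundException {
        return stepBuilderFactory.get("stepCSVprocess")
                // The input type must match what the reader produces (MetaData), not String
                .<MetaData, Map<String, AttributeValue>>chunk(25)
                .reader(excelReader())
                .processor(processor())
                .writer(writer())
                .build();
    }

    @Bean
    public PoiItemReader<MetaData> excelReader() throws FileNotFoundException {
        PoiItemReader<MetaData> reader = new PoiItemReader<>();
        reader.setLinesToSkip(1); // skip the header row
        reader.setResource(new ClassPathResource("file_name.xls"));
        reader.setRowMapper(excelRowMapper());
        return reader;
    }

    private RowMapper<MetaData> excelRowMapper() {
        return new MetaDataRowMapper();
    }

    @Bean
    public ItemProcessor<MetaData, Map<String, AttributeValue>> processor() {
        return new MapProcessor();
    }

    @Bean
    public ItemWriter<Map<String, AttributeValue>> writer() {
        return new AWSwriter();
    }
}
```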
  • Assuming you are using the `PoiItemReader` I created, it does indeed have the drawback of loading everything into memory at once. Sadly, that is how POI works. There is a way to read the file in a streaming fashion; see https://stackoverflow.com/questions/33786219/apache-poi-streaming-sxssf-for-reading for inspiration. I'm considering implementing a streaming reader as well, but the issue is the time I have on my hands. – M. Deinum Jan 27 '20 at 07:20
  • Thank you very much for your comment. I will follow the referenced link and post an update with what I find. Do you think that MultiResourcePartitioner will work? @M. Deinum – Sudarat Tangnimitchok Jan 27 '20 at 07:53
  • The multi-resource approach won't help. It lets you read multiple resources as if they were a single one; it doesn't split a single file into multiple smaller ones. – M. Deinum Jan 27 '20 at 07:57
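To make the streaming idea from the first comment concrete, here is a rough sketch that reads an .xlsx worksheet one row at a time. It is a JDK-only stand-in (`java.util.zip` plus the StAX `XMLStreamReader`) for the POI event-model approach the linked answer points to, not that answer's code and not a drop-in Spring Batch reader; the class `StreamingXlsxRowReader` and its methods are hypothetical. It assumes the input is an .xlsx package and that the caller knows the worksheet's entry name inside the ZIP (often `xl/worksheets/sheet1.xml`, although the authoritative mapping lives in `xl/workbook.xml` and its relationships part). Cell values come back as raw text: numbers as stored, dates as serial numbers, formulas as their cached result. Only the shared-strings table is held in memory; worksheet rows are parsed as they are requested.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipFile;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical streaming reader for one .xlsx worksheet: one row per read() call. */
final class StreamingXlsxRowReader implements AutoCloseable {
    private final ZipFile zip;
    private final XMLStreamReader sheet;
    private final List<String> sharedStrings = new ArrayList<>(); // kept in memory; rows are not

    StreamingXlsxRowReader(Path xlsx, String sheetEntry) throws IOException, XMLStreamException {
        zip = new ZipFile(xlsx.toFile());
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false); // refuse DTDs (XXE guard)
        var sst = zip.getEntry("xl/sharedStrings.xml");
        if (sst != null) {
            try (InputStream in = zip.getInputStream(sst)) {
                loadSharedStrings(factory.createXMLStreamReader(in));
            }
        }
        var entry = zip.getEntry(sheetEntry);
        if (entry == null) { zip.close(); throw new IOException("No worksheet entry " + sheetEntry); }
        sheet = factory.createXMLStreamReader(zip.getInputStream(entry)); // closed with zip
    }

    /** One <si> per string; rich text splits it over several <t> runs (phonetic runs not filtered). */
    private void loadSharedStrings(XMLStreamReader xml) throws XMLStreamException {
        StringBuilder current = null;
        while (xml.hasNext()) {
            int event = xml.next();
            boolean start = event == XMLStreamConstants.START_ELEMENT;
            if (start && xml.getLocalName().equals("si")) {
                current = new StringBuilder();
            } else if (start && xml.getLocalName().equals("t") && current != null) {
                current.append(xml.getElementText());
            } else if (event == XMLStreamConstants.END_ELEMENT && xml.getLocalName().equals("si")) {
                sharedStrings.add(current.toString());
                current = null;
            }
        }
        xml.close();
    }

    /** Next row's cell texts in column order, or null at end of sheet (like ItemReader.read()). */
    List<String> read() throws XMLStreamException {
        List<String> row = null;
        StringBuilder cellText = null;
        String type = null;
        int col = 0;
        while (sheet.hasNext()) {
            int event = sheet.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                String name = sheet.getLocalName();
                if (name.equals("row")) {
                    row = new ArrayList<>();
                } else if (name.equals("c")) {
                    type = sheet.getAttributeValue(null, "t");       // "s" = shared-string index
                    String ref = sheet.getAttributeValue(null, "r"); // e.g. "C2"; optional
                    col = (ref != null) ? columnIndex(ref) : row.size();
                    cellText = new StringBuilder();
                } else if ((name.equals("v") || name.equals("t")) && cellText != null) {
                    cellText.append(sheet.getElementText());         // value, or inline-string text
                }
            } else if (event == XMLStreamConstants.END_ELEMENT && sheet.getLocalName().equals("row")) {
                return row;
            } else if (event == XMLStreamConstants.END_ELEMENT && sheet.getLocalName().equals("c")) {
                if (cellText.length() > 0) {
                    while (row.size() < col) row.add("");          // the XML omits empty cells
                    String text = cellText.toString();
                    row.add("s".equals(type) ? sharedStrings.get(Integer.parseInt(text.trim())) : text);
                }
                cellText = null;
            }
        }
        return null;
    }

    private static int columnIndex(String ref) { // "A1" -> 0, "B7" -> 1, "AA3" -> 26
        int col = 0;
        for (int i = 0; i < ref.length() && Character.isLetter(ref.charAt(i)); i++) {
            col = col * 26 + (ref.charAt(i) - 'A' + 1);
        }
        return col - 1;
    }

    @Override
    public void close() throws IOException {
        try { sheet.close(); } catch (XMLStreamException e) { throw new IOException(e); }
        finally { zip.close(); }
    }
}
```

`read()` follows the `ItemReader` convention of returning `null` once the input is exhausted, so a custom Spring Batch reader could delegate to it and map each `List<String>` onto a `MetaData` object with the existing row-mapping logic. That wrapper is omitted because it depends on the project's own classes. Rows that are absent from the sheet XML are skipped rather than returned as empty lists.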

1 Answer


Update on my question: I followed the link in M. Deinum's comment and was able to incorporate that approach into my own custom ItemReader. The program now runs properly, with the downside that it works only with .xlsx files, not .xls.
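The .xlsx-only limitation follows from the file formats: an .xlsx file is a ZIP package of XML parts, which is what makes row-by-row XML streaming possible, whereas a legacy Excel 97-2003 .xls file is a binary BIFF workbook inside an OLE2 compound document. If the custom reader only supports .xlsx, one option is to fail fast on .xls input by checking the file signature. The `ExcelFormat` helper below is a hypothetical sketch of that check; a ZIP signature only shows that the file is a ZIP package (a .docx or .jar would match too), not that it contains a spreadsheet.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/** Hypothetical helper: tells .xlsx (ZIP/OOXML) input apart from legacy .xls (OLE2/BIFF). */
final class ExcelFormat {
    enum Kind { ZIP_PACKAGE, OLE2_DOCUMENT, UNKNOWN }

    // ZIP local file header signature "PK\3\4"; every .xlsx package starts with it.
    private static final byte[] ZIP = {0x50, 0x4B, 0x03, 0x04};
    // OLE2 compound document signature; Excel 97-2003 .xls workbooks start with it.
    private static final byte[] OLE2 = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
            (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1};

    /** Reads up to 8 bytes from the start of the stream and classifies them. */
    static Kind detect(InputStream in) throws IOException {
        byte[] head = in.readNBytes(8);
        if (head.length >= ZIP.length && Arrays.equals(Arrays.copyOf(head, ZIP.length), ZIP)) {
            return Kind.ZIP_PACKAGE;
        }
        return Arrays.equals(head, OLE2) ? Kind.OLE2_DOCUMENT : Kind.UNKNOWN;
    }

    /** Throws IllegalArgumentException unless the stream looks like an .xlsx (ZIP) package. */
    static void requireXlsx(InputStream in) throws IOException {
        Kind kind = detect(in);
        if (kind == Kind.OLE2_DOCUMENT) {
            throw new IllegalArgumentException("Legacy .xls (OLE2) files are not supported; save as .xlsx");
        }
        if (kind != Kind.ZIP_PACKAGE) {
            throw new IllegalArgumentException("Not an .xlsx (ZIP) file");
        }
    }
}
```

Because `detect` consumes the bytes it inspects, run the check on its own stream (for example one opened with `Files.newInputStream(path)`) and open the file again for the actual reader.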