
I am developing a Spring Boot / Spring Batch job that reads data from an Oracle DB and loads it all into MongoDB (a NoSQL DB). The MongoDB model is denormalized, following the standard approach to modelling relations in Mongo.

I have tables TableA and TableB, plus a third join table TableAB between them. While reading TableA via JdbcCursorItemReader&lt;TableA&gt;, for each PK id of TableA I need to query the SubDivision table, fetch all the SubDivisions for that PK, and set them on the TableA model, which holds a list of SubDivisions.

The only way I see is to run the query from TableAProcessor and set the data on the TableA model. That is easy to implement, but the problem is that it makes 100K calls to the DB from TableAProcessor if I have 100K TableA records.

How can I achieve this and set the SubDivision data on the TableA model, either using a Tasklet or some other way?

How can I avoid issuing so many queries from the processor?

I can't combine everything into a single query due to some limitations, so I need a second query to the DB to get the SubDivision data.

@Slf4j
public class TableAProcessor implements ItemProcessor<TableA, TableA>{

    @Autowired
    private TableADao tableADao;

    @Override
    public TableA process(TableA tableA) throws Exception {
        log.debug("TableA DETAILS : "+tableA);
        List<SubDivision> subDivisions = tableADao.getSubDivision(tableA.getPKId());
        tableA.setSubDivisions(subDivisions);
        return tableA;
    }
}

Model

public class TableA {
    @Transient
    private Integer Id;
    @Field
    private String mongoId;
    ........
    .......
    @Field
    private List<SubDivision> subDivisions;
}

TableABatchConfig.java

@Configuration
public class TableABatchConfig {

    private static final String sql = "SELECT * FROM TABLEA";

    @Autowired
    @Qualifier(value="oracleDataSource")
    private DataSource dataSource;

    @Bean(destroyMethod = "")
    @StepScope
    public JdbcCursorItemReader<TableA> TableAReader() throws Exception {
        JdbcCursorItemReader<TableA> reader = new JdbcCursorItemReader<TableA>();
        reader.setDataSource(this.dataSource);
        reader.setSql(sql);

        reader.setRowMapper(new TableARowMapper());
        reader.afterPropertiesSet();
        return reader;
    }

    @Bean
    public ItemProcessor<TableA, TableA> TableAProcessor() {
        return new TableAProcessor();
    }

    @Bean
    public TableAWriter TableAWriter() {
        return new TableAWriter();
    }
}

TableAJob.java

@Configuration
@PropertySource("classpath:application.properties")
public class TableAJob {
    @Value( "${spring.chunk.size}")
    private String chunkSize;

    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    @Autowired
    private JdbcCursorItemReader<TableA> TableAReader;

    @Autowired
    private ItemProcessor<TableA, TableA> TableAProcessor;

    @Autowired
    private TableAWriter TableAWriter;

    @Bean
    public TableAStepExecuListner TableAStepExecuListner() {
        return new TableAStepExecuListner();
    }

    @Bean("readTableAJob")
    @Primary
    public Job readTableAJob() {
        return jobBuilderFactory.get("readTableAJob")
                .incrementer(new RunIdIncrementer())
                .start(TableAStepOne())
                .build();
    }

    @Bean
    public Step TableAStepOne() {
        return stepBuilderFactory.get("TableAStepOne")
                .<TableA, TableA>chunk(Integer.parseInt(chunkSize))
                .reader(TableAReader)
                .processor(TableAProcessor)
                .writer(TableAWriter)
                .listener(TableAStepExecuListner())
                .build();
    }
}

dao

@Service
public class TableADao {

    private static final String SQL = "COMPLEX JOIN QUERY";

    @Autowired
    private JdbcTemplate jdbcTemplate;

    public List<SubDivision> getSubDivision(Integer pkId){
        List<Map<String, Object>> results = jdbcTemplate.queryForList(SQL,new Object[] { pkId });

        List<SubDivision> divisions = new ArrayList<>();
        for (Map<String, Object> row : results) {
            divisions.add(SubDivision.builder().subDivisionCd((String)row.get("SUBDIVISION_CD"))
                    ......
                    .........
                    .........
                    ......
                    .build());
        }
        return divisions;
    }
}
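One common way to cut the per-item queries down is to fetch the SubDivisions for a whole chunk of PK ids at once with a `WHERE TABLEA_ID IN (:ids)` query (for example via `NamedParameterJdbcTemplate`) and regroup the flat rows by parent id in memory. The sketch below shows only the Spring-free grouping side; the `Row` shape, the column names, and the 1000-element cap (Oracle's IN-list limit, ORA-01795) are illustrative assumptions, not code from the question:

```java
import java.util.*;

public class ChunkFetchSketch {

    // One row of a hypothetical
    // "SELECT TABLEA_ID, SUBDIVISION_CD ... WHERE TABLEA_ID IN (:ids)"
    // result set (column names are assumptions).
    static class Row {
        final int parentId;
        final String subDivisionCd;
        Row(int parentId, String subDivisionCd) {
            this.parentId = parentId;
            this.subDivisionCd = subDivisionCd;
        }
    }

    // Oracle rejects IN lists with more than 1000 expressions (ORA-01795),
    // so a chunk's ids must be split before binding them to the query.
    static <T> List<List<T>> partition(List<T> ids, int size) {
        List<List<T>> parts = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += size) {
            parts.add(new ArrayList<>(ids.subList(i, Math.min(i + size, ids.size()))));
        }
        return parts;
    }

    // Group the flat rows back into one list per TableA primary key,
    // so each TableA can be given its subDivisions without extra queries.
    static Map<Integer, List<String>> groupByParent(List<Row> rows) {
        Map<Integer, List<String>> byParent = new LinkedHashMap<>();
        for (Row r : rows) {
            byParent.computeIfAbsent(r.parentId, k -> new ArrayList<>())
                    .add(r.subDivisionCd);
        }
        return byParent;
    }
}
```

With this shape, a batched DAO method would run one query per 1000 ids per chunk instead of one query per TableA row.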

TableAWriter.java

public class TableAWriter implements ItemWriter<TableA>{
    @Autowired
    private TableARepository TableARepository;

    @Override
    public void write(List<? extends TableA> items) throws Exception {
        TableARepository.saveAll(items);
    }
}
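For completeness, a hedged sketch of how a chunk-level lookup could plug into the writer: collect the chunk's ids, run one batched fetch, set each item's list, then delegate to `saveAll` as today. `Item`, `enrich`, and `batchFetch` are illustrative stand-ins rather than the question's classes; in the real job this logic would sit inside `TableAWriter.write()` in front of `TableARepository.saveAll(items)`, backed by a DAO method such as `getSubDivisions(List<Integer>)`:

```java
import java.util.*;
import java.util.function.Function;

public class EnrichingWriteSketch {

    // Minimal stand-in for TableA: a PK id plus the list to be filled.
    static class Item {
        final int pkId;
        List<String> subDivisions;
        Item(int pkId) { this.pkId = pkId; }
    }

    // The chunk-level enrichment an ItemWriter could perform before saving:
    // one batched lookup per chunk instead of one query per item.
    static void enrich(List<Item> chunk,
                       Function<List<Integer>, Map<Integer, List<String>>> batchFetch) {
        List<Integer> ids = new ArrayList<>();
        for (Item it : chunk) {
            ids.add(it.pkId);
        }
        Map<Integer, List<String>> byParent = batchFetch.apply(ids);
        for (Item it : chunk) {
            // Items with no SubDivisions get an empty list, not null.
            it.subDivisions = byParent.getOrDefault(it.pkId, Collections.emptyList());
        }
    }
}
```

With the enrichment moved here, the `TableAProcessor` step could be dropped entirely, reducing the DB round trips from one per item to one per chunk.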
  • This is a common pattern known as the [Driving Query Pattern](https://docs.spring.io/spring-batch/4.1.x/reference/html/common-patterns.html#drivingQueryBasedItemReaders). Unfortunately, if you use this pattern, you will indeed have as many queries to your secondary repository (for item details, `subDivisions` in your case) as items returned by your reader. However, I don't see how you can avoid this to solve your problem. – Mahmoud Ben Hassine Apr 09 '19 at 07:03
  • @Mahmoud Ben Hassine - Thank you for the answer. This looks to me like a very common use case in integrated systems, where you may need data based on some parameter from other systems. Can I assume now that it's impossible to implement in Spring Batch? – PAA Apr 09 '19 at 07:07
  • No, don't assume that, because it is possible and you did it! I'm just saying that this pattern suffers from the well known [n + 1 problem](https://stackoverflow.com/questions/97197/what-is-the-n1-selects-problem-in-orm-object-relational-mapping) and you probably need to tackle your requirement differently. – Mahmoud Ben Hassine Apr 09 '19 at 07:26

0 Answers