2

Background

I have a Spring Batch job where:

  1. FlatFileItemReader - Reads one row at a time from the file
  2. ItemProcessor - Transforms the row from the file into a List<MyObject> and returns the list. That is, each row in the file is broken down into a List<MyObject> (one row in the file is transformed into many output rows).
  3. ItemWriter - Writes the List<MyObject> to a database table. (I used this implementation to unpack the list received from the processor and delegate to a JdbcBatchItemWriter; a rough sketch of such an unpacking writer is shown below.)
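
A minimal sketch of such an unpacking writer, just for illustration (ListUnpackingItemWriter is an illustrative name, not necessarily the implementation referenced above):

import java.util.ArrayList;
import java.util.List;

import org.springframework.batch.item.ItemWriter;

public class ListUnpackingItemWriter<T> implements ItemWriter<List<T>> {

    private ItemWriter<T> delegate; // e.g. a JdbcBatchItemWriter<T>

    @Override
    public void write(List<? extends List<T>> lists) throws Exception {
        // flatten the lists produced by the processor and hand everything to the delegate
        List<T> consolidated = new ArrayList<>();
        for (List<T> list : lists) {
            consolidated.addAll(list);
        }
        delegate.write(consolidated);
    }

    public void setDelegate(ItemWriter<T> delegate) {
        this.delegate = delegate;
    }
}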

Question

  • At point 2), the processor can return a List of 100,000 MyObject instances.
  • At point 3), the delegate JdbcBatchItemWriter will end up writing the entire List of 100,000 objects to the database.

My question is: the JdbcBatchItemWriter does not allow a custom batch size. For all practical purposes, the batch size equals the commit-interval of the step. With this in mind, is there another ItemWriter implementation available in Spring Batch that writes to the database and allows a configurable batch size? If not, how do I go about writing a custom writer myself to achieve this?
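
To make the setup concrete, here is a rough sketch of such a chunk-oriented step (class, type and bean names such as FileToDbStepConfig, FileRow and unpackingWriter are illustrative only), where the chunk size is the commit-interval:

import java.util.List;

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class FileToDbStepConfig {

    @Bean
    public Step fileToDbStep(StepBuilderFactory stepBuilderFactory,
                             FlatFileItemReader<FileRow> reader,
                             ItemProcessor<FileRow, List<MyObject>> processor,
                             ItemWriter<List<MyObject>> unpackingWriter) {
        return stepBuilderFactory.get("fileToDbStep")
                // commit-interval = 10: each chunk of 10 file rows is written and
                // committed in one transaction, regardless of how many MyObject
                // instances those rows expand to
                .<FileRow, List<MyObject>>chunk(10)
                .reader(reader)
                .processor(processor)
                .writer(unpackingWriter)
                .build();
    }
}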

Ping

3 Answers

2

I see no obvious way to set the batch size on the JdbcBatchItemWriter. However, you can extend the writer and use a custom BatchPreparedStatementSetter to specify the batch size. Here is a quick example:

import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.jdbc.core.BatchPreparedStatementSetter;

public class MyCustomWriter<T> extends JdbcBatchItemWriter<T> {

    @Override
    public void write(List<? extends T> items) throws Exception {
        // namedParameterJdbcTemplate is inherited from JdbcBatchItemWriter
        namedParameterJdbcTemplate.getJdbcOperations().batchUpdate("your sql", new BatchPreparedStatementSetter() {
            @Override
            public void setValues(PreparedStatement ps, int i) throws SQLException {
                // set values on your sql
            }

            @Override
            public int getBatchSize() {
                return items.size(); // or any other value you want
            }
        });
    }

}

The StagingItemWriter in the Spring Batch samples is another example of how to use a custom BatchPreparedStatementSetter.

Mahmoud Ben Hassine
  • @MahmoudBenHassine This looks promising. Regarding the `StagingItemWriter` example, instead of using a `BatchPreparedStatementSetter`, could I simply call [getJdbcTemplate().batchUpdate](https://docs.spring.io/spring/docs/current/javadoc-api/org/springframework/jdbc/core/JdbcTemplate.html#batchUpdate-java.lang.String-java.util.Collection-int-org.springframework.jdbc.core.ParameterizedPreparedStatementSetter-) in the `write` method, passing it the required batch size? – Ping Oct 24 '19 at 10:10
  • Yes, that's another option. – Mahmoud Ben Hassine Oct 24 '19 at 10:44
  • I've got the necessary direction to go forward. Thanks. – Ping Oct 24 '19 at 10:53
  • Glad it helped! However, this is tricky: there will be a single transaction for the whole chunk, no matter how many items result from the processing logic. For example, if commit-interval=10 and the processor returns 100 items for each input item, then the writer will receive 1000 items per chunk. Those 1000 items will be committed in a single transaction, even if the JDBC batch size is 500. The batch size governs how many batched statements go to the database (two batchUpdate calls of 500 items each in this case), not how many transactions there are (a single transaction driven by Spring Batch in this case). – Mahmoud Ben Hassine Oct 24 '19 at 11:35
  • @MahmoudBenHassine What kind of issues should I be aware of that can result from a single transaction being used for each call to `write`, as opposed to one transaction per batch size? Also, is the `StagingItemWriter` example from Spring Batch prone to the same issue? One issue I can think of is that if the second batch (500 records) fails, the entire write to the database fails (all 1000 records). Is there a way to use a new transaction per batch size? – Ping Oct 24 '19 at 12:04
  • There are no issues. The idea of chunk processing is to process the whole chunk as a unit (all-or-nothing semantics). I just wanted to clarify that the batch size you are trying to customize does not influence this model (it only influences how many queries are sent to the database). – Mahmoud Ben Hassine Oct 24 '19 at 13:00
  • Thanks for the clarification once again. – Ping Oct 24 '19 at 13:05
1

The answer from Mahmoud Ben Hassine and the comments on it pretty much cover all aspects of the solution, and that is the accepted answer.

Here is the implementation I used, in case anyone is interested:

import java.util.Collections;
import java.util.List;

import org.springframework.batch.item.ItemWriter;
import org.springframework.jdbc.core.ParameterizedPreparedStatementSetter;
import org.springframework.jdbc.core.support.JdbcDaoSupport;

public class JdbcCustomBatchSizeItemWriter<W> extends JdbcDaoSupport implements ItemWriter<W> {

    private int batchSize;
    private ParameterizedPreparedStatementSetter<W> preparedStatementSetter;
    private String sqlFileLocation;
    private String sql;

    // called once (e.g. as the bean's init method) to load the SQL from the external file
    public void initReader() {
        this.setSql(FileUtilities.getFileContent(sqlFileLocation));
    }

    @Override
    public void write(List<? extends W> items) throws Exception {
        // batchUpdate splits the items into JDBC batches of 'batchSize' statements each
        getJdbcTemplate().batchUpdate(sql, Collections.unmodifiableList(items), batchSize, preparedStatementSetter);
    }

    public void setBatchSize(int batchSize) {
        this.batchSize = batchSize;
    }

    public void setPreparedStatementSetter(ParameterizedPreparedStatementSetter<W> preparedStatementSetter) {
        this.preparedStatementSetter = preparedStatementSetter;
    }

    public void setSqlFileLocation(String sqlFileLocation) {
        this.sqlFileLocation = sqlFileLocation;
    }

    public void setSql(String sql) {
        this.sql = sql;
    }
}

Note:

  1. The use of Collections.unmodifiableList avoids the need for an explicit cast from List<? extends W> to List<W>.
  2. I use sqlFileLocation to specify an external file that contains the SQL, and FileUtilities.getFileContent simply returns the contents of that file. This can be skipped; the SQL can also be passed to the class directly when creating the bean (see the wiring sketch below).
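
For completeness, a hypothetical wiring of this writer (bean names, the SQL file path and the lambda body are placeholders, not part of my actual configuration):

import javax.sql.DataSource;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class CustomWriterConfig {

    // initReader() is declared as the init method so the SQL is loaded from the external file
    @Bean(initMethod = "initReader")
    public JdbcCustomBatchSizeItemWriter<MyObject> customBatchSizeWriter(DataSource dataSource) {
        JdbcCustomBatchSizeItemWriter<MyObject> writer = new JdbcCustomBatchSizeItemWriter<>();
        writer.setDataSource(dataSource); // required by JdbcDaoSupport
        writer.setSqlFileLocation("path/to/insert.sql"); // placeholder location
        writer.setBatchSize(500); // rows per JDBC batch, independent of the step's commit-interval
        writer.setPreparedStatementSetter((ps, item) -> {
            // bind the columns of MyObject here, e.g. ps.setString(1, ...)
        });
        return writer;
    }
}
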
Ping
0

I wouldn't do this. It presents issues for restartability. Instead, modify your reader to produce individual items rather than having your processor take in an object and return a list.
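
A rough sketch of what that could look like (RowExpandingItemReader, FileRow and expand() are illustrative names, and the state handling needed for restartability is deliberately omitted):

import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

import org.springframework.batch.item.ItemReader;

public class RowExpandingItemReader implements ItemReader<MyObject> {

    private final ItemReader<FileRow> delegate; // e.g. the existing FlatFileItemReader
    private final Deque<MyObject> buffer = new ArrayDeque<>();

    public RowExpandingItemReader(ItemReader<FileRow> delegate) {
        this.delegate = delegate;
    }

    @Override
    public MyObject read() throws Exception {
        // refill the buffer by expanding the next file row into individual items
        while (buffer.isEmpty()) {
            FileRow row = delegate.read();
            if (row == null) {
                return null; // end of input
            }
            buffer.addAll(expand(row));
        }
        return buffer.poll();
    }

    private List<MyObject> expand(FileRow row) {
        // the 1-row-to-many transformation that currently lives in the processor
        return Collections.emptyList();
    }
}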

Dean Clark
  • The processing required to convert one row in the file into multiple rows is relatively complex to put in an `ItemReader`. Given that my jobs never need to be restarted, is there any other issue with this approach that you can see? – Ping Oct 24 '19 at 16:20
  • Much time has passed, but I did want to follow up on this. It would not be that complicated to create a custom `ItemReader` that returns individual items (multiple per row). You would just need to track your `itemCount` appropriately to preserve restartability. This is much cleaner than the alternative you proposed. – Dean Clark Jul 29 '22 at 17:50