I have a job that reads from SQL Server database list of documents. Documents needs to be in some status and sorted by column status_updated_time
.
I want to read an document.id
and then to process it in job processor as Driving Query Based ItemReaders.
Column status is changed in writer, so I can't use JpaPagingItemReader
because of this problem.
I used JdbcPagingItemReader
but got an error on sorting by status_updated_time
.
Then I tried to add and id
to sorting, but that didn't help.
Query that I want to get is:
SELECT id
FROM document
WHERE status IN (0, 1, 2)
ORDER BY status_updated_time ASC, id ASC
My reader:
@StepScope
@Bean
private ItemReader<Long> statusReader() {
JdbcPagingItemReader<Long> reader = new JdbcPagingItemReader<>();
...
reader.setRowMapper(SingleColumnRowMapper.newInstance(Long.class));
...
Map<String, Order> sortKeys = new HashMap<>();
sortKeys.put("status_updated_time", Order.ASCENDING);
sortKeys.put("id", Order.ASCENDING);
SqlServerPagingQueryProvider queryProvider = new SqlServerPagingQueryProvider();
queryProvider.setSelectClause(SELECT_CLAUSE);
queryProvider.setFromClause(FROM_CLAUSE);
queryProvider.setWhereClause(WHERE_CLAUSE);
queryProvider.setSortKeys(sortKeys);
reader.setQueryProvider(queryProvider);
...
return reader;
}
Where constants are:
private static final String SELECT_CLAUSE = "id";
private static final String FROM_CLAUSE = "document";
private static final String WHERE_CLAUSE = "status IN (0, 1, 2) ";
When job is executed I get error:
org.springframework.dao.TransientDataAccessResourceException: StatementCallback; SQL [SELECT TOP 10 id FROM document WHERE status IN (0, 1, 2) ORDER BY id ASC, status_updated_time ASC]; The column name status_updated_time is not valid.; nested exception is com.microsoft.sqlserver.jdbc.SQLServerException: The column name status_updated_time is not valid.
at org.springframework.jdbc.support.SQLStateSQLExceptionTranslator.doTranslate(SQLStateSQLExceptionTranslator.java:110)
at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:72)
at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:81)
at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:81)
at org.springframework.jdbc.core.JdbcTemplate.translateException(JdbcTemplate.java:1443)
at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:388)
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:452)
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:462)
at org.springframework.batch.item.database.JdbcPagingItemReader.doReadPage(JdbcPagingItemReader.java:210)
at org.springframework.batch.item.database.AbstractPagingItemReader.doRead(AbstractPagingItemReader.java:108)
at org.springframework.batch.item.support.AbstractItemCountingItemStreamItemReader.read(AbstractItemCountingItemStreamItemReader.java:92)
at org.springframework.batch.core.step.item.SimpleChunkProvider.doRead(SimpleChunkProvider.java:94)
at org.springframework.batch.core.step.item.FaultTolerantChunkProvider.read(FaultTolerantChunkProvider.java:87)
at org.springframework.batch.core.step.item.SimpleChunkProvider$1.doInIteration(SimpleChunkProvider.java:119)
I saw some question regarding The column name XYZ is not valid
on stack overflow (this...) but haven't seen anything that works in my case where I need to sort by another column.
Another problem is sorting columns order.
No matter if I first add status_updated_time
or id
to the map sorting in generated script is always ORDER BY id ASC, status_updated_time ASC
.
EDIT: Reading this question, specially this line:
JdbcPagingItemReader assumes here that the sort key and the column in the select clause are called exactly the same
I realized that I need column status_updated_time
in result set, so I refactored:
private static final String SELECT_CLAUSE = "id, status_updated_time";
...
queryProvider.setSelectClause(SELECT_CLAUSE);
...
reader.setRowMapper(
(rs, i) -> {
Document document = new Document();
document.setId(rs.getLong(1));
document.setStatusUpdatedTime(rs.getObject(2, Timestamp.class));
return document;
}
);
Now application can compile and job can run.
But, problem with sorting stay the same. I can't order by first status_updated_time
and then id
. id
always comes first.
I tried to remove id
from sorting and came to another problem.
On test env. I had 1600 rows to process. My job process row and update status_updated_time
to now()
. When job started processing he didn't stop at 1600, but continue processing because each row got new status_updated_time
and reader consider it it new row, and kept processing endlessly.
When sort only by id
job processed 1600 rows and then stopped.
So it seems like I can't use JdbcPagingItemReader
because of sorting problem.
And I wanted some reader that can run in parallel to speed up this job (it runs about 20 minutes each hour in a day).
Any suggestions?
Yes, job already exists and work single threaded with JpaPagingItemReader. I need to optimize and speed up execution.
Good idea for partitioning. Thank you. I will try to implement that, and will post a results here. – salerokada Jul 16 '20 at 09:47