How does Spring Batch manage transactions (with possibly multiple datasources)?

Question

I would like some information about the data flow in Spring Batch processing but cannot find what I am looking for on the Internet (despite some useful questions on this site).

I am trying to establish standards for using Spring Batch in our company, and we are wondering how Spring Batch behaves when several processors in a step update data on different data sources.

This question focuses on a chunked process but feel free to provide information on other modes.

From what I have seen (please correct me if I am wrong), when a line is read, it follows the whole flow (reader, processors, writer) before the next is read (as opposed to a silo-processing where reader would process all lines, send them to the processor, and so on).

In my case, several processors read data (in different databases) and update it in the process, and finally the writer inserts data into yet another DB. For now, the JobRepository is not linked to a database, but that would be an independent one, making things a bit more complex still.

This model cannot be changed since the data belongs to several business areas.

How is the transaction managed in this case? Is the data committed only once the full chunk is processed? And then, is there a 2-phase commit management? How is it ensured? What development or configuration should be made in order to ensure the consistency of data?

More generally, what would your recommendations be in a similar case?

score 3 · Accepted Answer · edited Jun 22 '15 at 11:41

Spring Batch uses the Spring core transaction management, with most of the transaction semantics arranged around a chunk of items, as described in section 5.1 of the Spring Batch docs.

The transaction behaviour of the readers and writers depends on exactly what they are (e.g. file system, database, JMS queue, etc.), but if a resource is configured to support transactions, it will be enlisted by Spring automatically. The same goes for XA: if you make the resource endpoint XA-compliant, it will take part in a two-phase commit.
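To make "two-phase commit" concrete, here is a minimal plain-Java sketch of the idea. It is not JTA or Spring code: Participant, RecordingParticipant and TwoPhaseCoordinator are hypothetical names invented for this illustration, and a real XA transaction manager also handles timeouts, crashes and recovery, which this sketch ignores.

```java
import java.util.List;

// Hypothetical stand-in for an XA-capable resource (not a JTA type).
interface Participant {
    boolean prepare();   // phase 1: vote; true means "ready to commit"
    void commit();       // phase 2: make the work permanent
    void rollback();     // phase 2 alternative: discard the work
}

// Hypothetical participant that records which phase-2 call it received.
class RecordingParticipant implements Participant {
    private final boolean vote;
    String state = "ACTIVE";

    RecordingParticipant(boolean vote) { this.vote = vote; }

    public boolean prepare() { return vote; }
    public void commit() { state = "COMMITTED"; }
    public void rollback() { state = "ROLLED_BACK"; }
}

class TwoPhaseCoordinator {
    // Returns true if every participant committed, false if all were rolled back.
    static boolean complete(List<? extends Participant> participants) {
        // Phase 1: ask every participant whether it can commit.
        for (Participant p : participants) {
            if (!p.prepare()) {
                // A single "no" vote rolls back every participant.
                for (Participant q : participants) {
                    q.rollback();
                }
                return false;
            }
        }
        // Phase 2: all voted yes, so tell everyone to commit.
        for (Participant p : participants) {
            p.commit();
        }
        return true;
    }
}
```

The point of the two phases is that no database commits until all of them have said they can, which is what keeps several data sources consistent with each other.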

Getting back to the chunk transaction: Spring Batch sets up a transaction on a per-chunk basis, so if you set the commit interval to 5 on a given tasklet, it opens a new transaction (which includes all resources managed by the transaction manager) for each group of 5 items read, and commits it once that chunk has been written.

But all of this is set up around reading from a single data source. Does that meet your requirement? I'm not sure Spring Batch can manage a transaction where it reads data from multiple sources and writes the processor result into another database within a single transaction. (In fact, I can't think of anything that could do that...)
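The chunk boundaries can be illustrated with a small plain-Java simulation. This is only a sketch of the control flow as I understand it, not Spring Batch code: ChunkDemo, its run method and the log format are hypothetical, and the upper-casing step merely stands in for an item processor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical simulation of chunk-oriented transaction boundaries.
class ChunkDemo {

    // Runs a chunk loop over the items and returns a log of what happened.
    static List<String> run(List<String> items, int commitInterval) {
        if (commitInterval < 1) {
            throw new IllegalArgumentException("commitInterval must be >= 1");
        }
        List<String> log = new ArrayList<>();
        Iterator<String> reader = items.iterator();
        while (reader.hasNext()) {
            log.add("BEGIN"); // one transaction per chunk

            // Read items one at a time until the chunk is full or input ends.
            List<String> read = new ArrayList<>();
            while (read.size() < commitInterval && reader.hasNext()) {
                read.add(reader.next());
            }

            // Process each item that was read (stand-in for the processor).
            List<String> processed = new ArrayList<>();
            for (String item : read) {
                processed.add(item.toUpperCase());
            }

            log.add("WRITE " + processed); // the writer receives the whole chunk
            log.add("COMMIT");             // commit ends the chunk's transaction
        }
        return log;
    }

    public static void main(String[] args) {
        for (String line : run(Arrays.asList("a", "b", "c", "d", "e", "f", "g"), 5)) {
            System.out.println(line);
        }
    }
}
```

With seven items and a commit interval of 5, the log shows two transactions: one writing five items and one writing the remaining two. A failure anywhere between BEGIN and COMMIT would roll back only the current chunk.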

Thank you for your answer. We will have to face the multiple database situation (each business sector has its own DB and we sometimes need to access several business elements). Those are useful elements though. — Chop, Jun 22 '15 at 07:24
yeah I was thinking about this, you would have to structure it so that the event comes from a single source and the process step could potentially query the various databases (which would happen in the established transaction) and then update the target db. Or maybe have something aggregate the events from the various producers and then have a single processor actually perform the updates. — stringy05, Jun 22 '15 at 11:58
