I am studying the feasibility of a Spring Batch job that uses two datasources: a SQL datasource for the Spring Batch metadata and a MongoDB datasource (used transactionally) for the business data. The transactional aspect raises several questions.

The question Spring batch with MongoDB and transactions and its related resources provide a number of answers to my questions. The accepted answer mentions using Spring's JtaTransactionManager to manage a distributed transaction across the two datasources. This technique relies on the 2PC (two-phase commit) protocol and is, if I understood correctly, the most robust solution: https://www.infoworld.com/article/2077963/distributed-transactions-in-spring--with-and-without-xa.html?page=2
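
To make the comparison concrete, here is roughly the JTA wiring I have in mind, using Atomikos as the standalone coordinator. This is only a sketch (bean names are mine), and I have not verified that both datasources can actually enlist as XA resources:

```java
import com.atomikos.icatch.jta.UserTransactionImp;
import com.atomikos.icatch.jta.UserTransactionManager;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.transaction.jta.JtaTransactionManager;

@Configuration
public class JtaTxConfig {

    @Bean(initMethod = "init", destroyMethod = "close")
    public UserTransactionManager atomikosTransactionManager() {
        UserTransactionManager tm = new UserTransactionManager();
        tm.setForceShutdown(false);
        return tm;
    }

    @Bean
    public JtaTransactionManager transactionManager(UserTransactionManager atomikos) {
        // Spring delegates begin/commit/rollback to the JTA coordinator,
        // which runs the two-phase commit across the enlisted XA resources.
        return new JtaTransactionManager(new UserTransactionImp(), atomikos);
    }
}
```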

On the other hand, I found some resources about Spring's ChainedTransactionManager, which uses a best-effort 1PC protocol. This solution is less robust: if I understand correctly, the system can be left in an inconsistent state when the infrastructure fails at the wrong moment (a network failure between the two commits, for example). In exchange, the ChainedTransactionManager is easier to set up and offers better performance. I also saw that it is deprecated: https://github.com/spring-projects/spring-data-commons/issues/2232.
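
For reference, this is roughly the chained setup I am considering (again only a sketch, with bean and parameter names of my own, and using the deprecated class mentioned above):

```java
import javax.sql.DataSource;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.mongodb.MongoDatabaseFactory;
import org.springframework.data.mongodb.MongoTransactionManager;
import org.springframework.data.transaction.ChainedTransactionManager;
import org.springframework.jdbc.datasource.DataSourceTransactionManager;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
public class ChainedTxConfig {

    @Bean
    public PlatformTransactionManager transactionManager(
            DataSource batchDataSource, MongoDatabaseFactory mongoDatabaseFactory) {
        // Transactions start in declaration order and commit in reverse order,
        // so the Mongo transaction here commits before the JDBC one does.
        return new ChainedTransactionManager(
                new DataSourceTransactionManager(batchDataSource),
                new MongoTransactionManager(mongoDatabaseFactory));
    }
}
```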

What are the concrete risks of using the ChainedTransactionManager in a Spring Batch job? In case of an error, can I end up with inconsistencies between the Spring Batch metadata and the business data in MongoDB? I imagine there are also considerations to take into account with retry or chunk-skip strategies?

Thanks a lot for your help.

1 Answer

In case of an error, can I end up with inconsistencies between the Spring Batch metadata and the business data in MongoDB?

Yes, that is the risk you should be aware of.

A common technique to avoid that is to disable state management and use the process indicator pattern. You can find an example here.
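
For illustration, here is a minimal sketch of the pattern using a MongoItemReader. The Order type and its boolean processed field are hypothetical placeholders; adapt them to your own documents:

```java
import java.util.Map;

import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.data.MongoItemReader;
import org.springframework.batch.item.data.builder.MongoItemReaderBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.domain.Sort;
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.data.mongodb.core.query.Criteria;
import org.springframework.data.mongodb.core.query.Query;

@Configuration
public class ProcessIndicatorConfig {

    // "Order" and its boolean "processed" field are hypothetical placeholders.
    @Bean
    public MongoItemReader<Order> reader(MongoTemplate mongoTemplate) {
        return new MongoItemReaderBuilder<Order>()
                .name("orderReader")
                .template(mongoTemplate)
                .targetType(Order.class)
                // Select only documents that have not been processed yet.
                .query(Query.query(Criteria.where("processed").is(false)))
                .sorts(Map.of("_id", Sort.Direction.ASC))
                // Disable reader state: restartability comes from the flag,
                // not from the execution context in the metadata tables.
                .saveState(false)
                .build();
    }

    @Bean
    public ItemWriter<Order> writer(MongoTemplate mongoTemplate) {
        return items -> items.forEach(order -> {
            order.setProcessed(true); // flip the indicator with the business write
            mongoTemplate.save(order);
        });
    }
}
```

Because the reader re-selects only unprocessed documents, a restart after a failure simply picks up where the data itself says it left off, regardless of what the batch metadata recorded.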

Mahmoud Ben Hassine