
I've created a Spring Batch application using Spring Boot, and I have a Job with 9 steps. These steps use a DataSource whose bean I declared in a configuration class as follows:

@Configuration
public class DatabaseConfig {
    @ConfigurationProperties(prefix = "spring.datasource")
    @Bean
    @Primary
    public DataSource dataSource(){
        return DataSourceBuilder.create().build();
    }
}

This DataSource is using properties declared in the application.yml file:

spring:
  datasource:
    url: jdbc:mysql://localhost:3306/db_01?zeroDateTimeBehavior=convertToNull
    username: xxxx
    password: ****

So far, all works as expected.

I have 4 databases configured in a 5th database (db_settings), which I read with an SQL query. The query returns the 4 databases with their usernames and passwords:

+--------+-----------------------------------+-----------------+-----------------+
| id     | url                               | username_db     | password_db     |
+--------+-----------------------------------+-----------------+-----------------+
|    243 | jdbc:mysql://localhost:3306/db_01 | xxxx            | ****            |
|    244 | jdbc:mysql://localhost:3306/db_02 | xxxx            | ****            |
|    245 | jdbc:mysql://localhost:3306/db_03 | xxxx            | ****            |
|    247 | jdbc:mysql://localhost:3306/db_04 | xxxx            | ****            |
+--------+-----------------------------------+-----------------+-----------------+

So instead of running the steps against the database declared in 'application.yml', I want to run them against all 4 databases. Given the volume of data processed, the batch must be able to run on these databases in parallel.

Does anyone know how to implement this?

Pang
Renaud is Not Bill Gates
  • I don't have time for a detailed answer right now, but you can combine two neat features to achieve this: partitioned jobs from Spring Batch (the partition key being the datasource id: 243, 244, etc.) and the AbstractRoutingDataSource from Spring JDBC (which will dynamically select the right datasource from the job parameter initialized with the partition key) – KeatsPeeks Jun 14 '17 at 11:43
  • @KeatsPeeks I'm sorry but I couldn't get anything, I would appreciate if you add more details, or a code example. – Renaud is Not Bill Gates Jun 19 '17 at 12:23

1 Answer


Where is the bounty? :-)


Thanks KeatsPeeks, AbstractRoutingDataSource is a good starting point for the solution, and here is a good tutorial on this part.

The important parts are:

  1. define the lookup code

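    import org.springframework.context.i18n.LocaleContextHolder;
    import org.springframework.jdbc.datasource.lookup.AbstractRoutingDataSource;

    public class MyRoutingDataSource extends AbstractRoutingDataSource {
        // The returned value is used as the key into the map of target data sources.
        @Override
        protected Object determineCurrentLookupKey() {
            String language = LocaleContextHolder.getLocale().getLanguage();
            System.out.println("Language obtained: " + language);
            return language;
        }
    }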
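The tutorial routes on the current locale. For the databases in the question, the lookup key would more naturally be the database id (243, 244, ...). Below is a minimal plain-JDK sketch of a thread-bound holder for that id; `DatabaseContextHolder` and its methods are hypothetical names, not part of Spring. Under that assumption, `determineCurrentLookupKey()` could simply return `DatabaseContextHolder.currentId()`.

```java
import java.util.function.Supplier;

/** Hypothetical holder for the id of the database the current thread should use. */
final class DatabaseContextHolder {
    private static final ThreadLocal<Integer> CURRENT = new ThreadLocal<>();

    private DatabaseContextHolder() {}

    /** Returns the id bound to the calling thread, or null if none is bound. */
    static Integer currentId() {
        return CURRENT.get();
    }

    /** Runs the given work with databaseId bound to this thread, then restores the previous state. */
    static <T> T callWith(int databaseId, Supplier<T> work) {
        Integer previous = CURRENT.get();
        CURRENT.set(databaseId);
        try {
            return work.get();
        } finally {
            if (previous == null) {
                CURRENT.remove();
            } else {
                CURRENT.set(previous);
            }
        }
    }
}
```

Restoring the previous value in `finally` matters when threads are pooled, since a leftover id would otherwise leak into the next task that runs on the same thread.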

  2. register the multiple data sources

    <bean id="abstractDataSource" class="org.apache.commons.dbcp.BasicDataSource"
        destroy-method="close"
        p:driverClassName="${jdbc.driverClassName}"
        p:username="${jdbc.username}"
        p:password="${jdbc.password}" />
    
    <bean id="concreteDataSourceOne"
        parent="abstractDataSource"
        p:url="${jdbc.databaseurlOne}"/>
    
     <bean id="concreteDataSourceTwo"
        parent="abstractDataSource"
        p:url="${jdbc.databaseurlTwo}"/>
    

After that, the problem comes down to:

  1. How to load the data source configuration at Spring startup and configure the corresponding dataSource from the settings stored in the database.

  2. How to use multiple dataSources in Spring Batch

    This turns out to be a very common case: Google even suggests the search "spring batch multiple data sources", and there are a lot of articles about it, so I chose the answer in

  3. How to define the lookup code based on the Spring Batch jobs (steps)

    This is typically a business decision. You need to define the lookup strategy, which can then be injected into com.example.demo.datasource.CustomRoutingDataSource#determineCurrentLookupKey to route to the dedicated data source.
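To illustrate the lookup strategy together with the parallel requirement from the question, here is a rough plain-JDK sketch: one worker thread per database id, each binding its id to a thread-local before doing the work. `ParallelDatabaseRunner` and its methods are hypothetical names. In a real Spring Batch job the same idea would more likely be expressed with a partitioned step and a task executor, as suggested in the comments; this sketch only shows the threading and key-binding part.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

/** Hypothetical helper: runs the same work once per database id, in parallel. */
final class ParallelDatabaseRunner {
    private static final ThreadLocal<Integer> CURRENT_DB = new ThreadLocal<>();

    private ParallelDatabaseRunner() {}

    /** The id a routing data source would use as its lookup key on this thread. */
    static Integer currentDatabaseId() {
        return CURRENT_DB.get();
    }

    static <T> Map<Integer, T> runOnAll(List<Integer> databaseIds, Supplier<T> work) {
        Map<Integer, T> results = new LinkedHashMap<>();
        if (databaseIds.isEmpty()) {
            return results;
        }
        ExecutorService pool = Executors.newFixedThreadPool(databaseIds.size());
        try {
            Map<Integer, Future<T>> futures = new LinkedHashMap<>();
            for (Integer id : databaseIds) {
                futures.put(id, pool.submit(() -> {
                    CURRENT_DB.set(id);      // bind the key for this worker thread
                    try {
                        return work.get();
                    } finally {
                        CURRENT_DB.remove(); // never leak the key to the next task
                    }
                }));
            }
            for (Map.Entry<Integer, Future<T>> entry : futures.entrySet()) {
                results.put(entry.getKey(), entry.getValue().get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for workers", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A worker failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

A call such as `ParallelDatabaseRunner.runOnAll(List.of(243, 244, 245, 247), work)` would run `work` four times concurrently, with `currentDatabaseId()` returning a different id inside each run.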

Limitation

Interestingly, this setup does support multiple dataSources, but the DB settings cannot be stored in the database itself, because that produces a circular dependency:

The dependencies of some of the beans in the application context form a cycle:

   batchConfiguration (field private org.springframework.batch.core.configuration.annotation.JobBuilderFactory com.example.demo.batch.BatchConfiguration.jobs)
      ↓
   org.springframework.batch.core.configuration.annotation.SimpleBatchConfiguration (field private java.util.Collection org.springframework.batch.core.configuration.annotation.AbstractBatchConfiguration.dataSources)
┌─────┐
|  routingDataSource defined in class path resource [com/example/demo/datasource/DataSourceConfiguration.class]
↑     ↓
|  targetDataSources defined in class path resource [com/example/demo/datasource/DataSourceConfiguration.class]
↑     ↓
|  myBatchConfigurer (field private java.util.Collection org.springframework.batch.core.configuration.annotation.AbstractBatchConfiguration.dataSources)
└─────┘

So the obvious solution is to break the dependency between dataSource and routingDataSource:

  • Save the DB settings in properties
  • Or use another approach that does not go through the primary dataSource
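As a sketch of the first option, the settings could live in a properties file and be parsed into one entry per database before any data source is created. The key layout `batch.databases.<id>.url` / `.username` / `.password` and the `DatabaseSettings` class are assumptions made for this example, not something defined by Spring or by the original project.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical value object for one row of the db_settings table, read from properties instead. */
final class DatabaseSettings {
    private static final String PREFIX = "batch.databases.";
    private static final String URL_SUFFIX = ".url";

    final String url;
    final String username;
    final String password;

    DatabaseSettings(String url, String username, String password) {
        this.url = url;
        this.username = username;
        this.password = password;
    }

    /** Collects every "batch.databases.<id>.*" group into a map keyed by the numeric id. */
    static Map<Integer, DatabaseSettings> parse(Properties props) {
        Map<Integer, DatabaseSettings> byId = new TreeMap<>();
        for (String name : props.stringPropertyNames()) {
            if (!name.startsWith(PREFIX) || !name.endsWith(URL_SUFFIX)) {
                continue;
            }
            String idPart = name.substring(PREFIX.length(), name.length() - URL_SUFFIX.length());
            String base = PREFIX + idPart;
            byId.put(Integer.parseInt(idPart), new DatabaseSettings(
                    props.getProperty(name),
                    props.getProperty(base + ".username"),
                    props.getProperty(base + ".password")));
        }
        return byId;
    }
}
```

Each resulting entry could then be turned into a target data source and registered under its id, which avoids asking the primary dataSource for its own routing configuration.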

See Also

https://scattercode.co.uk/2013/11/18/spring-data-multiple-databases/

https://numberformat.wordpress.com/2013/12/27/hello-world-with-spring-batch-3-0-x-with-pure-annotations/

http://spring.io/guides/gs/batch-processing/

How to java-configure separate datasources for spring batch data and business data? Should I even do it?

The code is available on GitHub.

Liping Huang