
First, the problem statement: I am using Spring Batch in my DEV environment without issue. When I move the code to a production environment I run into a problem. In my DEV environment, Spring Batch is able to create its metadata tables in our DB2 database server without a problem. This is not an option when we go to PROD, as this is a read-only job.

Attempted solution:

Searching Stack Overflow, I found this posting: Spring-Batch without persisting metadata to database?

It sounded perfect, so I added:

@Bean
public ResourcelessTransactionManager transactionManager() {
    return new ResourcelessTransactionManager();
}

@Bean
public JobRepository jobRepository(ResourcelessTransactionManager transactionManager) throws Exception {
    MapJobRepositoryFactoryBean mapJobRepositoryFactoryBean = new MapJobRepositoryFactoryBean(transactionManager);
    mapJobRepositoryFactoryBean.setTransactionManager(transactionManager);

    return mapJobRepositoryFactoryBean.getObject();
}

I also added it to my Job by calling .repository(jobRepository).
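For context, the wiring on the job builder looks roughly like this (a sketch only; the job and step names here are placeholders, not from my actual code):

```java
// Illustrative wiring; "myJob" and "readerStep" are placeholder names.
@Bean
public Job myJob(JobBuilderFactory jobs, Step readerStep, JobRepository jobRepository) {
    return jobs.get("myJob")
            .repository(jobRepository) // point the job at the map-backed repository
            .start(readerStep)
            .build();
}
```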

But I get

Caused by: java.lang.NullPointerException: null
    at org.springframework.batch.core.repository.dao.MapJobExecutionDao.synchronizeStatus(MapJobExecutionDao.java:158) ~[spring-batch-core-3.0.6.RELEASE.jar:3.0.6.RELEASE]

So I am not sure what to do here. I am new to Spring, so I am teaching myself as I go. I am open to other solutions, such as an in-memory database, but I have not been able to get one of those to work either. I do NOT need to save any state or session information between runs, but the database query I am running will return around a million or so rows, so I will need to process them in chunks.
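For the chunking part, a cursor-based reader inside a chunk-oriented step is the usual shape; here is a hedged sketch (the SQL, row type, and chunk size are illustrative, not from my actual job):

```java
// Sketch: streaming a large DB2 result set in chunks so it never sits
// in memory all at once. Table/column names and sizes are placeholders.
@Bean
public JdbcCursorItemReader<MyRow> reader(DataSource db2DataSource) {
    JdbcCursorItemReader<MyRow> reader = new JdbcCursorItemReader<>();
    reader.setDataSource(db2DataSource);
    reader.setSql("SELECT id, payload FROM my_schema.my_table");
    reader.setFetchSize(1000); // stream rows instead of loading ~1M at once
    reader.setRowMapper((rs, rowNum) ->
            new MyRow(rs.getLong("id"), rs.getString("payload")));
    return reader;
}

@Bean
public Step chunkedStep(StepBuilderFactory steps,
                        JdbcCursorItemReader<MyRow> reader,
                        ItemWriter<MyRow> writer) {
    return steps.get("chunkedStep")
            .<MyRow, MyRow>chunk(1000) // read/process/write 1000 rows at a time
            .reader(reader)
            .writer(writer)
            .build();
}
```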

Any suggestions or help would be greatly appreciated.


4 Answers


Add these beans to your application configuration class:

@Bean
public PlatformTransactionManager transactionManager() {
    return new ResourcelessTransactionManager();
}

@Bean
public MapJobRepositoryFactoryBean mapJobRepositoryFactoryBean() {
    // Exposing the factory itself as a bean lets the container run its
    // InitializingBean callback (afterPropertiesSet), which initializes
    // the map-backed DAOs -- the missing step behind the NPE you saw.
    MapJobRepositoryFactoryBean mapJobRepositoryFactoryBean = new MapJobRepositoryFactoryBean();
    mapJobRepositoryFactoryBean.setTransactionManager(transactionManager());
    return mapJobRepositoryFactoryBean;
}

@Bean
public JobRepository jobRepository() throws Exception {
    return mapJobRepositoryFactoryBean().getObject();
}

@Bean
public JobExplorer jobExplorer() throws Exception {
    MapJobExplorerFactoryBean jobExplorerFactory = new MapJobExplorerFactoryBean(mapJobRepositoryFactoryBean());
    jobExplorerFactory.afterPropertiesSet();
    return jobExplorerFactory.getObject();
}

@Bean
public JobLauncher jobLauncher() throws Exception {
    SimpleJobLauncher simpleJobLauncher = new SimpleJobLauncher();
    simpleJobLauncher.setJobRepository(jobRepository());
    return simpleJobLauncher;
}
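With those beans in place, a job can be launched without any metadata tables ever touching DB2. A minimal sketch, assuming a Job bean named `job` (illustrative, not from the answer above):

```java
// Illustrative launch call; "job" stands in for your actual Job bean.
JobParameters params = new JobParametersBuilder()
        .addLong("run.id", System.currentTimeMillis()) // unique parameters per run
        .toJobParameters();
JobExecution execution = jobLauncher().run(job, params);
System.out.println(execution.getStatus());
```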

This doesn't directly answer your question, but that is not a good solution; the map-based repository is supposed to be used only for testing. It will grow in memory indefinitely.

I suggest you use an embedded database like SQLite. The main problem with using a separate database for job metadata is that you would then have to coordinate transactions between the two databases (so that the state of the metadata matches that of the data), but since it seems you're not even writing to the main database, that probably won't be a problem for you.

  • Why would it grow in memory indefinitely? Once the work is done, the java program exits, and will not start again until the next day. Anything in memory should be cleared when the java program exits. The next day when it runs there should be a new in memory database created when the java program is called by our scheduler. I need no state of any kind to persist between runs. – VydorScope Feb 26 '16 at 17:18
  • @VydorScope Well, it grows indefinitely until the JVM exits or the context is shutdown. If that's how you run Spring batch then it's not a concern. But there are other problems (that may or may not apply to you) such as the fact that the map implementation doesn't participate in the transactions. – Artefacto Feb 26 '16 at 17:29
  • The JVM will be shut down, and the context is closed once the job is complete. – VydorScope Feb 26 '16 at 17:37

You could quite easily use an in-memory database (for example H2 or HSQL). You can find examples here: http://www.mkyong.com/spring/spring-embedded-database-examples/.
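A minimal sketch of that approach, assuming H2 is on the classpath: the Spring Batch metadata schema scripts ship inside spring-batch-core, so the tables can be created in memory at startup and never touch the production database.

```java
// Sketch: an in-memory H2 DataSource pre-loaded with the Spring Batch
// metadata schema. The schema script is bundled in spring-batch-core.
@Bean
public DataSource dataSource() {
    return new EmbeddedDatabaseBuilder()
            .setType(EmbeddedDatabaseType.H2)
            .addScript("classpath:org/springframework/batch/core/schema-h2.sql")
            .build();
}
```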

As for the Map-backed job repository, it does provide a method to clear its contents:

public void clear()

Convenience method to clear all the map DAOs globally, removing all entities.

Be aware that a Map-based job repository is not fit for use in partitioned steps or other multi-threaded scenarios.

  • I am doing multithreading. I have sets of steps that I run in parallel. What do you mean by "not fit for use" in this case? – VydorScope Mar 01 '16 at 11:01
  • 1
    I asked that myself not long ago, [here](http://stackoverflow.com/questions/35484563/spring-batch-thread-safe-map-job-repository). The document explicitly states a Map-backed job repository should not be used then. – Heikki Doeleman Mar 09 '16 at 16:20

The following seems to have done the job for me:

@Bean
public DataSource dataSource() {
    EmbeddedDatabaseBuilder builder = new EmbeddedDatabaseBuilder();
    EmbeddedDatabase db = builder
            .setType(EmbeddedDatabaseType.HSQL)
            .build();
    return db;
}

Now Spring is not creating tables in our production database, and since state is lost when the JVM exits, nothing seems to hang around.

UPDATE: The above code has caused concurrency errors for us. We have addressed this by abandoning the EmbeddedDatabaseBuilder and declaring the HSQLDB this way instead:

@Bean
public BasicDataSource dataSource() {
    BasicDataSource dataSource = new BasicDataSource();
    dataSource.setDriverClassName("org.hsqldb.jdbcDriver");
    dataSource.setUrl("jdbc:hsqldb:mem:testdb;sql.enforce_strict_size=true;hsqldb.tx=mvcc");
    dataSource.setUsername("sa");
    dataSource.setPassword("");
    return dataSource;
}

The primary difference is that we are able to specify mvcc (multiversion concurrency control) in the connection string, which resolves the issue.

  • This is not thread safe. If two threads simultaneously try to access the same job, there will be a concurrent thread access exception. – Harish Jul 26 '16 at 23:21
  • How would you change it to be thread safe then, keeping with the limitations I am working under? – VydorScope Aug 22 '16 at 17:33
  • I used a Map based repository – Harish Aug 22 '16 at 18:13
  • As mentioned in the answer below from Heikki Doeleman, map is not recommended for multithreaded jobs like I am doing. – VydorScope Aug 23 '16 at 12:59
  • I faced concurrent exception when I used this embedded DB. I don't have a problem in Map – Harish Aug 23 '16 at 13:43
  • How do you clear the db before every job run ? Because the db size can grow if the batch application is never stopped ? Kindly help ! – Nitish Kumar Mar 20 '19 at 11:12
  • Nitish Kumar, I no longer work at the company where this code was deployed, but at the time it was part of a daily job and that was an in-memory DB. So when it exited after doing the days work, the DB was effectively dropped. – VydorScope Mar 20 '19 at 19:51