I've got a table of over 1 million customers. Customer information is updated often, but any given customer is updated at most once a day. I've got a Spring Batch job which

  • reads a customer from the customer table (JdbcCursorItemReader)
  • processes the customer information (ItemProcessor)
  • writes to the customer table (ItemWriter)

I want to run 10 jobs at once, all reading from the one customer table, without any customer being read twice. Is this possible with Spring Batch, or is it something I will have to handle at the database level using a crawlLog table, as mentioned in this post?

How do I lock read/write to MySQL tables so that I can select and then insert without other programs reading/writing to the database?

I know that parameters can be passed to a job. I could read all the customer ids and distribute them evenly across the 10 jobs, but would this be the right way of doing it?
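
For illustration, here is roughly what that id-distribution idea looks like with Spring Batch's own Partitioner abstraction, which splits the id space into non-overlapping ranges. This is only a sketch; the table name, the column name, and the assumption of a roughly contiguous numeric id are all assumptions, not part of the question:

import java.util.HashMap;
import java.util.Map;

import javax.sql.DataSource;

import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.item.ExecutionContext;
import org.springframework.jdbc.core.JdbcTemplate;

public class CustomerIdRangePartitioner implements Partitioner {

    private final JdbcTemplate jdbcTemplate;

    public CustomerIdRangePartitioner(DataSource dataSource) {
        this.jdbcTemplate = new JdbcTemplate(dataSource);
    }

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        // Assumes a non-empty "customer" table with a numeric "id" primary key.
        long min = jdbcTemplate.queryForObject("SELECT MIN(id) FROM customer", Long.class);
        long max = jdbcTemplate.queryForObject("SELECT MAX(id) FROM customer", Long.class);
        long rangeSize = (max - min) / gridSize + 1;

        Map<String, ExecutionContext> partitions = new HashMap<>();
        long start = min;
        for (int i = 0; i < gridSize; i++) {
            ExecutionContext context = new ExecutionContext();
            // Each worker gets a disjoint [minId, maxId] slice, so no customer is read twice.
            context.putLong("minId", start);
            context.putLong("maxId", Math.min(start + rangeSize - 1, max));
            partitions.put("partition" + i, context);
            start += rangeSize;
        }
        return partitions;
    }
}

Each worker step's reader could then bind its slice with a step-scoped WHERE clause, e.g. WHERE id BETWEEN #{stepExecutionContext['minId']} AND #{stepExecutionContext['maxId']}.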


1 Answer

The framework has several ways to do what you want; which one fits depends on what you have. The simplest is just to add a task executor to the step or flow:

<step id="copy">
  <tasklet task-executor="taskExecutor" throttle-limit="10">
  ...
  </tasklet>
</step>

<beans:bean id="taskExecutor"
  class="org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor">
  <property name="corePoolSize" value="10"/>
  <property name="maxPoolSize" value="15"/>
</beans:bean>
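
One caveat worth adding (general Spring Batch guidance, not specific to this answer): with a multi-threaded step, all threads share a single reader instance, and JdbcCursorItemReader is not thread-safe. JdbcPagingItemReader is the usual thread-safe substitute, since each page is fetched with an independent query. A minimal sketch, where the table and column names are assumptions about the question's schema:

import java.util.Collections;
import java.util.Map;

import javax.sql.DataSource;

import org.springframework.batch.item.database.JdbcPagingItemReader;
import org.springframework.batch.item.database.Order;
import org.springframework.batch.item.database.support.MySqlPagingQueryProvider;
import org.springframework.jdbc.core.ColumnMapRowMapper;

public class CustomerReaderFactory {

    // Column list and table name are assumptions; adapt them to the real schema.
    public static JdbcPagingItemReader<Map<String, Object>> customerReader(DataSource dataSource) {
        MySqlPagingQueryProvider queryProvider = new MySqlPagingQueryProvider();
        queryProvider.setSelectClause("SELECT id, name, email");
        queryProvider.setFromClause("FROM customer");
        // Paging needs a unique sort key; each page runs as a separate query.
        queryProvider.setSortKeys(Collections.singletonMap("id", Order.ASCENDING));

        JdbcPagingItemReader<Map<String, Object>> reader = new JdbcPagingItemReader<>();
        reader.setDataSource(dataSource);
        reader.setQueryProvider(queryProvider);
        reader.setPageSize(1000);
        reader.setRowMapper(new ColumnMapRowMapper());
        return reader;
    }
}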

You may want to have a look at this and the other techniques in the official Spring Batch documentation on scalability.
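
In particular, the partitioning technique from that chapter matches the question's "split the ids across 10 workers" idea directly. The wiring looks roughly like this (a sketch; customerIdPartitioner would be a bean implementing Partitioner, such as the range partitioner sketched in the question, and copy is the existing worker step):

<step id="partitionedCopy">
  <partition step="copy" partitioner="customerIdPartitioner">
    <handler grid-size="10" task-executor="taskExecutor"/>
  </partition>
</step>

Each of the 10 worker executions gets its own step execution context, so a step-scoped reader can pick up its id range and the workers never overlap.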
