
I'm new to Spring Batch and I ran into this problem: my application runs every 5 minutes, reads records from a blob in the cloud, and publishes those records to a Kafka topic. Right now it runs as a single instance and there is no problem, but if I plan to run multiple instances, both instances run at the same time and pick up the same records (since the code base is the same), so duplicates end up in the Kafka topic. Is there any feature in Spring Batch that helps overcome this, or any other approach that won't create duplicates?
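
For context, a minimal sketch of the setup described above, assuming a Spring-scheduled launcher for the batch job (class, bean, and parameter names are illustrative, and @EnableScheduling is assumed to be configured elsewhere):

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class BlobToKafkaJobScheduler {

    private final JobLauncher jobLauncher;
    private final Job blobToKafkaJob; // reads records from the blob and writes them to a Kafka topic

    public BlobToKafkaJobScheduler(JobLauncher jobLauncher, Job blobToKafkaJob) {
        this.jobLauncher = jobLauncher;
        this.blobToKafkaJob = blobToKafkaJob;
    }

    // Fires every 5 minutes; a second instance fires the same job against the
    // same blob, which is what produces the duplicates in the topic.
    @Scheduled(fixedRate = 5 * 60 * 1000)
    public void runJob() throws Exception {
        jobLauncher.run(blobToKafkaJob, new JobParametersBuilder()
                .addLong("runTime", System.currentTimeMillis())
                .toJobParameters());
    }
}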

Thanks in advance, Vamsi

  • Does this answer your question? [Activate Batch on only one Server instance](https://stackoverflow.com/questions/60216411/activate-batch-on-only-one-server-instance) – Mahmoud Ben Hassine May 07 '20 at 13:26
  • Welcome to stackoverflow. https://stackoverflow.com/help/minimal-reproducible-example – Hemant May 09 '20 at 14:23

1 Answer


A Kafka consumer group can help you here. To keep your Spring Batch instances coordinated with each other, set group.id to the same value (group1) on every instance and Kafka will take care of the rest:

import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("group.id", "group1"); // same group.id on every instance
... // other props
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
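
A usage sketch for completeness (the topic name and the process(...) call are placeholders): once every instance subscribes with the same group.id, Kafka assigns each partition of the topic to only one consumer in the group, so each record is handled by a single instance.

import java.time.Duration;
import java.util.Collections;
import org.apache.kafka.clients.consumer.ConsumerRecords;

consumer.subscribe(Collections.singletonList("records-topic")); // topic name is an assumption
while (true) {
    // poll() only returns records from the partitions assigned to this instance
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
    records.forEach(record -> process(record)); // process(...) is a hypothetical handler
}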
QuickSilver