0

My problem is more or less asked here Spring Batch : Compare Data Between Database however I still cannot get my head around it. Maybe it's a bit different.

I have A datasource and I want to write into database B.

I have full trust in A datasource, so if;

  1. A Does contain the record that B does not, I have to add B.
  2. A Does not contain the record that B does, I have to delete from B
  3. A does contain, B does contain, I check and update the record in B accordingly.

I thought my approach would be simple as;

  1. Read Person from A datasource
  2. Read Person from B datasource
  3. (Those two Person can be having different entities)
  4. Compare and find the ones to Add,Update,Delete.
  5. Update the database.

However since I am pretty newbie to Spring Batch, the implementation is kind of ending up to a spaggetti code which I don't want and want to learn the right way for it.

So; I created this job below

@Bean
public Job job() {
    return jobBuilderFactory
            .get("myNewbieJob")
            .start(populateARepository())
            .next(populateBRepository())
            .next(compareAndSubmitCountryRepositoriesTasklet())
            .build();
}

To explain; populateARepository() populateARepository() : I have a Repository object just contains a list. This step just does add records to the list.

The part that I don't like is that compareAndSubmitCountryRepositoriesTasklet() is basically comparing those repositories... and then I don't know what to do.

  • If I create a DB access and push from that class, I won't like it, because I just wanted it to be a step where I find the differences.
  • If I create another class which contains 3 separate lists for toUpdate,toDelete,toInsert, and then in the next step somehow use that repository... that sounded wrong to me as well.

So, here I am. Any kind of guidance is appreciated. How would you deal in this situation?

Thank you in advance.

Bleach
  • 309
  • 3
  • 18
  • 2
    Sounds like two jobs - job 1 could handle *A Does contain the record that B does not, I have to add B* and *A does contain, B does contain, I check and update the record in B accordingly*, while job 2 could handle *A Does not contain the record that B does, I have to delete from B*. – Andrew S Mar 31 '21 at 17:55
  • I understand. However what'd your approach would be to that? Creating a repo that contains 3 lists inside for toUpdate, toDelete, toInsert and then populate that repository? That's my silly approach... Do you have anything else on mind? – Bleach Apr 01 '21 at 07:10
  • Oh wait... I think what you're saying it, since I have both repos loaded, do 2 jobs, first for the updates from A side, second for the delete (so compare from B side...) Did I get it right? – Bleach Apr 01 '21 at 07:36

1 Answers1

0

Before talking about Spring Batch, I would first look for an algorithm to solve this problem. If I understand correctly, you basically need to replicate the same state of records in database A into database B. What you can do is:

  • Read Person items from database A
  • Use an item processor to do the comparison with table B. Here, you would mark the item accordingly to be inserted, updated or deleted
  • Use an item writer that checks the type of record and do the necessary operation. Here, you can create a custom writer or use a ClassifierCompositeItemWriter (see this example)

This approach works well with small/medium datasets but not for large datasets (due to the additional query for each item, but this is inherent to the algorithm itself and not the implementation with Spring Batch).

Mahmoud Ben Hassine
  • 28,519
  • 3
  • 32
  • 50
  • Hi, thank you for the answer. However; little things. 1. I think I need to load the data from both of the databases to be able to compare full data. At least one of them. For 2 reasons; a. For instance X person wouldn't have been loaded while im trying to compare X person from database A to database B. Second reason; I wouldn't know X should be deleted if I don't load all the data and compare if it should be gone or not. – Bleach Apr 01 '21 at 09:25
  • I don't think so. You said `I have full trust in A datasource`, meaning database A is the source of truth. So in this case, you either read items from db A and compare them to B, or read items from B and compare them to A. There is no need to `load the data from both of the databases to be able to compare full data`. – Mahmoud Ben Hassine Apr 01 '21 at 09:30
  • I see. All records shall be the same from database A(so yes, full trust on DB-A). I am not sure if we are at the same page for "comparing" part. though. Only way I know is that loading them both and then compare. Because, imagine while reading B, there shall be data from A to compare against. Same goes with the other way around. Or am I wrong here? – Bleach Apr 01 '21 at 09:34
  • My answer suggests to read data from A as it is the source of truth. I see no need to load B entirely in memory. Comparing/Synchronizing two tables is a general problem (see https://stackoverflow.com/questions/37818784/how-to-synchronize-data-between-two-tables-in-different-databases-mysql-runnin), that's why I said you need to come up with a solution first (an algorithm) to the problem before asking for Spring Batch specifics. Spring Batch provides two types of tasks (simple tasklet and chunk-oriented tasklet), so it is up to you to break down the problem into steps using one of these types. – Mahmoud Ben Hassine Apr 01 '21 at 09:42