My company obtains a Microsoft Access database from Cerner Multum, which needs to be diffed against our production backend, which is Sybase (12.0.1.3924). And while I'm aware of off-the-shelf database diff tools (http://www.diffkit.org/, http://www.liquibase.org/), none seems to fit my need - as such, I decided to write a Java tool to perform the work as proof-of-concept.
As it stands, the tool is currently working as designed, and here's the procedure:
- Obtain a list of tables to be diffed from a config file
- Establish a connection to both backends
- Ensure the tables in the config file can be matched against both MS Access and Sybase
- If so, proceed with the diff:
- For each table:
- Obtain a row from MS Access, instantiate an object via reflection
- Iterate over each column in the row, stuffing data into the newly created "Access" POJO
- Using the Access POJO, construct a query for Sybase
- Query Sybase:
- If the result set is NULL, insert a record in Sybase
- If the result set not NULL, instantiate another POJO and stuff Sybase data into it.
- Compare the two POJOs:
- If the POJOs match: do nothing, move on to the next row.
- If the POJOs do not match: perform an update to Sybase using the data from the Access POJO
- For each table:
Now as stated, this is currently getting the job done, albeit in a very procedural, single-threaded manner, and therein lies my question: what is the proper approach to diffing two databases (that happen to be unrelated) in a multi-threaded manner?
I have some experience with multi-threading, but am unsure as to the correct approach as I've never queued inserts/updates. That said, I'm not entirely certain queuing is the proper approach - what about bulk updates/inserts?
Would someone with some experience in this area offer some high level insight as to how to approach this problem? As it stands, I'm churning over 1.5m rows in about 2 hours, which is roughly 200 TPS. Very slow. Any guidance would be greatly appreciated, and I'd to happy to offer additional information if necessary.