For those arguing that this is not a programming question: it ultimately is, since I have to 1) write the code, and 2) weigh the cost of the comparison algorithm run for each row in option #1. I am not asking how to write queries or how my database should be set up.
I have run into a bit of a pickle. I have a database that follows the Google GTFS spec, and I am now writing an automatic update program to service it.
The database gets an overhaul once every 3 months. The table with the fewest rows has between 1 and 10 rows, the largest table contains 7 million rows, and the rest have between 10 000 and 80 000.
The files my program downloads are .txt files, each of which maps to one table. In other words: stops.txt = database.stops. The tables are InnoDB.
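For illustration, loading one of these files into its table looks roughly like this in my head. This is only a sketch: it assumes mysql-connector-python, that LOCAL INFILE is enabled on the server, and the connection details, database name and file/table names are just placeholders.

```python
# Sketch only: load one downloaded GTFS .txt file into its matching table.
# Assumes mysql-connector-python and that LOCAL INFILE is enabled server-side.
import mysql.connector

def load_gtfs_file(path, table):
    conn = mysql.connector.connect(
        host="localhost",
        user="gtfs_updater",        # placeholder credentials
        password="secret",
        database="gtfs",            # placeholder database name
        allow_local_infile=True,
    )
    cur = conn.cursor()
    # GTFS .txt files are comma-separated with a header line, hence IGNORE 1 LINES.
    # path and table come from a fixed internal mapping (stops.txt -> stops), not user input.
    cur.execute(
        f"LOAD DATA LOCAL INFILE '{path}' INTO TABLE {table} "
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' "
        "LINES TERMINATED BY '\\n' IGNORE 1 LINES"
    )
    conn.commit()
    cur.close()
    conn.close()

load_gtfs_file("stops.txt", "stops")   # stops.txt = database.stops
```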
I have come up with two solutions:
1) Compare every row ID in the downloaded .txt files with what is currently in the database; if nothing has changed, do nothing, and if something has changed, do an UPDATE.
2) INSERT the downloaded files into their own tables (basically mirroring the live tables) and then swap the table names. Example: database.stops_new and database.stops switch names (sketched below).
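For option 2, my understanding is that MySQL's RENAME TABLE can rename several tables in one atomic statement, so running queries would never see a half-updated table. A rough sketch of what I mean, again assuming mysql-connector-python; the table list and connection details are placeholders and error handling is omitted:

```python
# Sketch of option 2: fill the *_new mirror tables first, then swap all names
# in a single RENAME TABLE statement so the switch is atomic for readers.
import mysql.connector

TABLES = ["stops", "routes", "trips", "stop_times"]   # example subset of the GTFS tables

def swap_in_new_tables(conn):
    cur = conn.cursor()
    # Builds one statement like:
    # RENAME TABLE stops TO stops_old, stops_new TO stops, routes TO routes_old, ...
    # Renames are applied left to right, so each live table is moved aside
    # before its *_new replacement takes its name.
    renames = ", ".join(f"{t} TO {t}_old, {t}_new TO {t}" for t in TABLES)
    cur.execute(f"RENAME TABLE {renames}")
    cur.close()

conn = mysql.connector.connect(
    host="localhost", user="gtfs_updater", password="secret", database="gtfs"
)
swap_in_new_tables(conn)
conn.close()
```

The *_old tables would be dropped once the new feed is confirmed good (and they need to be gone before the next run, or the rename fails); keeping them around briefly also gives an easy rollback path.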
Another twist: the overhaul happens on a date defined in one of the .txt files, which means it could fall on a Saturday during a peak holiday, so users could be sending queries at any time.
Question: which approach should I go for to ensure that nothing breaks and that the user experience is maintained? (I am leaning towards option 2...)