Q: How can I insert missing rows and update existing ones from a data source in a relatively short time?
There is a rarely updated PostgreSQL table on a modest server (1 GB RAM for Apache2 and a mid-range CPU). The table holds 10M records. Some records need to be updated and some added.
The data source is a big CSV file produced by a PHP script.
Table structure:
id (auto inc.)
week_number (1-54)
audience_id (int)
channel_id (int)
is_weekend (char, Y or N)
start_time (offset in minutes after midnight)
rating (numeric; in fact the main value)
The rating field is determined by the following set of fields: week_number, audience_id, channel_id, is_weekend and start_time. Let's call this combination the "complex_key" from now on.
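For reference, the "complex_key" can be expressed as a unique index over those five columns (a sketch; "ratings" is an assumed table name, the real one may differ):

```sql
-- Hypothetical DDL matching the structure above.
CREATE UNIQUE INDEX ratings_complex_key_idx
    ON ratings (week_number, audience_id, channel_id, is_weekend, start_time);
```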
What I have tried (PHP server side):
- Loop over the existing DB data via foreach (batched SELECTs) and store "complex_key" => rating pairs in a Redis cache, then compare against the CSV row by row and prepare a new data set. Optimization: unset everything I can, optimize arrays, etc. Result: failed with a memory limit error (on the PHP side, not Redis).
- Use Pg COPY to load the CSV into a temp table. The idea was to let the DB engine compare the tables and make the needed changes. Of course I added an extra indexed field containing a hash of the "complex_key". Result: works too slowly when comparing tables of 1M+ rows.
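The temp-table idea can be sketched like this (assumptions: the table is named "ratings", it has a unique index on the five key columns, the CSV path is made up, and the ON CONFLICT upsert requires PostgreSQL 9.5+):

```sql
-- Load the CSV into a temp table with the same shape as the target.
CREATE TEMP TABLE ratings_import (LIKE ratings INCLUDING DEFAULTS);
COPY ratings_import (week_number, audience_id, channel_id, is_weekend, start_time, rating)
    FROM '/tmp/ratings.csv' WITH (FORMAT csv);

-- Upsert in one statement: insert new keys, update existing ones.
-- The WHERE clause skips rows whose rating is unchanged, avoiding
-- needless row rewrites on a 10M-row table.
INSERT INTO ratings (week_number, audience_id, channel_id, is_weekend, start_time, rating)
SELECT week_number, audience_id, channel_id, is_weekend, start_time, rating
FROM ratings_import
ON CONFLICT (week_number, audience_id, channel_id, is_weekend, start_time)
    DO UPDATE SET rating = EXCLUDED.rating
    WHERE ratings.rating IS DISTINCT FROM EXCLUDED.rating;
```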