0

Q: How to insert missing rows and update exists from data source in a relatively short time?

Exists rarely updated Postgre table and not powerful server (1Gb ram for apache2 and middle CPU). There are 10 M records. Some records can be updated some records can be added.

Data source is big CSV file produced by php script.

Table structure:

id (auto inc.)
week_number (1-54)
audience_id (int)
channel_id (int)
is_weekend (char, Y or N)
start_time (hours, offset in minutes after midnight)
rating (numberic. In fact main value)

Rating field is determined by next fields set: week_number, audience_id, channel_id, is_weekend and start_time Let's call it "complex_key" in a future.

What I did (php server side):

  • loop exists db data via foreach (batch select) and store pairs "complex_key" => rating in a redis cache. Next compare with csv row by row and prepare new set of data. Optimisation: Unset everything I can, optimize arrays e.t.c Result: Failed. Memory limit. (php side, not redis)
  • Use Pg copy to create temp table. The idea was to use DB engine to compare tables and to make needed changes. Something like this. Ofc I added additional field which contained "complex_key" hash and mark it as index. Result: working too slow if we compare 1M+ tables.
Community
  • 1
  • 1
MustDie1Bit
  • 560
  • 4
  • 16
  • 1
    If you use postgres 9.5+ then you should take a look at UPSERT concept ([docs](http://www.postgresql.org/docs/9.5/static/sql-insert.html)) – Ildar Musin Apr 03 '16 at 22:47
  • `MERGE` (aka UPSERT) is the way to go. Did you perform any `ANALYZE` between populating your temp table and updating the target table? This can have a big effect on efficiency. – Philip Couling Apr 04 '16 at 14:22
  • I have 9.0, without ability to upgrade. So upsert and merge not working here. – MustDie1Bit Apr 04 '16 at 23:35

1 Answers1

0

Use PHP to create a SQL file of updates and inserts in a transaction block (BEGIN...COMMIT). Load the file via command line psql. This method won't reduce fetch/computation time, but will create a dramatic increase in speed during update.

Curt Evans
  • 346
  • 2
  • 4