I have a postgres database with multiple schemas, each schema with multiple tables, and each table with a few million rows.
From time to time, I have to download a file containing a few million lines and compare each line with every row in all tables; if the row is found, I must update a column in it.
The first thing I tried was reading the file line by line, running a SELECT query on each table in each schema, and running the UPDATE whenever the row is found. It works fine on my testing platform, but with the actual database it would run forever, since it's only executing around 1000 queries per second (I checked with the query: SELECT xact_commit+xact_rollback FROM pg_stat_database WHERE datname = 'mailtng_lists';).
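Roughly, that first approach looks like this (a simplified sketch: huge_file.txt, the email column, the status column, and the connection details are placeholder names just for illustration, I'm using the pg_* functions, and $schemas / $tables are built elsewhere):

$db = pg_connect("host=localhost dbname=mailtng_lists user=postgres"); // placeholder connection string
$handle = fopen('huge_file.txt', 'r');
while (($line = fgets($handle)) !== false) {
    $value = trim($line);
    foreach ($schemas as $schema) {              // schema names, built elsewhere
        foreach ($tables[$schema] as $table) {   // table names per schema, built elsewhere
            // one SELECT per line per table...
            $res = pg_query_params($db, "SELECT 1 FROM \"$schema\".\"$table\" WHERE email = $1", array($value));
            if (pg_num_rows($res) > 0) {
                // ...and an UPDATE whenever the line is found
                pg_query_params($db, "UPDATE \"$schema\".\"$table\" SET status = 'matched' WHERE email = $1", array($value));
            }
        }
    }
}
fclose($handle);

So every line costs (number of schemas × number of tables) SELECTs plus the occasional UPDATE, which is why the ~1000 queries per second ceiling makes it unusable on the real data.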
The second thing I tried was separating the main script from the script that connects to the database. I split the huge file into chunks of 100K lines each, then called the script that does the connection once per chunk with the following command:
// $chunks is the result of the split command (no problem here)
foreach ($chunks as $chunk) {
    exec("updater.php $chunk");
}
But it brought no improvement at all; the number of queries per second was still very low. So the last thing I tried was doing the same but with shell_exec, so the script wouldn't have to wait for the output, but the server crashed: since I had 173 chunks, it resulted in 173 PHP instances running at once.
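For completeness, the shell_exec version was more or less this (a sketch; the exact redirection may have differed, but the point is that every chunk got its own backgrounded process at the same time):

foreach ($chunks as $chunk) {
    // the trailing "&" and redirection make shell_exec return immediately,
    // so all 173 updater.php processes were launched in parallel
    shell_exec("updater.php " . escapeshellarg($chunk) . " > /dev/null 2>&1 &");
}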
Any idea on how to handle this situation?