
I suspect this question may be better suited for the Database Administrators site, so LMK if it is and I'll move it. :)

I'm something of a database/Postgres beginner here, so help me out. I have a system set up to process 10 things in parallel and write the output of those things to the same table in the same Postgres database. The writes succeed, but they take forever: my log files show results for 30,000 of these things, but only 7,000 of them are reflected in the database.

I suspect Postgres is queueing up the writes for some reason, and my guess is that this happens because that table has an auto-incrementing primary key. If I'm trying to write 10 records to the same table simultaneously, I would assume they'd have to be queued, because otherwise how is the primary key going to be set?

Do I have this right, or is my database horribly misconfigured? My sysadmin doesn't typically do databases, so if you have any tuning suggestions, even basic stuff, I'd be glad to hear them. :)

dmn
    Well, it's certainly not the case that autoincrements always have to be queued: the key is that they don't have to be gapless or strictly sequential, so it's possible for the system to "reserve" a key for a row that hasn't been fully written yet. In Postgres, they're implemented as a separate object called a Sequence, separate from the locks and transactions involved on the table itself. – IMSoP Jul 28 '14 at 21:37
  • Can you provide a bit more information about the writes? Are they simple one-record inserts using `INSERT ... VALUES`, or are they batches of data, or results from some other query? And is there just the one standalone table, or are there dependencies being checked or updated? – IMSoP Jul 28 '14 at 21:41
  • They're simple one-record inserts with `INSERT...VALUES`. No dependencies, and while they do require results from another query, those results have already been stored in memory and are just being iterated through. Hmm :/ – dmn Jul 29 '14 at 02:38
  • How many indices on the table? Any triggers? What's the hardware like? If you're inserting 30,000 records, you should probably be doing a bulk batch insert (sketched after this comment thread), not individual `INSERT` statements (which _will_ be slower, but shouldn't be by that much). Need more info here, still.... – Clockwork-Muse Jul 29 '14 at 09:58
  • Sure thing. There's just the one index, the primary key. I have a rule set up on this table that makes sure we aren't inserting a record that is already there. Hardware-wise, we have a gluster/NFS setup, but the database is stored locally on a dedicated server. I think batch inserts should help. Thanks. :) – dmn Jul 29 '14 at 13:55
  • Ooh, sorry, there's also a foreign key dependency on another table... – dmn Jul 29 '14 at 14:13
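
For illustration, here is a minimal sketch of the bulk batch insert Clockwork-Muse suggests, using psycopg2's `execute_values` helper. The `results(thing_id, output)` table, the DSN, and the data are all hypothetical stand-ins; adapt them to your schema.

```python
# A hypothetical sketch only: table name, columns, DSN, and data are made up.
import psycopg2
from psycopg2.extras import execute_values

conn = psycopg2.connect("dbname=mydb")                    # hypothetical DSN
rows = [(i, "output %d" % i) for i in range(30000)]       # stand-in data

with conn:                       # commits once when the block exits cleanly
    with conn.cursor() as cur:
        # One multi-row INSERT per page of 1000 rows instead of 30,000
        # single-row statements, each of which would round-trip separately.
        execute_values(
            cur,
            "INSERT INTO results (thing_id, output) VALUES %s",
            rows,
            page_size=1000,
        )
conn.close()
```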

1 Answer


> I suspect Postgres is queueing up the writes for some reason, and my guess is that this happens because that table has an auto-incrementing primary key. If I'm trying to write 10 records to the same table simultaneously, I would assume they'd have to be queued, because otherwise how is the primary key going to be set?

Nope, that's not it.

If you read the documentation on sequences, you'll see that they're exempt from transactional visibility and rollback specifically for this reason: an ID generated with `nextval` is not re-used on rollback.
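
For concreteness, here is a small psycopg2 sketch of that behaviour; the sequence name (`my_table_id_seq`) and DSN are hypothetical placeholders:

```python
# Illustrative only: the sequence name and DSN are hypothetical.
import psycopg2

conn = psycopg2.connect("dbname=mydb")        # hypothetical DSN
cur = conn.cursor()

cur.execute("SELECT nextval('my_table_id_seq')")
first = cur.fetchone()[0]
conn.rollback()                               # abort the transaction that consumed the value

cur.execute("SELECT nextval('my_table_id_seq')")
second = cur.fetchone()[0]
conn.commit()

# The rolled-back value is simply skipped rather than handed out again,
# so concurrent writers never have to queue on the primary key.
print(first, second)                          # second > first
conn.close()
```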

> Do I have this right, or is my database horribly misconfigured? My sysadmin doesn't typically do databases, so if you have any tuning suggestions, even basic stuff, I'd be glad to hear them. :)

It's more likely that you're doing individual commits, one per insert, on a system with really slow `fsync()` calls, like a single magnetic hard drive. You might also have your checkpoint interval set too low (the PostgreSQL logs will contain warnings about this if so), too many indexes causing a slowdown, etc.
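
As a hedged illustration of that point, the two patterns below differ only in where the commit happens; the `results(thing_id, output)` table, the DSN, and the data are hypothetical:

```python
# Hypothetical table/DSN; `rows` stands in for the in-memory results.
import psycopg2

def insert_commit_per_row(conn, rows):
    """Slow pattern: one transaction, and hence one fsync(), per row."""
    cur = conn.cursor()
    for thing_id, output in rows:
        cur.execute("INSERT INTO results (thing_id, output) VALUES (%s, %s)",
                    (thing_id, output))
        conn.commit()

def insert_single_commit(conn, rows):
    """Faster pattern: all rows in one transaction, one commit at the end."""
    cur = conn.cursor()
    for thing_id, output in rows:
        cur.execute("INSERT INTO results (thing_id, output) VALUES (%s, %s)",
                    (thing_id, output))
    conn.commit()

if __name__ == "__main__":
    conn = psycopg2.connect("dbname=mydb")                  # hypothetical DSN
    rows = [(i, "output %d" % i) for i in range(1000)]      # stand-in data
    insert_single_commit(conn, rows)
    conn.close()
```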

You should look at the PostgreSQL logs.

Also, please see the primer I wrote on the topic of improving insert performance.

Craig Ringer
  • WOW this is amazing. Now I have a whole list of stuff to try and make my sysadmin try. :D Thank you so much! Let's see how this goes. – dmn Jul 29 '14 at 14:22
  • You know what actually made the biggest difference for me? Committing less often. I'm using the `psycopg2` Python module which decouples statement execution from committing. It looks like it doesn't matter how many inserts I execute, just how many I commit at a time, and if I commit only once at the end, I win. :D – dmn Jul 29 '14 at 19:28
  • @dmn Sounds like you're on a system with quite slow fsync() then, maybe magnetic hard drives without a RAID controller with BBU and write-back caching. – Craig Ringer Jul 29 '14 at 21:41
  • Yeah, most likely. I'll bug the sysadmin. >:D Another thing that helped even more than committing at the end is using `COPY`. `INSERT` operations that previously took 600 seconds now take 1! Thanks again for your help and for providing tons of useful information. :D – dmn Jul 30 '14 at 15:12
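
For reference, here is a minimal sketch of the `COPY` approach dmn describes, again with a hypothetical `results(thing_id, output)` table and DSN; psycopg2 exposes `COPY ... FROM STDIN` through `cursor.copy_expert()`:

```python
# Hypothetical table/DSN again; real data may need escaping of tabs/newlines.
import io
import psycopg2

conn = psycopg2.connect("dbname=mydb")                      # hypothetical DSN
cur = conn.cursor()

rows = [(i, "output %d" % i) for i in range(30000)]         # stand-in data

# Build a tab-separated buffer in memory; COPY loads it server-side in one pass.
buf = io.StringIO()
for thing_id, output in rows:
    buf.write("%d\t%s\n" % (thing_id, output))
buf.seek(0)

cur.copy_expert("COPY results (thing_id, output) FROM STDIN", buf)
conn.commit()
conn.close()
```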