
There are two kinds of queries that I ran:

1. A purposely introduced query that sorts (ORDER BY) on about 10 columns. This uses CPU, since sorting is a CPU-intensive operation.

The scenario: each query took about 30 seconds, and I ran about 100 of them over simultaneous connections against 100 different tables. CPU usage on a 32-core machine was about 85% on all 32 cores, and all 100 queries ran in parallel.
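
For illustration, each query looked roughly like this (the table and column names below are just placeholders, not my real schema):

    -- One of the ~100 concurrent queries, each against its own table.
    -- Sorting on many columns forces a big sort step, which is where
    -- the CPU time goes.
    SELECT *
    FROM   test_table_001
    ORDER  BY col1, col2, col3, col4, col5, col6, col7, col8, col9, col10;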

2. Inserting a million rows into a table.

I don't understand why this would consume CPU, since I assumed it was purely disk I/O. I inserted 1 million rows into a single table using 100 simultaneous connections/threads, and there were no indexes on those tables. Now, INSERT is not the fastest way to load data, but the point here is that it consumed about 32% CPU on about 10 cores. This is far less than the above, but I am still curious.
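
Roughly what each of the 100 connections was doing, next to the usual faster bulk-load path (the table, columns and file path below are made up for the example):

    -- What each connection was doing, more or less: single-row INSERTs.
    INSERT INTO load_test (id, payload) VALUES (1, 'some text');

    -- The usual faster bulk-load alternative, for comparison:
    COPY load_test (id, payload) FROM '/tmp/load_test.csv' WITH (FORMAT csv);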

I could be wrong because WAL archiving was on and the query log was on - do these contribute to CPU? I am assuming not, since those are also disk I/O.
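
For reference, these are the settings I am talking about; they can be checked like this (standard PostgreSQL parameters, the comments are just my reading of them):

    SHOW archive_mode;       -- whether WAL archiving is on
    SHOW wal_level;          -- how much WAL detail is generated
    SHOW log_statement;      -- which statements go to the query log
    SHOW logging_collector;  -- whether the server itself captures log output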

There was no other process/application running/installed on this machine other than postgres.

Greedy Coder
  • Downvoters: Care to leave a comment? – Greedy Coder Sep 29 '15 at 15:06
  • Why do you use multithread insert? For performance? You can read http://stackoverflow.com/questions/758945/whats-the-fastest-way-to-do-a-bulk-insert-into-postgres – sibnick Sep 29 '15 at 15:21
  • I don't understand why you'd think that writing data to a database table is a purely disk I/O process. Every change has to be logged, the blocks have to be modified, the changes have to be preserved in case the operation needs to rollback. It's not like just copying data from one file to another. – David Aldridge Sep 29 '15 at 15:35
  • Well, my assumption was pure disk I/O, but I kind of doubted myself, hence the question. Of course it is not just copying data, but what exactly consumes the CPU, and is it something that can be tuned/controlled? – Greedy Coder Sep 30 '15 at 05:27

2 Answers


Many different things:

  • CPU time for query planning and the logic in the executor for query execution
  • Transforming text representations of tuples into their on-disk format. Parsing dates, and so on.
  • Log output
  • Processing the transaction logs
  • Writing newly inserted tuples into pages in shared_buffers, and scanning shared_buffers for dirty pages to write out
  • Interprocess communication for lock management
  • Scanning through in-memory cached copies of indexes when checking uniqueness, inserting new keys in an index, etc
  • ....

If you really want to know the juicy details, fire up perf with stack traces enabled to see where CPU time is spent.
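
Short of full profiling, you can get a rough feel for this from SQL alone. A sketch (the table and data here are made up purely for illustration): EXPLAIN (ANALYZE, BUFFERS) runs the statement and reports planning time, execution time and buffer activity, all of which is work beyond raw disk I/O:

    -- Illustrative single-connection test; the table and row count are examples.
    CREATE TABLE t (id int, payload text);

    EXPLAIN (ANALYZE, BUFFERS)
    INSERT INTO t
    SELECT g, md5(g::text)
    FROM generate_series(1, 100000) AS g;
    -- The output includes planning time, execution time and shared-buffer
    -- hit/read/dirtied counts for the insert.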

Craig Ringer

If your table has a primary key, then it has an implicit index.

It may also be true that if the table had a primary key, then it would be stored as a b-tree and not a simple flat table; I'm not clear on this point since my postgres-fu has weakened over the years, but many DBMSes use the primary key as a default clustering key for a b-tree and just store everything in the b-tree. Managing that b-tree requires plenty of CPU.
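
As an aside, a quick sketch (the table name is made up) of how to see the implicit index a primary key creates in PostgreSQL:

    CREATE TABLE demo (id int PRIMARY KEY, val text);

    -- Lists "demo_pkey", a separate b-tree index that PostgreSQL creates
    -- automatically to enforce the primary key.
    SELECT indexname, indexdef
    FROM pg_indexes
    WHERE tablename = 'demo';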

Additionally, if you're inserting from 100 threads and connections, then postgres has to perform locking in order to keep internal data structures consistent. Fighting for locks can consume a ton of CPU, and is especially difficult to do efficiently on machines with many CPUs - acquiring a single mutex requires the cooperation of every CPU in the system via the cache coherency protocol.

You may want to experiment with different numbers of threads, while measuring overall runtime and CPU usage - you may find that with, say, 8 threads, the total CPU utilized is 1/10th of your current usage, but the job still gets done within 110-150% of the original time. This would be a sure sign that lock contention is killing your CPU usage.

antiduh
  • There are no primary keys/indexes on the table, so I am guessing it has to do with the locking part. Thanks for the info, I'll experiment more and report back here. – Greedy Coder Sep 29 '15 at 15:18
  • PostgreSQL does not cluster the table by the primary key. The primary key is supported by a separate index. – David Aldridge Sep 29 '15 at 15:36