I have an Amazon EC2 instance (call it S1, 4 cores / 7 GB memory) running Ubuntu 12.04, which hosts my web app with PostgreSQL 9.1. All the Postgres data is stored on a separate SSD volume (not the root volume) of 100 GB. (Right now it is only 26% full.)
Since a day or two ago, some Postgres operations have suddenly started taking a long time: creating a database now takes 52 seconds, and restoring a database takes 9 minutes (previously 50 seconds at most).
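(For reference, these timings can be reproduced with the shell's time builtin; the database name and dump path below are just placeholders:)

time createdb testdb
time pg_restore -d testdb /path/to/dump.backup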
By running iostat while these Postgres commands execute, I can confirm that the EBS volume has hit its IOPS limit (3 IOPS/GB, i.e. 300 IOPS for a 100 GB volume). You can see it below in the output of iostat -d 5 -x -p xvdf:
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
xvdf 0.35 2.28 1.20 298.99 19.65 13082.19 87.29 23.42 78.03 64.19 78.09 3.29 98.75
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
xvdf 0.00 1.80 0.00 297.40 0.00 13067.20 87.88 126.47 420.75 0.00 420.75 3.35 99.76
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
xvdf 0.00 1.80 0.00 297.40 0.00 13067.20 87.88 126.32 417.95 0.00 417.95 3.35 99.76
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
xvdf 0.00 1.80 0.00 297.80 0.00 13093.60 87.94 131.70 440.82 0.00 440.82 3.36 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
xvdf 0.00 0.00 0.00 301.00 0.00 13225.60 87.88 129.36 422.97 0.00 422.97 3.32 99.84
The AWS documentation on I/O characteristics says each I/O operation can be a request of 256 KiB or less, so is Postgres writing data back in smaller blocks and thereby generating a larger number of I/O requests?
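(As a sanity check on request size: avgrq-sz above is reported in 512-byte sectors, so ~88 sectors is roughly 44 KB per write, well under the 256 KiB per-I/O ceiling. Postgres's own page sizes can be confirmed like this:)

psql -c "SHOW block_size;"       # data page size, 8192 bytes (8 KB) by default
psql -c "SHOW wal_block_size;"   # WAL page size, also 8192 bytes by default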
Meanwhile, I have another EC2 instance (call it S2) with a 100 GB volume (95% full right now) where the Postgres data lives on the root volume, and it performs great. So I am fairly sure the size of the volume is not the issue here.
The affected volume on S1 stores only Postgres data, yet iostat still shows the stats below. I am not sure why the stats look like this, or how I could reduce the Postgres command times without increasing the size of the volume. (During all operations, 3 GB of memory is always free.)
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
xvdf 0.34 2.29 1.23 298.93 20.10 13079.03 87.28 26.19 87.26 66.96 87.34 3.29 98.78
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
xvdf 0.00 2.40 0.60 299.00 4.80 13020.80 86.95 132.22 434.48 108.00 435.14 3.34 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
xvdf 0.00 3.20 4.40 295.20 43.20 12866.40 86.18 122.18 417.09 142.00 421.20 3.34 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
xvdf 0.00 2.80 2.40 297.20 23.20 12940.00 86.54 122.70 401.11 124.00 403.34 3.34 99.92
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
xvdf 0.00 3.40 4.80 294.80 46.40 12840.00 86.02 127.43 433.15 161.67 437.57 3.34 99.92
Note: the affected volume contains 100 different Postgres databases with an average size of 110 MB each (but honestly I don't think this is the problem).
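(The per-database sizes quoted above come from a query along these lines:)

psql -c "SELECT datname, pg_size_pretty(pg_database_size(datname)) FROM pg_database ORDER BY pg_database_size(datname) DESC;"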