-1

I have some application which opens a transaction and runs a stored procedure. In the stored procedure there are some "insert into X select Y" statements. In some scenarios the stored procedure takes more than an hour to complete while taking 100% and doing no IO most of the time.

When I try to run the stored procedure from psql on the same database / data it takes a minute.

Usually I use explain analyze to understand the bottlenecks but since it runs as expected from psql I'm having a problem to understand the problem here.

Is there a way to understand better why is the statement stuck? Perhaps enabling some logging which will explain the inner work of the back-end?
Server: PostgreSQL 9.2.4

Below is a stack trace of postgres during this CPU loop:

#0  0x080931d3 in slot_getattr ()
#1  0x081d3375 in ExecMakeFunctionResultNoSets ()
#2  0x081d382b in ExecQual ()
#3  0x081e4b3d in ExecNestLoop ()
#4  0x081cdbbd in ExecProcNode ()
#5  0x081e4a57 in ExecNestLoop ()
#6  0x081cdbbd in ExecProcNode ()
#7  0x081e3552 in ExecModifyTable ()
#8  0x081cdae4 in ExecProcNode ()
#9  0x081ccd71 in standard_ExecutorRun ()
#10 0xf78c5cf9 in pgss_ExecutorRun () from /opt/CPshrd-    R80/database/postgresql/lib/postgresql/pg_stat_statements.so
#11 0x081ee9d5 in _SPI_execute_plan ()
#12 0x081eed1a in SPI_execute_plan_with_paramlist ()
#13 0x4db7f43d in exec_stmt_execsql () from /opt/CPshrd-    R80/database/postgresql/lib/postgresql/plpgsql.so
#14 0x4db8169a in exec_stmt () from /opt/CPshrd-R80/database/postgresql/lib/postgresql/plpgsql.so
#15 0x4db825f3 in exec_stmts () from /opt/CPshrd-R80/database/postgresql/lib/postgresql/plpgsql.so
#16 0x4db81279 in exec_stmt_block () from /opt/CPshrd-    R80/database/postgresql/lib/postgresql/plpgsql.so
#17 0x4db8454b in plpgsql_exec_function () from /opt/CPshrd-R80/database/postgresql/lib/postgresql/plpgsql.so
#18 0x4db77ed4 in plpgsql_call_handler () from /opt/CPshrd-R80/database/postgresql/lib/postgresql/plpgsql.so
#19 0x081d193f in ExecMakeFunctionResult ()
#20 0x081ce2f5 in ExecProject ()
#21 0x081e563d in ExecResult ()
#22 0x081cdad8 in ExecProcNode ()
#23 0x081ccd71 in standard_ExecutorRun ()
#24 0xf78c5cf9 in pgss_ExecutorRun () from /opt/CPshrd-    R80/database/postgresql/lib/postgresql/pg_stat_statements.so
#25 0x081ee9d5 in _SPI_execute_plan ()
#26 0x081eed1a in SPI_execute_plan_with_paramlist ()
#27 0x4db7d69c in exec_run_select () from /opt/CPshrd-R80/database/postgresql/lib/postgresql/plpgsql.so
#28 0x4db80b2a in exec_stmt_perform () from /opt/CPshrd-R80/database/postgresql/lib/postgresql/plpgsql.so
#29 0x4db81861 in exec_stmt () from /opt/CPshrd-R80/database/postgresql/lib/postgresql/plpgsql.so
#30 0x4db825f3 in exec_stmts () from /opt/CPshrd-R80/database/postgresql/lib/postgresql/plpgsql.so
#31 0x4db81279 in exec_stmt_block () from /opt/CPshrd-R80/database/postgresql/lib/postgresql/plpgsql.so
#32 0x4db8454b in plpgsql_exec_function () from /opt/CPshrd-R80/database/postgresql/lib/postgresql/plpgsql.so
#33 0x4db77ed4 in plpgsql_call_handler () from /opt/CPshrd-R80/database/postgresql/lib/postgresql/plpgsql.so
#34 0x081d3b49 in ExecMakeTableFunctionResult ()
#35 0x081e4f74 in FunctionNext ()
#36 0x081d520b in ExecScan ()
#37 0x081e4ef1 in ExecFunctionScan ()
#38 0x081cdb70 in ExecProcNode ()
#39 0x081ccd71 in standard_ExecutorRun ()
#40 0xf78c5cf9 in pgss_ExecutorRun () from /opt/CPshrd-    R80/database/postgresql/lib/postgresql/pg_stat_statements.so
#41 0x082a816e in PortalRunSelect ()
#42 0x082a95a8 in PortalRun ()
#43 0x082a67ed in PostgresMain ()
#44 0x08260dfa in ServerLoop ()
#45 0x08261ca7 in PostmasterMain ()
Avner Levy
  • 6,601
  • 9
  • 53
  • 92
  • Did you check if it's maybe waiting for a lock? https://wiki.postgresql.org/wiki/Lock_Monitoring –  Oct 18 '15 at 12:28
  • Checked, no locks. And I assume it wouldn't take 100% CPU if it is stuck on a lock. – Avner Levy Oct 18 '15 at 12:29
  • 1
    You can install the auto-explain module: http://www.postgresql.org/docs/current/static/auto-explain.html I also think, you should post this to the Postgres mailing list: http://www.postgresql.org/list/ –  Oct 18 '15 at 12:30
  • My initial guess would be that "Y" is a view or complex query, and/or the table has triggers. – Gordon Linoff Oct 18 '15 at 12:32
  • Your guess is correct, but wouldn't you expect it to run the same from application and psql? @Gordon Linoff – Avner Levy Oct 18 '15 at 12:35
  • Is the UDF using the same parameters, and looking at the same data? The strack trace does not help much, please follow the advise from @a_horse_with_no_name and use the auto explain module. This should log the query plan into the logfile. – A. Scherbaum Oct 18 '15 at 18:22
  • The parameters and data are exactly the same and both are the single connected client to the database. Tried auto explain, but it doesn't work on stored procedures (it prints some summary on the stored procedure instead of internal selects). @A. Scherbaum – Avner Levy Oct 18 '15 at 18:32
  • I marked it as duplicate, as you can find explanation, links to the manual, instructions to diagnose and the same solution as you found yourself (for the same problem: missing / outdated statistics). Also note that temporary tables need manual vacuum / analyze: http://dba.stackexchange.com/a/18694/3684 – Erwin Brandstetter Oct 19 '15 at 07:12

1 Answers1

1

In my case, the issue was that the stored procedure created temporary tables and filled them with data.
After adding ANALYZE commands after filling them it works now fast in all scenarios. I guess the database decision regarding when to collect statistics data through autovacuum daemon can't be depended on in scenarios where you change significantly tables and query right after or use temp tables.

Avner Levy
  • 6,601
  • 9
  • 53
  • 92
  • 1
    [From the manual](http://www.postgresql.org/docs/current/static/routine-vacuuming.html#AUTOVACUUM): "*Temporary tables cannot be accessed by autovacuum. Therefore, appropriate vacuum and analyze operations should be performed via session SQL commands*" –  Oct 19 '15 at 06:59