0

I have reviewed the similar question See and clear Postgres caches/buffers? , but all of the answers focus on the data buffers, and Postgresql has changed a lot since 2010.

Unlike the OP of that question, I am not looking for consistent behavior when computing performance, I am looking for adaptive behavior as the database changes over time.

In my applicaiton, at the beginning of a job execution, rows in the working tables are empty. Queries run very quickly, but as time goes on performance degrades because the prepared statements are not using ideal access paths (they were prepared when the tables were empty - doh!). Since a typical execution of the job will ultimately cover a few hundred million rows, I need to minimize all of the overheads and periodically run statistics to get the best access paths.

In SQLServer, one can periodically call update statistics and DBCC FreeProccache, and the prepared statements will automatically be re-prepared to use the new access paths.

Edit: FreeProcCache: in SQLServer, prepared statements are implemented as stored procedures. FreeProcCache wipes the compiled stored procedures so that they will be recompiled on the next invocation, and the new access paths come into effect immediately.

Edit: Details of postgresql management of prepared statements: Postgresql defers the prepare until the first call to EXECUTE, and caches the result ofthe prepare after the 5th execution. Once cached, the plan is fixed until the session ends or the prepared statement is freed with DEALLOCATE. Closing JDBC objects does not invoke DEALLOCATE, as an optimization to support open/read/close programming like many web apps display.

Is there a way to force a (Edit)JDBC prepared statement to recompile, (Edit) after running ANALYZE, so it will use the latest statistics?

EDIT: I am using JDBC PreparedStatement to prepare and execute queries against the database and the Postgres JDBC driver.

pojo-guy
  • 966
  • 1
  • 12
  • 39

2 Answers2

0

The way Postgresql updates statistics is via ANALYZE. This is also autoexecuted after a VACUUM run (since VACUUM frees references, and truncates empty pages, I would imagine much like your FreeProccache).

If autovacuum is enabled (the default), ANALYZE will be autorun according to the autovacuum cadence.

You do not need to "recompile" the prepared statement to pick up the new statistics in most cases because it will re-plan during each EXECUTE, and a parameterized prepared statement will replan based on the parameter values and the updated statistics. EDIT: The edge case described is where the query planner has decided to force a "generic plan" because the estimated cost of the specific plan exceeds the cost of such "generic plan" after 5 planned-executions.

Edit: If you do reach this edge case, you can "drop" the prepared statement via DEALLOCATE (and then a re-PREPARE).

You may want to try ANALYZE before EXECUTE, but this will not guarantee a better performance...

cowbert
  • 3,212
  • 2
  • 25
  • 34
  • Yes, I know I will need to run ANALYZE regularly. The question is will the prepared statement be re-prepared after the ANALYZE statement, or is there a way I can force the re-prepare other than to close and re-open it? – pojo-guy Feb 09 '18 at 23:01
  • One reason for using the prepared statement is to eliminate the lookaside to the query optimizer on each call. That is 50 to 80 percent of the overhead of dynamic SQL calls. The prepare phase computes the access path. The statement was initially prepared when the tables were empty, but now they have millions of rows, so the prepared statement is not using the indexes effectively. Running statistics allows the optimizer to pick the best path, but you need to go through the prepare again for that. – pojo-guy Feb 09 '18 at 23:05
  • If you read the documentation for prepare, it specifically states that it replans each execution if the following conditions are true: 1) the prepared query is parameterized 2) the query planner has not yet selected a generic plan (which will only happen after 5 executions where the estimated cost is higher than the generic plan). Then you will need to re-prepare by `DEALLOCATE`. – cowbert Feb 09 '18 at 23:10
  • *That* is useful. I'm not worried about the first 5 executions, I'm worried about the 5,000,000th. Ill have a look at DEALLOCATE – pojo-guy Feb 09 '18 at 23:15
  • Ok, so now the problem is to find the query name as it's known to postgresql. For the record, postgresql defers prepare until EXECUTE, then caches the plan after the 5th execution, until the session is ended or the query is deallocated. Caching persists after the prepared statement is closed to support the open/read/close paradigm of many web apps. – pojo-guy Feb 09 '18 at 23:25
  • Are you not pre-preparing with name the query before execution? If you want such granular control over prepared statements, then you need to prepare a named statement, execute the named statement and deallocate and re-prepare as you need to... – cowbert Feb 12 '18 at 21:13
  • That detail is buried inside the jdbc PreparedStatement implementation, because different RDBMS's have different ways of handling it. DB2, for example, uses a handle (integer number). SQLServer implements as stored procedures (with a handle). – pojo-guy Feb 12 '18 at 21:39
  • You never specified that you were invoking prepared statements via JDBC `PreparedStatement`, which is/may be different from executing `PREPARE` directly, since it is up to the JDBC driver to implement this. You may want to have a look at https://jdbc.postgresql.org/documentation/94/server-prepare.html – cowbert Feb 12 '18 at 21:58
0
  1. Please ensure you really want to re-prepare statements. It might be the case you just want to close DB connection from time to time so statements get prepared "from scratch"

  2. In case you really understand what you are doing (there might be valid reasons like you describe), you can issue DEALLOCATE ALL statement (it is a PostgreSQL-specific statement to deallocate all prepared statements). Recent pgjdbc versions (since 9.4.1210, 2016-09-07) handle that just fine and re-prepare the statements on subsequent use

Vladimir Sitnikov
  • 1,467
  • 12
  • 31