What are the bennefits of prepareThreshold = 5 in pgjdbc?

Question

The prepareThreshold in pgjdbc have the following definition:

Determine the number of PreparedStatement executions required before switching over to use server side prepared statements. The default is five, meaning start using server side prepared statements on the fifth execution of the same PreparedStatement object. More information on server side prepared statements is available in the section called “Server Prepared Statements”.

I wonder what benefit this actually brings us? Most webservers aren't restarted for months, so all database queries will be sent more than 5 times eventually, so give it a week or so and all prepared statements will be stored at the server, won't them? Is this only for benefiting desktop applications? Or am I missing something, like a "5 threshold over a period of time"?

The setting is per connection, and usually with a sane connection pool configuration, individual connections won't live for months (maybe not even hours). And when continuously releasing new features, runtime of an applications itself might be hours or days not months. — Mark Rotteveel, May 22 '19 at 16:46
I see...so the benefit I presume is just to not bloat the server with prepared statements that aren't used often right? Is it here something else? — Michel Feinstein, May 22 '19 at 16:51

score 3 · Accepted Answer · edited May 22 '19 at 19:37

3

I understand that you want to know why the JDBC driver waits at all before using a server side prepared statement.

Without being in on the decision process, I'd say that the reason is that preparing a statement implies a certain overhead (send a Prepare, a Bind and an Execute call). It makes sense to do that only once you can be confident that the statement is going to be reused.

Don't forget that there are other uses for prepared statements that saving the Parse step on multiple executions: it is the king's way to avoiding SQL injection. That alone justifies a prepared statement, even if it is executed only once.

edited May 22 '19 at 19:37

Michel Feinstein

13,416
16
91
173

answered May 22 '19 at 19:20

Laurenz Albe

209,280
17
206
263

Yes, I use it mainly for avoiding SQL Injection. – Michel Feinstein May 22 '19 at 19:39
It's not on my question's topic... But if PostgreSQL used a hash like SHA1 for identifying which prepared statements where already at the server, the whole binding process would be smoother I guess (I am just guessing, since I am not a specialist on this, not even close) – Michel Feinstein May 22 '19 at 19:42
If you read up on prepared statements in PostgreSQL, you'll see that they always have a name, and that isbwhat you have to specify when you want to use it. Even the "unnamed prepared statement" works that way (the name is an empty string), the difference being that you don't have to deallocate it. – Laurenz Albe May 23 '19 at 05:56
So a driver could create a prepared statement with a name as the SHA1 of the query and then bind to any prepared statement already at the server, that another connection created already at past... I wonder why they don't do it, or if they do.. – Michel Feinstein May 23 '19 at 06:09
Your idea would decrease the performance. First, the statement itself would have to be sent to the server every time, which is wasteful if the SQL text is long, then the server would have to calculate the hash every time. It is better if the client says: "prepare this statement and name it 'foo'" and then "now execute 'foo' with these parameters. Why does that bother you? – Laurenz Albe May 23 '19 at 06:14
It doesn't, my point was to send the server a command as "execute statement 'foo'", where foo is the SHA1 of the query text. If there isn't a statement at the server with this name there will be an error and then the client sends the command "prepare this statement and name it 'foo'". This way new connections will reuse statements from older ones. Just an idea, I am not even sure how much this would be beneficial in practice. – Michel Feinstein May 23 '19 at 13:05
It doesn't cost performance? I thought that calculating a SHA hash was CPU intensive. – Laurenz Albe May 23 '19 at 13:47
Well, I think newer CPUs have dedicated hardware for hashes and it's faster than the network traffic for synchronizing prepared statements... But I never benchmarked any of this, it's just a hunch – Michel Feinstein May 24 '19 at 13:59
Nothing keeps you from using that hash as name of the prepared statement... – Laurenz Albe May 24 '19 at 14:04
But the synchronization process would be sending the query all the time, wouldn't it? Maybe I should drop this idea on the postgresql email list – Michel Feinstein May 24 '19 at 14:09
Synchronization process? Why sending the query all the time? I think we are talking past each other. The procedure works like this: prepare statement and name it x - bind parameters to x - execute x - bind parameters to x - execute x etc. I don't see where a hash would make things simpler, and where synchronization comes into that. – Laurenz Albe May 24 '19 at 19:17
Well, as far as I understand, the "prepare" part sends the query to the server with a name, then the server optimizes it and stores it under that name, even if there was another query like this one already at the server, from another connection (past or present), that could be reused. The Hash would prevent the network traffic of resending the same query at each new connection and re-optimizing it at delivery. It's a small issue I think, but maybe it could be desirable for microservices that can't benefit from a connection-pool, thus creating a lot of new connections – Michel Feinstein May 24 '19 at 20:03
Ok, I see the problem. It does not work like that at all. There is no shared cache of statement plans - they are never shared between sessions. Prepared statements are only for one single session. – Laurenz Albe May 24 '19 at 20:05
Exactly, my point is a way to make them be shared between sessions, using a hash to identify each one unically... It's really off-topic, but it came to my mind – Michel Feinstein May 24 '19 at 20:11

What are the bennefits of prepareThreshold = 5 in pgjdbc?

1 Answers1