59

So I keep hearing that PreparedStatements are good for performance.

We have a Java application in which we use the regular 'Statement' more than we use the 'PreparedStatement'. While trying to move towards using more PreparedStatements, I am trying to get a more thorough understanding of how PreparedStatements work - on the client side and the server side.

So if we have some typical CRUD operations and update an object repeatedly in the application, does it help to use a PS? I understand that we will have to close the PS every time otherwise it will result in a cursor leak.

So how does it help with performance? Does the driver cache the precompiled statement and give me a copy the next time I do connection.prepareStatement? Or does the DB server help?

I understand the argument about the security benefits of PreparedStatements and I appreciate the answers below which emphasize it. However I really want to keep this discussion focused on the performance benefits of PreparedStatements.

Update: When I say update data, I really mean more in terms of that method randomly being called several times. I understand the advantage in the answer offered below which asks to re-use the statement inside a loop.

    // some code blah blah
    update();

    // some more code blah blah 
    update();

.... 

public void update () throws SQLException{
 try{
      PreparedStatement ps = connection.prepareStatement("some sql");
      ps.setString(1, "foobar1");
      ps.setString(2, "foobar2");
      ps.execute();
 }finally {
     ps.close();

 }

}

There is no way to actually reuse the 'ps' java object and I understand that the actual connection.prepareStatement call is quite expensive.

Which is what brings me back to the original question. Is this "some sql" PreparedStatement still being cached and reused under the covers that I dont know about?

I should also mention that we support several databases.

Thanks in advance.

Kapsh
  • 20,751
  • 13
  • 36
  • 44
  • 3
    If you're not using PrepardStatements and have no fear of SQL injection attacks, then you don't know enough about SQL injection attacks. – Joachim Sauer Mar 26 '09 at 21:20
  • 1
    not only sql injection, but reduced errors of passing types as strings, escaping special characters, etc. Also, you close the PS when you're done using it, just like everything else, which is generally at the end of a "batch" of updates. cursor leaks are often from leaving a ResultSet open. – John Gardner Mar 26 '09 at 22:25
  • 6
    +1 good question, and several good answers. For Oracle, the big "performance" benefit of prepared statements comes on database server, not one a single statement execution, but on the repeated execution of IDENTICAL SQL text. You aren't going to see any performance boost on the single execution of a new statement. The savings in CPU cycles comes when that statement is executed a second time, a third time. Oracle skips all of the heavy lifting of the "hard parse", and reuses the previously prepared query plan. – spencer7593 Jun 02 '09 at 20:29
  • 1
    Geeze, 5 years later and it feels like there still isn't a good answer out here. In particular, I think an answer to the part about whether you get any performance benefit when preparing the identical query over and over using different connections is key here. I'll offer up a bounty for a good, in-depth answer to this one. – jakejgordon Dec 05 '14 at 16:50
  • 1
    You say in the question: "There is no way to actually reuse the 'ps' java object". Why not? This is exactly how you will achieve the performance gain. – isak gilbert Dec 11 '14 at 01:05

11 Answers11

36

The notion that prepared statements are primarily about performance is something of a misconception, although it's quite a common one.

Another poster mentioned that he noted a speed improvement of about 20% in Oracle and SQL Server. I've noted a similar figure with MySQL. It turns out that parsing the query just isn't such a significant part of the work involved. On a very busy database system, it's also not clear that query parsing will affect overall throughput: overall, it'll probably just be using up CPU time that would otherwise be idle while data was coming back from the disk.

So as a reason for using prepared statements, the protection against SQL injection attacks far outweighs the performance improvement. And if you're not worried about SQL injection attacks, you probably should be...

Neil Coffey
  • 21,615
  • 7
  • 62
  • 83
  • I was going add the SQL Injection protection, but instead I'll +1 you! – Scott Stanchfield Mar 27 '09 at 16:45
  • 3
    "it'll probably just be using up CPU time that would otherwise be idle while data was coming back from the disk" - That depends on the system involved and the queries. I have used database systems with modest CPU resources but massive parallel disk capacity. Arguably the system was badly designed, but that system was definitely CPU-bound, so things we could do to improve CPU utilization (such as fewer query parses) did make a difference. – Adrian Smith Jan 18 '11 at 14:19
  • 3
    I'm not convinced that sql performance improvement is only %20. Where are the benchmarks for that? But your point about sql injection is a good one. Still I think it's misleading to say that performance improvement is only %20 – dan b Dec 18 '14 at 10:12
  • @danb - Speaking in 2009, that was the approximate speedup that I noted in real-life applications. If in 2014 you have some experiences or figures that contradict this, why not share them? – Neil Coffey Dec 18 '14 at 20:10
  • @NeilCoffey What you say may be true, but also quite surprising, as I would expect even more gains.My figures would also be anecdotal. The topic is an interesting one and I do appreciate your input. I think a proper benchmark would need to consider queries whose execution plans are complex, and also remember to submit the queries that differ only by the binding variables. For example, select * from person where first_name = 'dan' and last name='smith', then run for millions of first names and last names. Here's a good question: How do design a benchmark for prepared statements? – dan b Dec 18 '14 at 20:51
  • Actually, I'm not sure I'd expect the performance gain to be that high on average. You seem to be assuming that the execution process is: (a) parse the query string, then (b) start from scratch and slowly work out the execution plan. Whereas in the case of your millions of ostensibly identical queries, it's probably more like: (a) parse the query string; (b) use some quick heuristics to determine which already-worked-out query plan to use. I'd say if this is taking much more than a quarter of the execution time, the DB engine needs improving... – Neil Coffey Dec 19 '14 at 23:03
  • 1
    On the other hand, prepared statements are probably never going to be slower than raw queries and they add extra security, so most of the time, it's hard to think why you wouldn't just use them anyway, whatever the precise performance gain turns out to be. – Neil Coffey Dec 20 '14 at 00:35
  • @NeilCoffey are these Cache available for use across other connections/sessions as well? Also, how are these cache cleaned, specially for execution plant part? After all the table structure (rows, column carnality, indexes etc.) may change significantly over time right? – sactiw May 13 '17 at 11:57
  • 1
    @sactiw These are details of the particular database engine/version of course, but e.g. you can read about some of the criteria used by recent versions of SQL Server here: https://msdn.microsoft.com/en-us/library/cc293624.aspx As with any cached resource, the decision to remove a given plan from the cache is essentially a combination of factors: the need to free up space, how useful the cached plan is, and how much effort was required to create that plan in the first place. In answer to your first question, they don't appear to be connection specific. – Neil Coffey May 14 '17 at 16:55
30

Prepared statements can improve performance when re-using the same statement that you prepared:

PreparedStatement ps = connection.prepare("SOME SQL");

for (Data data : dataList) {
  ps.setInt(1, data.getId());
  ps.setString(2, data.getValue();
  ps.executeUpdate();
}

ps.close();

This is much faster than creating the statement in the loop.

Some platforms also cache prepared statements so that even if you close them they can be reconstructed more quickly.

However even if the performance were identical you should still use prepared statements to prevent SQL Injection. At my company this is an interview question; get it wrong and we might not hire you.

Mr. Shiny and New 安宇
  • 13,822
  • 6
  • 44
  • 64
  • 2
    They can also hurt performance. If they're generated by an ORM and not reused, some databases such as Oracle will actually run slower than native queries. This is particularly true when using regexp_like or similar operators. Different query plans are sometimes generated. – Lucas Holt Oct 29 '18 at 17:40
  • Benchmarks possible? – rogerdpack Apr 06 '22 at 18:46
20

Prepared statements are indeed cached after their first use, which is what they provide in performance over standard statements. If your statement doesn't change then it's advised to use this method. They are generally stored within a statement cache for alter use.

More info can be found here:

http://www.theserverside.com/tt/articles/article.tss?l=Prepared-Statments

and you might want to look at Spring JDBCTemplate as an alternative to using JDBC directly.

http://static.springframework.org/spring/docs/2.0.x/reference/jdbc.html

Jonathan Holloway
  • 62,090
  • 32
  • 125
  • 150
  • 2
    @Jon: just to point out that with Oracle, on the server side, ALL statements are prepared in the same way. All statements are "cached" and available for reuse. The "trick" is to get the exact same SQL text executed over and over, just executed with different supplied values (bind arguments) – spencer7593 Jun 02 '09 at 20:24
  • Very helpful link especially the second one. – bLaXjack Jul 18 '14 at 06:49
  • Upon closer reading I think the link in this article gives the best explanation. I wish we were seeing these benefits in our application but that's a different topic. Awarding the bounty here for the reference, old as it may be. – jakejgordon Dec 12 '14 at 06:32
8

Parsing the SQL isn't the only thing that's going on. There's validating that the tables and columns do indeed exist, creating a query plan, etc. You pay that once with a PreparedStatement.

Binding to guard against SQL injection is a very good thing, indeed. Not sufficient, IMO. You still should validate input prior to getting to the persistence layer.

duffymo
  • 305,152
  • 44
  • 369
  • 561
  • Any half-decent database will cache the general form of the query without a prepared statement/stored proc. – Tom Hawtin - tackline Mar 26 '09 at 23:09
  • 1
    I think it is mostly the parsing that you save. The query plan depends at least in part on the *parameters* to the query, not just its form (though I'd accept that some kind of 'partial plan' may be cached). – Neil Coffey Mar 27 '09 at 01:59
4

So how does it help with performance? Does the driver cache the precompiled statement and give me a copy the next time I do connection.prepareStatement? Or does the DB server help?

I will answer in terms of performance. Others here have already stipulated that PreparedStatements are resilient to SQL injection (blessed advantage).

The application (JDBC Driver) creates the PreparedStatement and passes it to the RDBMS with placeholders (the ?). The RDBMS precompiles, applying query optimization (if needed) of the received PreparedStatement and (in some) generally caches them. During execution of the PreparedStatement, the precompiled PreparedStatement is used, replacing each placeholders with their relevant values and calculated. This is in contrast to Statement which compiles it and executes it directly, the PreparedStatement compiles and optimizes the query only once. Now, this scenario explained above is not an absolute case by ALL JDBC vendors but in essence that's how PreparedStatement are used and operated on.

Buhake Sindi
  • 87,898
  • 29
  • 167
  • 228
3

Which is what brings me back to the original question. Is this "some sql" PreparedStatement still being cached and reused under the covers that I dont know about?

Yes at least with Oracle. Per Oracle® Database JDBC Developer's Guide Implicit Statement Caching (emphasis added),

When you enable implicit Statement caching, JDBC automatically caches the prepared or callable statement when you call the close method of this statement object. The prepared and callable statements are cached and retrieved using standard connection object and statement object methods.

Plain statements are not implicitly cached, because implicit Statement caching uses a SQL string as a key and plain statements are created without a SQL string. Therefore, implicit Statement caching applies only to the OraclePreparedStatement and OracleCallableStatement objects, which are created with a SQL string. You cannot use implicit Statement caching with OracleStatement. When you create an OraclePreparedStatement or OracleCallableStatement, the JDBC driver automatically searches the cache for a matching statement.

Community
  • 1
  • 1
Elliott Frisch
  • 198,278
  • 20
  • 158
  • 249
  • 2
    I think this partially answers the question -- at least for Oracle. The part of the question that doesn't appear to be totally answered is regarding the performance implications. We did just did some tests using JDBC with SQL Server 2005 and saw PreparedStatements consistently taking 50%+ longer than their non-prepared statement counterparts -- even when executing the exact same query repetitively! Pretty scary if you ask me. Another question is whether the JDBC cache depends on the database cache or if they are independent. – jakejgordon Dec 08 '14 at 22:03
  • 1
    Microbenchmarks are notorious, and your JDBC driver might be to blame. Also, you need to be more specific about your implementation. How *exactly* are you comparing the two? – Elliott Frisch Dec 08 '14 at 22:05
  • 1
    I agree and don't really want to put any focus on the microbenchmarking we did (which is probably only legit for our specific case). If we figure out anything here I might blog about the topic and do a deep-dive on the benchmarking. – jakejgordon Dec 08 '14 at 22:08
3

Anecdotally: I did some experiments with prepared vs. dynamic statements using ODBC in Java 1.4 some years ago, with both Oracle and SQL Server back-ends. I found that prepared statements could be as much as 20% faster for certain queries, but there were vendor-specific differences regarding which queries were improved to what extent. (This should not be surprising, really.)

The bottom line is that if you will be re-using the same query repeatedly, prepared statements may help improve performance; but if your performance is bad enough that you need to do something about it immediately, don't count on the use of prepared statements to give you a radical boost. (20% is usually nothing to write home about.)

Your mileage may vary, of course.

Dan Breslau
  • 11,472
  • 2
  • 35
  • 44
  • The performance benchmarks are anecdotally. For complex queries my guess is that the performance gains can be more then that. Also take into account that if you don't use prepared statements the database query cache can fill up quickly. – dan b Dec 18 '14 at 10:14
  • 20% can be massive because what that means is the server is also using fewer resources during that time. Multiply that out, it's pretty meaningful. – trilogy Oct 09 '18 at 19:42
2

1. PreparedStatement allows you to write dynamic and parametric query

By using PreparedStatement in Java you can write parametrized sql queries and send different parameters by using same sql queries which is lot better than creating different queries.

2. PreparedStatement is faster than Statement in Java

One of the major benefits of using PreparedStatement is better performance. PreparedStatement gets pre compiled In database and there access plan is also cached in database, which allows database to execute parametric query written using prepared statement much faster than normal query because it has less work to do. You should always try to use PreparedStatement in production JDBC code to reduce load on database. In order to get performance benefit its worth noting to use only parametrized version of sql query and not with string concatenation

3. PreparedStatement prevents SQL Injection attacks in Java

Read more: http://javarevisited.blogspot.com/2012/03/why-use-preparedstatement-in-java-jdbc.html#ixzz3LejuMnVL

Siva Kumar
  • 1,983
  • 3
  • 14
  • 26
2

Short answer:

PreparedStatement helps performance because typically DB clients perform the same query repetitively, and this makes it possible to do some pre-processing for the initial query to speed up the following repetitive queries.

Long answer:

According to Wikipedia, the typical workflow of using a prepared statement is as follows:

Prepare: The statement template is created by the application and sent to the database management system (DBMS). Certain values are left unspecified, called parameters, placeholders or bind variables (labelled "?" below): INSERT INTO PRODUCT (name, price) VALUES (?, ?)

(Pre-compilation): The DBMS parses, compiles, and performs query optimization on the statement template, and stores the result without executing it.

Execute: At a later time, the application supplies (or binds) values for the parameters, and the DBMS executes the statement (possibly returning a result). The application may execute the statement as many times as it wants with different values. In this example, it might supply 'Bread' for the first parameter and '1.00' for the second parameter.

Prepare:

In JDBC, the "Prepare" step is done by calling java.sql.Connection.prepareStatement(String sql) API. According to its Javadoc:

This method is optimized for handling parametric SQL statements that benefit from precompilation. If the driver supports precompilation, the method prepareStatement will send the statement to the database for precompilation. Some drivers may not support precompilation. In this case, the statement may not be sent to the database until the PreparedStatement object is executed. This has no direct effect on users; however, it does affect which methods throw certain SQLException objects.

Since calling this API may send the SQL statement to database, it is an expensive call typically. Depending on JDBC driver's implementation, if you have the same sql statement template, for better performance, you may have to avoiding calling this API multiple times in client side for the same sql statement template.

Precompilation:

The sent statement template will be pre-compiled on database and cached in db server. The database will probably use the connection and sql statement template as the key, and the pre-compiled query and the computed query plan as value in the cache. Parsing query may need to validate table, columns to be queried, so it could be an expensive operation, and computation of query plan is an expensive operation too.

Execute:

For following queries from the same connection and sql statement template, the pre-compiled query and query plan will be looked up directly from cache by database server without re-computation again.

Conclusion:

From performance perspective, using prepare statement is a two-phase process:

  1. Phase 1, prepare-and-precompilation, this phase is expected to be done once and add some overhead for the performance.
  2. Phase 2, repeated executions of the same query, since phase 1 has some pre processing for the query, if the number of repeating query is large enough, this can save lots of pre-processing effort for the same query.

And if you want to know more details, there are some articles explaining the benefits of PrepareStatement:

  1. http://javarevisited.blogspot.com/2012/03/why-use-preparedstatement-in-java-jdbc.html
  2. http://docs.oracle.com/javase/tutorial/jdbc/basics/prepared.html
Community
  • 1
  • 1
nybon
  • 8,894
  • 9
  • 59
  • 67
1

Prepared statements have some advantages in terms of performance with respect to normal statements, depending on how you use them. As someone stated before, if you need to execute the same query multiple times with different parameters, you can reuse the prepared statement and pass only the new parameter set. The performance improvement depends on the specific driver and database you are using.

As instance, in terms of database performance, Oracle database caches the execution plan of some queries after each computation (this is not true for all versions and all configuration of Oracle). You can find improvements even if you close a statement and open a new one, because this is done at RDBMS level. This kind of caching is activated only if the two subsequent queries are (char-by-char) the same. This does not holds for normal statements because the parameters are part of the query and produce different SQL strings.

Some other RDBMS can be more "intelligent", but I don't expect they will use complex pattern matching algorithms for caching the execution plans because it would lower performance. You may argue that the computation of the execution plan is only a small part of the query execution. For the general case, I agree, but.. it depends. Keep in mind that, usually, computing an execution plan can be an expensive task, because the rdbms needs to consult off-memory data like statistics (not only Oracle).

However, the argument about caching range from execution-plans to other parts of the extraction process. Giving to the RDBMS multiple times the same query (without going in depth for a particular implementation) helps identifying already computed structures at JDBC (driver) or RDBMS level. If you don't find any particular advantage in performance now, you can't exclude that performance improvement will be implemented in future/alternative versions of the driver/rdbms.

Performance improvements for updates can be obtained by using prepared statements in batch-mode but this is another story.

Nicola Ferraro
  • 4,051
  • 5
  • 28
  • 60
0

Ok finally there is a paper that tests this, and the conclusion is that it doesn't improve performance, and in some cases its slower: https://ieeexplore.ieee.org/document/9854303

PDF: https://www.bib.irb.hr/1205158/download/1205158.Performance_analysis_of_SQL_Prepared_Statements_in_CRUD_operations_final.pdf

John D
  • 11
  • 5