SQL Query Patterns and Relative Performance

Question

Consider a Table with Primary Key as columnd "id"

It has to be queried for a range of id as [ 1, 2, 3, 4 ...]

There are 2 option -

1)

Select * from Table where id = 1;
Select * from Table where id = 2;
Select * from Table where id = 3;
Select * from Table where id = 4;

2)

Select * from Table where id in ( 1, 2, 3, 4 );

Which of the two is better in performance for Oracle as Databse and using Spring JDBC Template for persistence. Leave aside memory constraint from application point of view assuming that dataset 1, 2, 3, 4 will be within limit for java data structures.

It uses database connection pooling.

I know IN clauses are expensive, so if the range is KNOWN to be sequential, perhaps "select * from Table where ( id >= 1) and (id <= 4)" is better than either of those? — billjamesdev, Mar 02 '13 at 17:40
no, these id are being created by multiple threads and they will not be in order. — fortm, Mar 02 '13 at 17:47
@BillJames The predicates section of an explain plan shows that Oracle converts `IN` to `=`. For the same amount of data, they should perform the same. — Jon Heller, Mar 03 '13 at 00:05

score 5 · Answer 1 · answered Mar 02 '13 at 17:35

5

I rather use the second query because it only queries the database once. Aside from that, you only connect on the database also once.

Remember that database connection is costly (consumes much resources).

answered Mar 02 '13 at 17:35

John Woo

258,903
69
498
492

4

I agree. One single statement is always faster than 4 individual ones (if they retrieve the same total amount of data). – Mar 02 '13 at 17:37
2

I think this fails to take into account anything to do with caching, connection pooling etc. or the fact that IN clauses are much more expensive than a regular equals clause. You may still be right, though, I just don't think it's cut and dried "fewer queries are better than more". – billjamesdev Mar 02 '13 at 17:39
I got confused over this because I read that Spring Batch framework uses a "Driving Query Pattern" where they go the less intuitive way to querying for each Primary key rather in a list. – fortm Mar 02 '13 at 17:41
In my case, database is being served from dbcp pooling and is managed in Spring , so I believe reconnection should not be a bottleneck. – fortm Mar 02 '13 at 17:44
1

@fortm: but re-parsing the query and sending it over to the server can limit scalability. You have much more network roundtrips using four statements than you have with a single statement. – Mar 02 '13 at 18:01
yeah, agree however I am using PreparedStatements so compilation overhead will be less too. – fortm Mar 02 '13 at 18:04

Rob van Wijk · Accepted Answer · 2013-03-03T09:29:15.693

To backup JW's and a_horse_with_no_name's statements, here is a test, all using the same connection, so connection (pooling) issues are disregarded here.

First create the table with a primary key and gather statistics for the optimizer:

SQL> create table mytable
  2  as
  3   select level id
  4        , lpad('*',1000,'*') filler
  5     from dual
  6  connect by level <= 10000
  7  /

Table created.

SQL> alter table mytable add constraint my_pk primary key (id)
  2  /

Table altered.

SQL> exec dbms_stats.gather_table_stats(user,'mytable')

PL/SQL procedure successfully completed.

Next, place all statements in the shared pool, as a warming-up, cutting most of the output of the queries:

SQL> select * from mytable where id = 1
  2  /

        ID
----------
FILLER
---------------------------------------------------------------------------------------------------------------------------------------
         1
***************************************************************************************************************************************
***************************************************************************************************************************************
***************************************************************************************************************************************
***************************************************************************************************************************************
***************************************************************************************************************************************
***************************************************************************************************************************************
***************************************************************************************************************************************
*******************************************************


1 row selected.

SQL> select * from mytable where id = 2
  2  /

1 row selected.

SQL> select * from mytable where id = 3
  2  /

1 row selected.

SQL> select * from mytable where id = 4
  2  /

1 row selected.

SQL> select * from mytable where id in (1,2,3,4)
  2  /    

4 rows selected.

Next, examine the autotrace output, to see the work the database has to do:

SQL> set autotrace traceonly
SQL> select * from mytable where id = 1
  2  /

1 row selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3280897506

---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |     1 |  1005 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| MYTABLE |     1 |  1005 |     2   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | MY_PK   |     1 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID"=1)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads
          0  redo size
       1387  bytes sent via SQL*Net to client
        448  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> select * from mytable where id = 2
  2  /

1 row selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3280897506

---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |     1 |  1005 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| MYTABLE |     1 |  1005 |     2   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | MY_PK   |     1 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID"=2)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads
          0  redo size
       1387  bytes sent via SQL*Net to client
        448  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> select * from mytable where id = 3
  2  /

1 row selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3280897506

---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |     1 |  1005 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| MYTABLE |     1 |  1005 |     2   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | MY_PK   |     1 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID"=3)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads
          0  redo size
       1387  bytes sent via SQL*Net to client
        448  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> select * from mytable where id = 4
  2  /

1 row selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3280897506

---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |     1 |  1005 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| MYTABLE |     1 |  1005 |     2   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | MY_PK   |     1 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID"=4)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads
          0  redo size
       1387  bytes sent via SQL*Net to client
        448  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> select * from mytable where id in (1,2,3,4)
  2  /

4 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1637292604

----------------------------------------------------------------------------------------
| Id  | Operation                    | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |         |     4 |  4020 |     4   (0)| 00:00:01 |
|   1 |  INLIST ITERATOR             |         |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| MYTABLE |     4 |  4020 |     4   (0)| 00:00:01 |
|*  3 |    INDEX UNIQUE SCAN         | MY_PK   |     4 |       |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("ID"=1 OR "ID"=2 OR "ID"=3 OR "ID"=4)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          8  consistent gets
          0  physical reads
          0  redo size
       2435  bytes sent via SQL*Net to client
        448  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          4  rows processed

SQL> set autotrace off

Four times 3 consistent gets versus 8 consistent gets. Finally, let's time both variants by executing all statements 10,000 times:

SQL> set timing on
SQL> declare
  2    v_id     number;
  3    v_filler varchar2(1000);
  4  begin
  5    for i in 1 .. 10000
  6    loop
  7      select *
  8        into v_id, v_filler
  9        from mytable
 10       where id = 1
 11      ;
 12      select *
 13        into v_id, v_filler
 14        from mytable
 15       where id = 2
 16      ;
 17      select *
 18        into v_id, v_filler
 19        from mytable
 20       where id = 3
 21      ;
 22      select *
 23        into v_id, v_filler
 24        from mytable
 25       where id = 4
 26      ;
 27    end loop;
 28  end;
 29  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.03
SQL> /

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.00
SQL> /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.99
SQL> declare
  2    v_id     number;
  3    v_filler varchar2(1000);
  4  begin
  5    for i in 1 .. 10000
  6    loop
  7      for r in (select * from mytable where id in (1,2,3,4))
  8      loop
  9        v_id := r.id;
 10        v_filler := r.filler;
 11      end loop;
 12    end loop;
 13  end;
 14  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.41
SQL> /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.39
SQL> /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.39
SQL> set timing off

Here you can see the overhead of sending four times more statements towards the database counts the most. There is some variation possible here and there, depending on index depth and clustering factor, but the one query versus four queries will remain the biggest difference.

So, hopefully, readers of this thread won't say "IN clauses are known to be expensive" anymore :-)

Regards,
Rob.

nice demonstration! `+10` (*hope i could*) – John Woo Mar 03 '13 at 17:03 — John Woo, Mar 03 '13 at 17:03

score 3 · Answer 3 · answered Mar 02 '13 at 18:04

I am going to assume you are using a PreparedStatment for

select * from TABLE where ID=?;

In Oracle, using the PreparedStatement will allow Oracle to cache the execution plan generation for the SQL. So passing ID of 1 or 2 or 1000 will use the same cached Oracle plan.

Oracle will treat

select * from TABLE where ID in (1, 2, 3);

and

select * from TABLE where ID in (4, 8, 15, 16, 23, 42);

as two distinct queries and generate different execution plan for both. Thus type one is preferable from an Oracle standpoint.

In reality, if you place an index on ID, you will not notice a material difference between the two styles.

As always with performance tuning, "Don't guess, do test". Use YourKit or even System.currentTimeMillis() to see the real world numbers. Also do not sacrifice code clarity for negligible performance gains.

Since length of IN clause is going to be fixed, I suppose trick in this link can be used to generate same plans for 2 "IN" SQL statements ? — fortm, Mar 02 '13 at 18:49
http://stackoverflow.com/questions/3107044/preparedstatement-with-list-of-parameters-in-a-in-clause — fortm, Mar 02 '13 at 18:49

score 3 · Answer 4 · answered Mar 02 '13 at 22:44

I think that this question is a good example of premature optimisation and why it should be avoided.

You are trying to get the rows for a number of IDs from a table. The obvious query to use should be the one that meets the requirements in the most simple manner. Unless there is a very good reason, I would always look at using:

Select * from Table where id in ( 1, 2, 3, 4 );

This makes it obvious to later developers what is going on and why. It also leave to the task of optimisation to the database to handle, which is what it is there for.

If after writing this code you find that there are performance problems then it is the time to look at optimisation.

For any sort of code problem the simplest solution is usually the best.

I do realise that I'm a little off-topic here and don't have a lot of knowledge JDBC but do have a lot of Oracle knowledge

score 0 · Answer 5 · answered Mar 03 '13 at 09:52

You might have an option 3...

My question is how did you get those ID-values?

I find the pattern where one has to query a table using a set of unique identifiers, a weird pattern.

More likely you have (a) predicate(s) on other column(s) that identify the exact same set of rows that you want to retrieve.

SQL Query Patterns and Relative Performance

5 Answers5