Speeding up Queries that Use Aggregate Functions by Using Indexes

Question

According to the textbook, Oracle Database 12c The Complete Reference by Bob Bryla and Kevin Loney, "In addition to supporting WHERE clauses and joins, indexes also support ORDER BY clauses and the MAX and MIN functions." They later go on to say, "If you select eh MAX or MIN value of an indexed column, the optimizer may use the index to quickly find the maximum or minimum value for the column."

To clarify these quotes, let's assume that there is not an index on the salary column of a table called EMPS. Would the following queries be benefited if there were an index on the salary column:

select min(salary)
from emps;

select department_id, min(salary)
from emps
group by department_id;

EDIT: After analyzing the execution plans of both of these queries, the first query appears to utilize the index whereas the second query does not.

You can answer this yourself by viewing the [execution plan](https://stackoverflow.com/q/9269042/62576) — Ken White, Jun 14 '18 at 18:50
you could create a working example and test the explain plans. The first one should use the index, the second probably not. — tbone, Jun 14 '18 at 18:57
The second one might use an index on `department_id`, but having `salary` as the first key of an index doesn't help the query at all. — Gordon Linoff, Jun 14 '18 at 19:16
If emps had a bunch of columns besides salary and department_id then an index on department_id and salary might help in some sort of full index scan because the index would be smaller than the table. But, I think the main point is the first query would be very fast with an index on salary. — Bobby Durrett, Dec 19 '22 at 23:23

score 2 · Accepted Answer · answered Jun 16 '18 at 22:10

There are at least three different ways indexes can improve aggregate query performance: INDEX FULL SCAN (MIN/MAX), INDEX FAST FULL SCAN, and INDEX FULL SCAN / SORT GROUP BY NOSORT.

Index Structure

While thinking about the different index algorithms it may help to keep in mind this image of the internal structure of a B-tree index:

(Although it's not a perfect match for this answer since it only shows a single-column index. You can think of a multi-column index as a B-tree inside a B-tree.)

Sample Schema

drop table emps;
create table emps
(
    id            number not null,
    name          varchar2(100) not null,
    department_id number not null,
    salary        number not null
);
create index emps_sal_idx on emps(salary);
create index emps_dept_and_sal_idx on emps(department_id, salary);

insert into emps
select
    level id,
    'John Smith '||level name,
    mod(level, 100) department_id,
    50000 + dbms_random.value * 100000 salary
from dual
connect by level <= 100000;
begin
    dbms_stats.gather_table_stats(user, 'EMPS');
end;
/

Some important things to note about this table, index, and data:

The columns are set to not null. Otherwise Oracle may not be able to use some of the indexes, since indexes do not always store nulls)
There is an index just on salary and a multi-column index on department_id and salary
The table data is a bit skewed. The salary values are relatively unique but there are only 100 department_id values.

INDEX FULL SCAN (MIN/MAX)

When the book says that indexes support MAX and MIN functions it is probably referring to the INDEX FULL SCAN (MIN/MAX) operation. Look at the way the data is sorted and organized in the above picture. Finding the first or last value is a very simple operation; follow the left-hand side multiple times to find the minimum, or follow the right-hand side multiple times to find the maximum.

explain plan for
select min(salary)
from emps;

select * from table(dbms_xplan.display);

Plan hash value: 293008466

-------------------------------------------------------------------------------------------
| Id  | Operation                  | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |              |     1 |    22 |     2   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE            |              |     1 |    22 |            |          |
|   2 |   INDEX FULL SCAN (MIN/MAX)| EMPS_SAL_IDX |     1 |    22 |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

But sometimes there is a problem. Oracle can get either the MIN or the MAX very quickly, but for some reason it stumbles when it has to find both values. It seems like it should be a simple enough feature to traverse both the left-hand side and the right-hand side, but instead Oracle has to read the whole index in these situations.

explain plan for
select min(salary), max(salary)
from emps;

select * from table(dbms_xplan.display);

Plan hash value: 2011972768

--------------------------------------------------------------------------------------
| Id  | Operation             | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |              |     1 |    22 |   162   (1)| 00:00:01 |
|   1 |  SORT AGGREGATE       |              |     1 |    22 |            |          |
|   2 |   INDEX FAST FULL SCAN| EMPS_SAL_IDX |   100K|  2148K|   162   (1)| 00:00:01 |
--------------------------------------------------------------------------------------

INDEX FAST FULL SCAN

For the second query, with multiple columns, a multi-column index can help. The index contains all of the data, and Oracle can read directly from the smaller index instead of reading the entire large table. This is the INDEX FAST FULL SCAN operation.

explain plan for
select department_id, min(salary)
from emps
group by department_id;

select * from table(dbms_xplan.display);

Plan hash value: 104291853

-----------------------------------------------------------------------------------------------
| Id  | Operation             | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                       |   100 |  2500 |   170   (3)| 00:00:01 |
|   1 |  HASH GROUP BY        |                       |   100 |  2500 |   170   (3)| 00:00:01 |
|   2 |   INDEX FAST FULL SCAN| EMPS_DEPT_AND_SAL_IDX |   100K|  2441K|   166   (1)| 00:00:01 |
-----------------------------------------------------------------------------------------------

INDEX FULL SCAN / SORT GROUP BY NOSORT

For the second query, aren't we basically trying to run a MIN/MAX scan multiple times, for each department? Oracle doesn't have that exact operation, but it has something close with a combination of INDEX FULL SCAN followed by a SORT GROUP BY NOSORT. If Oracle reads the index in order, it doesn't have to sort anything and doesn't need to write any temporary space to find the min and max. The min and max will be easy to find for each department.

explain plan for
select /*+ index(emps) */ department_id, min(salary)
from emps
group by department_id;

select * from table(dbms_xplan.display);

Plan hash value: 445731427

----------------------------------------------------------------------------------------------
| Id  | Operation            | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                       |   100 |  2500 |   607   (1)| 00:00:01 |
|   1 |  SORT GROUP BY NOSORT|                       |   100 |  2500 |   607   (1)| 00:00:01 |
|   2 |   INDEX FULL SCAN    | EMPS_DEPT_AND_SAL_IDX |   100K|  2441K|   607   (1)| 00:00:01 |
----------------------------------------------------------------------------------------------

(I needed a hint to get the above plan to work. I'm not exactly sure why, for some reason Oracle think it's not as good as other ways. And it may be slower. Since full table scans and fast full index scans can use multi-block reads, whereas index range and full scans use single-block reads, it's sometimes faster to do things the hard way than the smart way.)

Other?

There are probably other clever ways to improve performance using in-memory (data stored in columns), list interval partitions with local indexes (maybe then it could use a min/max scan for each index partition?), or some other feature I haven't thought of. The lesson here is that Oracle almost always has a way to speed up any query, although there is always a cost associated with creating these separate objects.

Nice job. I tend to do tests like this to see the actual plan and execution details instead of that the database thinks it will be: SELECT /*+gather_plan_statistics*/ min(salary) from emps; select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS')); — Bobby Durrett, Dec 19 '22 at 23:25