
This query ONE:

SELECT * FROM TEST_RANDOM WHERE EMPNO >= '236400' AND EMPNO <= '456000';

runs in Oracle Database with a cost of 1927.

And this query TWO:

SELECT * FROM TEST_RANDOM WHERE EMPNO = '236400';

runs with a cost of 1924.

The table TEST_RANDOM has 1,000,000 rows. I created it like this:

Create table test_normal (empno varchar2(10), ename varchar2(30), sal number(10), faixa varchar2(10));

Begin 
    For i in 1..1000000 
    Loop 
        Insert into test_normal values(
            to_char(i), dbms_random.string('U',30), 
            dbms_random.value(1000,7000), 'ND'
        ); 
        If mod(i, 10000) = 0 then 
            Commit; 
        End if; 
    End loop; 
End; 

Create table test_random 
as
select /*+ append */ * from test_normal order by dbms_random.random;

I created a B-Tree index on the column EMPNO like this:

CREATE INDEX IDX_RANDOM_1 ON TEST_RANDOM (EMPNO);

After this, query TWO improved and its cost dropped to 4.

But query ONE did not improve, because Oracle ignored the index: for some reason the optimizer decided that an execution plan using the index was not worth it for this query...

My question is: what can we do to improve the performance of query ONE? The index did not solve it, and its cost is still high...

Arthur
    What is the type of `EMPNO`? Your query suggests it's a string, when what you're really after is an integer. String comparison on a range will not work correctly, which in turn may inhibit use of the index -- for example, `'2364001'` meets your condition. Try leaving out the single quotes for a start. – Jeroen Mostert Jan 23 '23 at 14:16
  • Please **[edit]** your question and add the [execution plan](https://www.postgresql.org/docs/current/static/using-explain.html) generated using **`explain (analyze, buffers, format text)`** (_not_ just a "simple" explain) as [formatted text](http://stackoverflow.com/help/formatting) and make sure you preserve the indention of the plan. Paste the text, then put `\`\`\`` on the line before the plan and on a line after the plan. Please also include complete `create index` statements for all indexes as well. –  Jan 23 '23 at 14:22
  • @JeroenMostert the type of `EMPNO` is varchar2. I will test leaving out the single quotes, but assuming it doesn't work, what does the community generally do to resolve performance issues when indexes are ignored? – Arthur Jan 23 '23 at 14:33
  • Well there you go then. Even if you index that, asking the database to give you values within a string "range" will not give you correct results, because it necessarily uses string comparisons, where things like `'2' > '10'` and `'a' > '1'` hold. Even if all your values happen to be of constant length and contain only digits, the engine can't assume this to be the case when generating index access based on the type alone. Either change the data type, or index a generated column, or access your rows differently (if `EMPNO` really could be an arbitrary string) -- see the sketch after these comments. – Jeroen Mostert Jan 23 '23 at 14:39
  • Also -- `varchar2` is not a type that is used in Postgres. It is (rather infamously) one in Oracle, though. Double check you've got the right engine. The same general advice applies, in that no engine will compare string ranges the same way number ranges are compared (since that would generally be impossible). – Jeroen Mostert Jan 23 '23 at 14:45
  • @JeroenMostert well, I improved my question – Arthur Jan 23 '23 at 14:52
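
Following the suggestion in the comments above, one way to make the range comparison numeric without changing the column type is a function-based index (the index name IDX_RANDOM_NUM below is made up for illustration). This is only a sketch and assumes every EMPNO value is digits-only, as in the generated test data; note that, as the answer below explains, a range matching roughly a quarter of the table may still be answered by a full table scan even when a usable index exists.

-- Hypothetical function-based index so the range is compared numerically
CREATE INDEX IDX_RANDOM_NUM ON TEST_RANDOM (TO_NUMBER(EMPNO));

-- The predicate must use the same expression for this index to be considered
SELECT *
FROM   TEST_RANDOM
WHERE  TO_NUMBER(EMPNO) BETWEEN 236400 AND 456000;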

1 Answer


For this query, Oracle does not use an index because the optimizer correctly estimated the number of rows and correctly decided that a full table scan would be faster or more efficient.

B-Tree indexes are generally only useful when they can be used to return a small percentage of rows, and your first query returns about 25% of the rows. It's hard to say what the ideal percentage of rows is, but 25% is almost always too large. On my system, the execution plan changes from full table scan to index range scan when the query returns 1723 rows - but that number will likely be different for you.
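
If you want to see which plan the optimizer actually chooses for a given range, and the row estimate behind that choice, here is a minimal sketch using the standard EXPLAIN PLAN and DBMS_XPLAN tools, run against the poster's TEST_RANDOM table:

EXPLAIN PLAN FOR
SELECT * FROM TEST_RANDOM WHERE EMPNO >= '236400' AND EMPNO <= '456000';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- To compare estimated and actual row counts for a statement that really ran:
SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*)
FROM   TEST_RANDOM
WHERE  EMPNO >= '236400' AND EMPNO <= '456000';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT => 'ALLSTATS LAST'));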

There are several reasons why full table scans are better than indexes for retrieving a large percentage of rows:

  1. Single-block versus multi-block: In Oracle, like in almost all computer systems, it can be significantly faster to retrieve multiple chunks of data at a time (sequential access) instead of retrieving one random chunk of data at a time (random access).
  2. Clustering factor: Oracle stores all rows in blocks, which are usually 8KB and are analogous to pages. If the index is very inefficient -- for example, if it is built on randomly sorted data and two consecutive index entries rarely point to the same block -- then reading 25% of the rows through the index may still require reading 100% of the table blocks (a query to check this is shown after the list).
  3. Algorithmic complexity: A full table scan reads the data as a simple heap, which is O(N). A single index access is much faster, at O(log N). But as the number of index accesses increases, the benefit wears off, until eventually using the index costs O(N log N).
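
To check how point 2 applies here, you can compare the index's clustering factor with the table's block and row counts: a value close to the number of blocks is good, a value close to the number of rows is bad. A small sketch using the standard dictionary views, assuming statistics have been gathered:

-- Compare the clustering factor of IDX_RANDOM_1 to the table's blocks and rows
SELECT i.index_name,
       i.clustering_factor,
       t.blocks,
       t.num_rows
FROM   user_indexes i
JOIN   user_tables  t ON t.table_name = i.table_name
WHERE  i.index_name = 'IDX_RANDOM_1';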

Some things you can do to improve performance without indexes:

  1. Partitioning: Partitioning is the ideal solution for retrieving a large percentage of data from a table (but the option must be licensed separately). With partitioning, Oracle splits the logical table into multiple physical segments, and the query reads only the required partitions. This keeps the benefit of multi-block reads while still limiting the amount of data scanned.
  2. Parallelism: Make Oracle work harder instead of smarter. But parallelism probably isn't worth the trouble for such a small table.
  3. Materialized views: Create tables that only store exactly what you need.
  4. Ordering the data: Improve the index clustering factor by sorting the table data by the relevant column instead of leaving it in random order. In your case, replace `order by dbms_random.random` with `order by empno` (see the sketch after this list). Depending on your version and platform, you may be able to use a materialized zone map to keep the table sorted.
  5. Compression: Shrink the table to make it faster to read the whole thing.
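
As a sketch of point 4: rebuild the table sorted by EMPNO and index the sorted copy. TEST_SORTED and IDX_SORTED_1 are made-up names, and note that the sort is a string sort, which matches the string comparisons in your queries.

-- Hypothetical sorted copy of the table, so consecutive EMPNO values share blocks
CREATE TABLE TEST_SORTED AS
SELECT * FROM TEST_RANDOM ORDER BY EMPNO;

CREATE INDEX IDX_SORTED_1 ON TEST_SORTED (EMPNO);

-- Gather statistics so the optimizer sees the improved clustering factor
BEGIN
    DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'TEST_SORTED');
END;
/

The clustering-factor query shown earlier should report a much lower value for IDX_SORTED_1 than for IDX_RANDOM_1, which is what makes an index range scan cheaper on the sorted copy.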

That's quite a lot of information for what is possibly a minor performance problem. Before you go down this rabbit hole, it might be worth asking whether you actually have an important performance problem, as measured by a clock or by resource consumption, or whether you are just fishing for performance problems by looking at the somewhat meaningless cost metric.

Jon Heller
  • Thank you for your great answer. I am just studying to understand how to improve the performance of SQL tables. – Arthur Jan 24 '23 at 13:42
  • So, I have a question about the topic "Ordering the data": you say to create the index on a relevant column instead of creating an index on a random column -- are you talking about my case? – Arthur Jan 24 '23 at 14:08
  • Because I created an index on a relevant column, that is, a column that I use in my WHERE clause; it does not make sense to create an index on a column that is not used in my WHERE. – Arthur Jan 24 '23 at 14:08
  • I learned this in this comment: https://stackoverflow.com/a/8937872/12253990 – Arthur Jan 24 '23 at 14:08
  • @Arthur Yes, the index clustering factor and ordering the data does apply to your case. The issue is complicated because we normally wouldn't expect the order of table data to matter to an index. I modified my answer with some details and a link to the manual's example of how ordering the data can improve the index's clustering factor. – Jon Heller Jan 24 '23 at 14:35