
Clustering factor - a simple explanation of how it is calculated:

Basically, the CF is calculated by performing a full index scan and looking at the rowid of each index entry. If the table block being referenced differs from that of the previous index entry, the CF is incremented. If the table block being referenced is the same as the previous index entry, the CF is not incremented. So the CF gives an indication of how well ordered the data in the table is in relation to the index entries (index entries are always sorted and stored in the order of the indexed column values). The better (lower) the CF, the more efficient it would be to use the index, as fewer table blocks would need to be accessed to retrieve the necessary data via the index.
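The walk described above can be approximated with a query, as a rough illustration; the table and column names (mytable, id) are made up, and this is only a sketch of the idea, not the exact algorithm the optimizer uses:

```sql
-- Hypothetical recomputation of a clustering factor for an index on
-- mytable(id): walk the rows in index-key order and count how often
-- the table block changes from one entry to the next.
SELECT COUNT(*) AS my_clustering_factor
FROM (
  SELECT DBMS_ROWID.ROWID_BLOCK_NUMBER(t.rowid) AS blk,
         LAG(DBMS_ROWID.ROWID_BLOCK_NUMBER(t.rowid))
           OVER (ORDER BY t.id) AS prev_blk
  FROM   mytable t
  WHERE  t.id IS NOT NULL      -- mimic the index: NULL keys are not indexed
)
WHERE prev_blk IS NULL          -- the first entry counts as a block change
   OR blk <> prev_blk;
```

If the result is close to the number of table blocks, the data is well clustered for that key; if it is close to the number of rows, nearly every consecutive index entry points to a different block.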

My Index statistics:

So, here are my indexes (each over just one column) under analysis.

The index starting with PK_ is my primary key, and the one starting with UI is a unique key. (Of course, both hold unique values.)


Query1:

SELECT index_name,
  UNIQUENESS,
  clustering_factor,
  num_rows,
  CEIL((clustering_factor/num_rows)*100) AS cluster_pct
FROM all_indexes
WHERE table_name='MYTABLE';

Result:

INDEX_NAME           UNIQUENES CLUSTERING_FACTOR   NUM_ROWS CLUSTER_PCT
-------------------- --------- ----------------- ---------- -----------
PK_TEST              UNIQUE             10009871   10453407          96 --> Very high
UITEST01             UNIQUE               853733   10113211           9 --> Very low

We can see that the PK has a very high CF while the other unique index does not.

The only logical explanation that strikes me is that the underlying data is actually stored in the order of the column covered by the unique index.

1) Am I right in this understanding?
2) Is there any way to give the PK the lowest CF number?
3) Looking at the query cost using both of these indexes, single-row selects are very fast. But still, the CF number is what baffles us.

The table is relatively huge (over 10M records) and also receives real-time inserts/updates.


My Database version is Oracle 11gR2, over Exadata X2

Maheswaran Ravisankar

1 Answer


You are seeing the evidence of a heap table indexed by an ordered tree structure.

To get extremely low CF numbers you'd need to order the data as per the index. If you want to do this (as with SQL Server or Sybase clustered indexes), Oracle gives you a few options:

  1. Simply create supplemental indexes with additional columns that can satisfy your common queries. Oracle can return a result set from an index without referring to the base table if all of the required columns are in the index. If possible, consider adding columns to the trailing end of your PK to serve your heaviest query (practical if your query touches a small number of columns). This is usually advisable over changing all of your tables to IOTs.
  2. Use an IOT (Index Organized Table) - it is a table stored as an index, so it is ordered by the primary key.
  3. Sorted hash cluster - More complicated, but can also yield gains when accessing a list of records for a certain key (like a bunch of text messages for a given phone number)
  4. Reorganize your data and store the records in the table in order of your index. This option is ok if your data isn't changing, and you just want to reorder the heap, though you can't explicitly control the order; all you can do is order the query and let Oracle append it to a new segment.
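A few of the options above can be sketched in Oracle DDL; all object and column names below are hypothetical and only illustrate the pattern:

```sql
-- Option 1: a covering index, so common queries can be answered from
-- the index alone without visiting the heap table.
CREATE INDEX ix_mytable_cover ON mytable (id, status, created_date);

-- Option 2: an index-organized table, physically ordered by its PK.
CREATE TABLE mytable_iot (
  id           NUMBER PRIMARY KEY,
  message_text VARCHAR2(4000)
) ORGANIZATION INDEX;

-- Option 4: rebuild the heap in index-key order (only worthwhile for
-- fairly static data; Oracle does not guarantee the physical order,
-- all you can do is order the query and load into a new segment).
CREATE TABLE mytable_sorted AS
  SELECT * FROM mytable ORDER BY id;
```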

If most of your access patterns are random (OLTP), single record accesses, then I wouldn't worry about the clustering factor alone. That is just a metric that is neither bad nor good, it just depends on the context, and what you are trying to accomplish.

Always remember, Oracle's issues are not SQL Server's issues, so make sure any design change is justified by performance measurement. Oracle is highly concurrent, and very low on contention. Its multi-version concurrency design is very efficient and differs from other databases. That said, it is still a good tuning practice to order data for sequential access if that is your common use case.

For more advice on this subject, read Ask Tom: what are oracle's clustered and nonclustered indexes

codenheim
  • +1 Nice write-up. And yes, I see that for OLTP this matters least, after multiple tests; the database retrieves it very fast. An IOT I cannot opt for, as that might affect the performance of my inserts, and also there are multiple secondary indexes over the table. My concern was whether a high clustering factor ends up causing more I/O processing under the hood. It appears not to create any performance mess, at least with my test results. – Maheswaran Ravisankar Oct 02 '14 at 06:35
  • Correct, because if 99% of your queries retrieve 1 key, the clustering factor doesn't come into play. It is an aspect of measuring 2 adjacent keys. As far as impact on insert performance goes, you normally insert once and read many, many times, so insert performance is mostly a concern for non-OLTP purposes. – codenheim Oct 02 '14 at 06:44
  • Unfortunately, in my case there will be at least 10K inserts and 5K deletes on a daily basis (like comments created and purged). Otherwise hash clusters would have been in place. – Maheswaran Ravisankar Oct 02 '14 at 06:48
  • Sure. Well 10K per day isn't that much. That is less than 500 per hour, pocket change for an Exadata. If you want more performance, though, you could go for full SSD storage. Are they offering SSD as an option on Exadata nowadays? It's one Oracle toy I've never had the chance to touch. – codenheim Oct 02 '14 at 06:55