9

I had a table Blah ( latitude float, longitude float, create_time date, owner_id int , ..... )

and my code does only a single query

select * 
from Blah 
where latitude < l1 and latitude > l2   
and longitude < ll1 and longitude > ll2   
and create_time < t1 and create_time > t2 
and owner_id < o1 and owner_id > o2 ;

(of course the values l1, l2,.... o1,o2 are dynamic params coming from program)

my question is what kind of index I should create; composite index? in case of composite index, which column should I put first? how effective is the index?

I thought about this for a long time, and couldn't find detailed docs on how the oracle index works.

I can find docs that it's implemented using B-tree, in our case: each key in the B-tree is a 4-tuple: ( column1, column2, column3, column4) where the ordering relation of such tuples are defined to be lexical order.

then for the above query, assuming our order is (owner_id, create_time, latitude, longitude), I guess oracle would first need to binary search to the point ( o1, t1, l1,ll1), for this operation, the index is indeed useful. but next, we need to find the ending point of this first interium: we need to find (o1,t1, l1, ll2 ), this can be done by binary search too.

next, we need to find the next section that satisfies the condition, so we need to find (o1, t1, lx, ll1 ) where lx is the next value larger than l1, we could find this by binary search too. but in our case, it's very likely that for the same latitude, there can be no more than 1 longitude, so here binary search is not more effective than linear scan.

following this spirit, it seems that we should put the column with a small value range cardinality first, in this case, create_time, if our points are created on only a few days. also if we never do range conditions, but only equals (=) conditions, then it does not matter which column is first, right?

to make it clearer, here is a simpler example:

let's say I have 2 columns, X, and Y

in the db, values for both are [1,2,....100], so we have 100x100 rows

my query is

select * from mytable where X > 34 and X < 78 and Y > 12 and Y < 15;

say our index is on (X, Y), so the comparison rule between 2 values are

v1 < v2 <=====>  v1.x < v2.x || v1.x == v2.x && v1.y < v2.y

given the above ordering rule, we can see that the values in the index are arranged in serial like (values for x,y):

1,1, 1,2 1,3 .... 1,100     
2,1  2,2 2,3 ......2,100
.....
100,1 100,2 ....... 100,100

now , to search for the values in the query, the B-Tree traversal needs to locate (78-34-1) intervals, hence (78-34-1)*2 lookup (1 for the beginning one for the end locations), not just 2 lookups.

so if we have higher dimensions, the interval counts increases exponentially with the number of dimensions, so indexing may not be useful anymore ------ this is my concern

thanks a lot Yang

James A Mohler
  • 11,060
  • 15
  • 46
  • 72
teddy teddy
  • 3,025
  • 6
  • 31
  • 48

5 Answers5

9

If your only goal is to create an index to optimize this query, you'd prefer that the columns in the composite index be ordered with the most selective column first. If the predicates on latitude eliminate substantially more rows than the other predicates, it will be more efficient to have that column first. If the predicates on owner_id eliminate substantially more rows than the other predicates, it will be more efficient to have that column first.

In reality, though, we are rarely creating indexes whose only purpose is to optimize a single query. Generally, in order to make the overhead of index maintenance worthwhile, we want our indexes to be useful across many queries. In the case of a composite index, that means ordering the columns by the probability that a query will have predicates on that column. If you have a composite index on owner_id, create_time, latitude, longitude, for example, you can use that for queries that just specify predicates on owner_id. But you would not, realistically, use that index for queries that just specify predicates on longitude.

Justin Cave
  • 227,342
  • 24
  • 367
  • 384
3

First, bear in mind that the "B" in "B-Tree" is not "binary".

Second, when it comes to indexing in Oracle you also have the choice of a bitmap index if:

  1. You have an enterprise edition license
  2. You do not have many sessions concurrently modifying the table
  3. Your indexed values are not close to being unique (statements that bitmap indexes are usable only for low cardinality columns are generally exaggerated)

One type of query that bitmap indexes excel at is in efficiently combining predicates on multiple columns, especially where the set of predicated columns varies (which may not be the case for you of course). If you meet the three conditions above then it would be worth testinig the effect of having four separate bitmap indexes on the table.

David Aldridge
  • 51,479
  • 8
  • 68
  • 96
  • thanks, but right now I'm only concerned with the problem of B-tree index; though in practice Oracle may well use bitmap index, as you pointed out – teddy teddy May 18 '12 at 18:08
1

One easy brute-force solution is to create multiple index combinations on the same table, run the query with EXPLAIN PLAN turned on then choose the index that your DBMS prefers to use.

pd40
  • 3,187
  • 3
  • 20
  • 29
0

is this table used for OLTP or as a DWH? if you don't have many single row/Multithreaded DML statments on this table you can use bitmap indexes. bitmap indexes allows you ROWID AND operators between multiple indexs (Aka star transformation). in order to do it create a bitmap index on each column. Like I've sayied this solution fits best for DWH system where you have a single batch insert.

asafm
  • 911
  • 6
  • 17
0

Multidimensional range queries are best handled, IMHO, outside of the standard B-tree indexes. A few papers on the general topic can be found by a web search on "multidimensional range queries".

Oracle provides a product called Oracle Spatial. The documentation for this product includes, in Chapter 4, examples and explanations of creating spatial indexes and performing queries. There's no new SQL syntax; their example for index creation is:

CREATE INDEX territory_idx ON territories (territory_geom)
    INDEXTYPE IS MDSYS.SPATIAL_INDEX;

which creates an R-tree index.

I think the existence of R-trees, kdb-trees and similar spatial structures is evidence for the fact that standard B-trees are probably not well-suited to these kinds of applications.

Ray Toal
  • 86,166
  • 18
  • 182
  • 232