I had a table Blah ( latitude float, longitude float, create_time date, owner_id int , ..... )
and my code does only a single query
select *
from Blah
where latitude < l1 and latitude > l2
and longitude < ll1 and longitude > ll2
and create_time < t1 and create_time > t2
and owner_id < o1 and owner_id > o2 ;
(of course the values l1, l2,.... o1,o2 are dynamic params coming from program)
my question is what kind of index I should create; composite index? in case of composite index, which column should I put first? how effective is the index?
I thought about this for a long time, and couldn't find detailed docs on how the oracle index works.
I can find docs that it's implemented using B-tree, in our case: each key in the B-tree is a 4-tuple: ( column1, column2, column3, column4) where the ordering relation of such tuples are defined to be lexical order.
then for the above query, assuming our order is (owner_id, create_time, latitude, longitude), I guess oracle would first need to binary search to the point ( o1, t1, l1,ll1), for this operation, the index is indeed useful. but next, we need to find the ending point of this first interium: we need to find (o1,t1, l1, ll2 ), this can be done by binary search too.
next, we need to find the next section that satisfies the condition, so we need to find (o1, t1, lx, ll1 ) where lx is the next value larger than l1, we could find this by binary search too. but in our case, it's very likely that for the same latitude, there can be no more than 1 longitude, so here binary search is not more effective than linear scan.
following this spirit, it seems that we should put the column with a small value range cardinality first, in this case, create_time, if our points are created on only a few days. also if we never do range conditions, but only equals (=) conditions, then it does not matter which column is first, right?
to make it clearer, here is a simpler example:
let's say I have 2 columns, X, and Y
in the db, values for both are [1,2,....100], so we have 100x100 rows
my query is
select * from mytable where X > 34 and X < 78 and Y > 12 and Y < 15;
say our index is on (X, Y), so the comparison rule between 2 values are
v1 < v2 <=====> v1.x < v2.x || v1.x == v2.x && v1.y < v2.y
given the above ordering rule, we can see that the values in the index are arranged in serial like (values for x,y):
1,1, 1,2 1,3 .... 1,100
2,1 2,2 2,3 ......2,100
.....
100,1 100,2 ....... 100,100
now , to search for the values in the query, the B-Tree traversal needs to locate (78-34-1) intervals, hence (78-34-1)*2 lookup (1 for the beginning one for the end locations), not just 2 lookups.
so if we have higher dimensions, the interval counts increases exponentially with the number of dimensions, so indexing may not be useful anymore ------ this is my concern
thanks a lot Yang