0

My first time working with indexes in database and so far I've learn that if you have a multi-column index such as index('col1', 'col2', 'col3'), and if you do a query that uses where col2='col2' and col3='col3', that index would not be use.

I also learn that if a column is very low selectivity column. Indexing is useless. However, from my test, it seems none of the above is true at all. Can someone explain more on this?

I have a table with more than 16 million records. Let's say claimID is the primary key, then there're a historynumber column that only have 3 distinct values (1,2,3), and a last column with storeNumber that has about 1 million distinct values.

I have an index for claimID alone, another index(historynumber, claimID), and other index with index(historynumber, storeNumber), and finally index(storeNumber, historynumber).

My guess was that if I do:

select * from my_table where claimId='123456' and historynumber = 1 

would be much faster than

select * from my_table where historynumber = 1 and claimId = '123456'

However, the 2 have exactly the same performance (instant). So I thought the primary key index can work on any column order. Therefore, I tried the same thing but on historynumber and storeNumber instead. The result is exactly the same. Then I start trying out on columns that has no indexes and of course the result is the same also.

Finally, I do a

select * from my_table where historynumber = 1 

and the query takes so long I had to cancel it.

So my conclusion is that the column order in where clause is completely useless, and so is the column order in the index definition since it seems like the database is smart enough to tell which column is the highest selectivity column.

Could someone give me an example that could prove otherwise?

user3758745
  • 777
  • 1
  • 6
  • 19

1 Answers1

0

Index explanation is a huge topic. Don't worry about the sequence of different attributes in the SQL - it has no effect whether you specify

...where claimId='123456' and historynumber = 1 

or the other way round. Each SQL is checked and optimized by the optimizer. To proove how the data gets accessed you could do a EXPLAIN. Check the documentation for more details.

For your other problem

select * from my_table where historynumber = 1 

with an index of (storeNumber, historynumber). Have you ever tried to lookup the name of a caller (having the telephone number) in a telephone book? Well it is pretty much the same for an index - so the column order when creatin the index matters! There are techniques which could help - i.e. index jump scan - but there is no guarantee.

Check out following sites to learn a little bit more about DB2 indexes:

http://db2commerce.com/2013/09/19/db2-luw-basics-indexes/

http://use-the-index-luke.com/sql/where-clause/the-equals-operator/concatenated-keys

MichaelTiefenbacher
  • 3,805
  • 2
  • 11
  • 17
  • Hey, thanks for replying. The second query wasn't a problem. I was just making sure that a query on historynumber=1 takes a long time rather than instantly, else my other test would fail. I did a lot of research on it today and all of them stated that the order makes a difference. I can't find the all the links I've read today but here're a couple that stated so: [link1](http://stackoverflow.com/questions/179085/multiple-indexes-vs-multi-column-indexes) [link2](http://stackoverflow.com/questions/5542895/sql-indexes-and-performance-improvement) – user3758745 Mar 02 '16 at 22:22
  • So what's missing? Otherwise my answer was exactly to the point right? – MichaelTiefenbacher Mar 03 '16 at 06:17
  • those links are talking about MS SQL Server. The use of indexes is generally DBMS specific. On DB2, order in the `WHERE` doesn't matter...and on DB2 for i at least it's recommended that the most selective column be first in the index. Also on DB2 for i, the `historynumber = 1` query would probably use a full table scan since query engine would probably determine that more than 30% of the rows will be returned. – Charles Mar 03 '16 at 14:07
  • @Michael No, the point was that the order is not important, but the link I showed stated that it is important. Charles, Are you saying the order of the column in the index declaration matter? Why is it if I use index(historynumber, storenumber) to search for historynumber and storenumber, I get the exact result (~6 seconds) where as if the order does matter, that index would be completely useless and I would have to scan the whole table. – user3758745 Mar 03 '16 at 14:42
  • Please read my answer again - and do not mix the two points. 1. WHERE condition attributes can be in any order. 2. Column in the index order of columns matters. If all columns of an index are used in the WHERE clause it could be used. If only - lt us say - the third column is used and the first two are not part of the WHERE condition the index is most probably not used. Would in this case the first column used in teh WHERE it could be used... – MichaelTiefenbacher Mar 03 '16 at 17:58