0

Going of the diagram here: I'm confused on column 1 and 3.

I am working on an datawarehouse table and there are two columns that are used as a key that gets you the primary key.

The first column is the source system. there are three possible values Lets say IBM, SQL, ORACLE. Then the second part of the composite key is the transaction ID it could ne numerical or varchar. There is no 3rd column. Other than the secret key which would be a key generated by Identity(1,1) as the record gets loaded. So in the graph below I imagine if I pass in a query

Select a.Patient,
       b.Source System,
       b.TransactionID
from Patient A
right join Transactions B
    on A.sourceSystem = B.sourceSystem and
       a.transactionID = B.transactionID
where SourceSystem = "SQL" 

The graph leads me to think that column 1 in the index should be set to the SourceSystem. Since it would immediately split the drill down into the next level of index by a 3rd. But when showing this graph to a coworker, they interpreted it as column 1 would be the transactionID, and column 2 as the source system.

Cols

  1   2   3
-------------
|   | 1 |   |
| A |---|   |
|   | 2 |   |
|---|---|   |
|   |   |   |
|   | 1 | 9 |
| B |   |   |
|   |---|   |
|   | 2 |   |
|   |---|   |
|   | 3 |   |
|---|---|   |
Community
  • 1
  • 1
Phillip Putzback
  • 151
  • 3
  • 13
  • Are you asking about your database design, or about performance? – Tim Biegeleisen Nov 04 '16 at 01:07
  • Your query will not execute. It will give an ambiguous column name error. – Gordon Linoff Nov 04 '16 at 01:15
  • The question is really: should I put the most selective column first or last (google that!). A lot of it boils down to: Do _other_ queries use Col1 or Col2 the most? In other words how reusable is the index for other queries. – Nick.Mc Nov 04 '16 at 01:55

1 Answers1

0

First, you should qualify all column names in a query. Second, left join usually makes more sense than a right join (the semantics are keep all columns in the first table). Finally, if you have proper foreign key relationships, then you probably don't need an outer join at all.

Let's consider this query:

Select p.Patient, t.Source System, t.TransactionID
from Patient p join
     Transactions t
     on t.sourceSystem = p.sourceSystem and
        t.transactionID = p.transactionID
where t.SourceSystem = 'SQL';

The correct index for this query is Transactions(SourceSystem, TransactionId).

Notes:

  • Outer joins affect the choice of indexes. Basically if one of the tables has to be scanned anyway, then an index might be less useful.
  • t.SourceSystem = 'SQL' and p.SourceSystem = 'SQL' would probably optimize differently.
  • Does the patient really have a transaction id? That seems strange.
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • The query was just a rough example to show the cardinality. With Source system having 3 unique values tied to a high cardinality column with millions of unique transaction ids. So based on what Gordon says, Source system is the most selective column and should be the first column in the index. – Phillip Putzback Nov 04 '16 at 02:45