-1

I need to selectively retrieve data from two tables that have a 1 to many relationship. A simplified example follows.

Table A is a list of events:

Id  |  TimeStamp  |  EventTypeId
--------------------------------
1   |  10:26...   |  12
2   |  11:31...   |  13
3   |  14:56...   |  12

Table B is a list of properties for the events. Different event types have different numbers of properties. Some event types have no properties at all:

EventId | Property | Value
------------------------------
1       | 1        | dog
1       | 2        | cat
3       | 1        | mazda
3       | 2        | honda
3       | 3        | toyota

There are a number of conditions that I will apply when I retrieve the data, however they all revolve around table A. For instance, I may want only events on a certain day, or only events of a certain type.

I believe I have two options for retrieving the data:

Option 1
Perform two queries: first query table A (with a WHERE clause) and store data somewhere, then query table B (joining on table A in order to use same WHERE clause) and "fill in the blanks" in the data that I retrieved from table A.

This option requires SQL Server to perform 2 searches through table A, however the resulting 2 data sets contain no duplicate data.

Option 2
Perform a single query, joining table A to table B with a LEFT JOIN.

This option only requires one search of table A but the resulting data set will contain many duplicated values.

Conclusion
Is there a "correct" way to do this or do I need to try both ways and see which one is quicker?

James
  • 2,404
  • 2
  • 28
  • 33

1 Answers1

0

Ex

Select E.Id,E.Name from Employee E join Dept D on E.DeptId=D.Id 

and a subquery something like this -

Select E.Id,E.Name from Employee Where DeptId in (Select Id from Dept) 

When I consider performance which of the two queries would be faster and why ?

would EXPECT the first query to be quicker, mainly because you have an equivalence and an explicit JOIN. In my experience IN is a very slow operator, since SQL normally evaluates it as a series of WHERE clauses separated by "OR" (WHERE x=Y OR x=Z OR...).

As with ALL THINGS SQL though, your mileage may vary. The speed will depend a lot on indexes (do you have indexes on both ID columns? That will help a lot...) among other things.

The only REAL way to tell with 100% certainty which is faster is to turn on performance tracking (IO Statistics is especially useful) and run them both. Make sure to clear your cache between runs!

More REF

Community
  • 1
  • 1
Sanjay Goswami
  • 1,386
  • 6
  • 13