5

I've been doing SQL for several years, and throughout all that time, my thought of joins was that of equivalence joins, as in

select ... from t1 join t2 on t1.a = t2.b

Notice how the join is based on one or more equalities, here t1.a = t2.b. However, recently, I don't remember where, I saw a non-equivalence join (I just made this term up, please tell me if there's a real name for it), where the join condition contains at least one non-equality, as in

select ... from t1 join t2 on t1.a > t2.b

Which can be done to do some nice things, especially with outer joins. Let me illustrate this with an example.

Let's consider a table called products, with the following data:

product   year  price
----------------------
apple   2009    4
apple   2008    2
apple   2007    5
apple   2006    6
apple   2005    2
banana  2009    9
banana  2008    12
banana  2007    16
banana  2006    15
banana  2005    10

And we want to do the usual "give me the most expensive year for each product", which is, as far as I know, commonly done with an inner join to the same table grouped by products, like so:

select t1.`name`, t1.`year`, t1.`price`
from products as t1 join
( select `name`, max(`price`) as `max_price` from products group by `name` ) as t2 on t1.`name`=t2.`name` and t1.`price`=t2.`max_price`

So, on t2, we're getting the maximum price for each product, and then we join this result with the same table, to get the rest of the data for that column (it gets a little tricky for tiebreaking)

However, with a non-equivalence outer join, we can do it like so:

select t1.`name`, t1.`year`, t1.`price`
from products as t1 left join products as t2 on t1.`name`=t2.`name` and t1.`price` < t2.`price`
where t2.`name` is null

This time, we're joining the same table twice, where the price on t1 is lesser than the price on t2. The trick here is that, since this is a left outer join, t2 values on the resulting join will be null when the join doesn't match, which happens for the maximum value for the price.

Both these queries yield the same result, but I'm not really sure which one performs better. The first query has an expensive grouping, while the second one must manually check all t1/t2 pairs to get a result. Tiebreaking seems to be easier with the non-equivalence join though.

So, my question is:

Are there any recommended sources (books, webpages) that discuss non-equivalence joins more in depth, explaining what you can do with them (I'm assuming you can do lots more than just getting maximum values in groups), and how they perform against other methods for doing the same things?

Edit: I know windowing functions are also available for doing things such as the trivial example I mentioned above. I'm not asking how to get the maximum value of a table. I know how to do this, and I even provided two ways to do it. I want to know what else can I do with non-equivalence joins.

  • 2
    "θ-join" (theta join): http://en.wikipedia.org/wiki/Relational_algebra – ypercubeᵀᴹ Mar 24 '11 at 06:37
  • Usually the best way to write these queries is to use analytic/window functions. – Jon Heller Mar 24 '11 at 06:48
  • Not sure: on one you can use a b-tree index on the join, the other you can not. If the tables are small enough or you need to visit every row it will not matter. – nate c Mar 24 '11 at 19:10
  • @nate c, would you care to explain some more? –  Mar 25 '11 at 01:45
  • 1
    @Oscar: Think about this `t1.a > t2.b` (which will be indexed). If there are thousands of possible values in between the values, then using the index is useless because there is not enough selectivity. If you took max(t1.a) then you can use an index scan of the top value. In theory, both should generate the same underlying query plan, since SQL is logical not procedural, but I would be surprised if they did. http://explainextended.com has tons of articles on below API level stuff like this. – nate c Mar 25 '11 at 19:24

2 Answers2

2

And we want to do the usual "give me the most expensive year for each product", which is, as far as I know, commonly done with an inner join to the same table grouped by products, like so

A lot easier done using a windowing function (as jonealres has already mentioned)

select name, 
       year, 
       price,
       max(price) over (partition by name) as most_expensive
from products
  • Yes, unfortunately MySQL doesn't have support for windowing functions, which means my sql wouldn't be portable. My point with my question is not "how to do the most expensive product?" but "what else can I do with non-equivalence joins?" –  Mar 24 '11 at 07:13
  • 4
    @Oscar Rodriguez: Well, 3 out of 4 of the DBMS you listed support it. "Portable SQL" is a myth and simply means it will run equally slow on every DBMS... –  Mar 24 '11 at 07:15
  • @Oscar Rodriguez: It is up to you on what you can do, as I said earlier it depends on imagination. This kind of join has the same performance as the equality one if it is user right. – Dumitrescu Bogdan Mar 24 '11 at 07:39
  • Yes, but MySQL doesn't. My question is more on the theoretical side of SQL. I found a tool called "non-equivalence join", and I want to know what else can I do with it. That's all there is to my question. –  Mar 24 '11 at 07:40
0

I did similar things here SQL Query Creating Start and End Dates and here SQL Server logical grouping most recent time.

This kind of joins are not different then the "equality" ones. The on clause is just a logical evaluation. In case indexes exist it is evaluated closely to a where clause on indexes.

Other then that it depends on everyones imagination. SQL is rich, so it can be combined.

Community
  • 1
  • 1
Dumitrescu Bogdan
  • 7,127
  • 2
  • 23
  • 31
  • Of course, the SQL engine doesn't really care about what's inside the join expression as long as it can be evaluated to a boolean value. But I'd have to say that doing this with an outer join, and then checking for nullity on the result is quite ingenious, and I have to agree that I wouldn't have come up with it in the first place. Your examples are quite good. I wonder if you know where I can find more examples on this kind of joins. –  Mar 24 '11 at 08:16
  • I have no idea where you can find examples. What I showed you were just some of my resolves of some problems asked by other users. Actually I never put myself this question that you put, I just wrote the sql in that manner and it worked. – Dumitrescu Bogdan Mar 24 '11 at 08:27