How do query expression joins depend on the order of keys?

Question

In the documentation for query expressions, I found:

Note that the order of the keys around the = sign in a join expression is significant.

I can't, however, find any information about how exactly the order is significant, what difference it makes, or what the rationale was for making an equality operator non-symmetric.

Can anyone either explain or point me to some better documentation?

score 1 · Answer 1 · answered Mar 21 '14 at 21:25

1

This is important for joins. For example, if you look at the sample for leftOuterJoin:

query {
    for student in db.Student do
    leftOuterJoin selection in db.CourseSelection on
                   (student.StudentID = selection.StudentID) into result
    for selection in result.DefaultIfEmpty() do
    select (student, selection)
    }

The order determines what happens when "missing" values occur. The key is this line in the docs:

If any group is empty, a group with a single default value is used instead.

With the current order, every StudentID within db.Student will be represented, even if db.CourseSelection doesn't have a matching element. If you reverse the order, the opposite is true - every "course selection" will be represented, with missing students getting the default value. This would mean that, in the above, if you switched the order, any students without a course selection would have no representation in the results, where the current order always shows every student.

answered Mar 21 '14 at 21:25

Reed Copsey

554,122
78
1,158
1,373

1

Isn't what to do with non-matches already determined by doing a "left outer join" rather than a "right outer join"? And if I'm doing an inner join, why should it matter at all? – Johann Hibschman Mar 21 '14 at 21:34
@JohannHibschman With a plain join, it doesn't matter (docs are misleading here). In the other join options, that matters. Note that query expressions don't have `rightOuterJoin`, so the order is critical to achieve that via `leftOuterJoin` or `groupJoin`. I think the docs mention that explicitly as a way to explain. – Reed Copsey Mar 21 '14 at 21:41
3

It seems to matter on some syntactical level. If I do "for a in X join b in Y on (b.id = a.id)", VS can't seem to resolve the "b" in "b.id" into anything, while if I join "on (a.id = b.id)", everything seems to work fine. I don't really like experimental programming by intellisense, though, so I'd like to understand what's going on. – Johann Hibschman Mar 21 '14 at 22:00
Regarding Johann Hibschman's question: has anyone found a solution for this? – Anemoia May 12 '16 at 13:51

score 1 · Answer 2 · answered May 12 '16 at 15:14

The expression on the left of the operator must be derived from the "outer" thing being joined and the expression on the right must be derived from the "inner" thing (as you mention in your comment on Reed's answer). This is because of the LINQ API - the actual method that is invoked to build the query looks like this:

static member Join<'TOuter, 'TInner, 'TKey, 'TResult> : 
    outer:IQueryable<'TOuter> *
    inner:IEnumerable<'TInner> *
    outerKeySelector:Expression<Func<'TOuter, 'TKey>> *
    innerKeySelector:Expression<Func<'TInner, 'TKey>> *
    resultSelector:Expression<Func<'TOuter, 'TInner, 'TResult>> -> IQueryable<'TResult>

So you can't join on arbitrary boolean expressions (which you can do in SQL - something like JOIN ON a.x + b.y - 7 > a.w * b.z is fine in SQL but not in LINQ), you can only join based on an equality condition between explicit projections of the outer and inner tables. In my opinion this is a very unfortunate design decision, but it's been carried forward from LINQ into F#.

How do query expression joins depend on the order of keys?

2 Answers2