Querying for nullable relationship set of two lists

Question

If I have these lists as inputs:

List<Record> A = new List<Record> { new Record { Value = 1 }, new Record { Value = 2 }, new Record { Value = 3 }, new Record { Value = 4 }, new Record { Value = 5 } };
List<Record> B = new List<Record> { new Record { Value = 1 }, new Record { Value = 2 }, new Record { Value = 5 }, new Record { Value = 7 } };

So from above, A has these elements:

Record{Value = 1}
Record{Value = 2}
Record{Value = 3}
Record{Value = 4}
Record{Value = 5}

And B has these elements:

Record{Value = 1}
Record{Value = 2}
Record{Value = 5}
Record{Value = 7}

I would like the following output. A List<Tuple<Record, Record>> where, in our case, the list looks like:

Record{Value = 1} | Record{Value = 1}
Record{Value = 2} | Record{Value = 2}
Record{Value = 3} | null
Record{Value = 4} | null
Record{Value = 5} | Record{Value = 5}
null              | Record{Value = 7}

The first column above, which represents the first element in our tuple, can only contain objects from A, or null. The second column above, which represents the second element in our tuple, can only contain objects from B, or null. Records have more than just a Value field, thus don't count them as value types otherwise I would have used integers in my question.

Basically, as you can see in the example, objects in A go into the first column, objects in B go into the second column of the output data type. If there are matches on the Value field, then they go side-by-side. If there are no matches on the value field between A and B, then null goes on the opposite side, wherever there is no match.

Is there any way to LINQ query data sets A and B to produce a resulting data type like this? I am not so strict on using a list of tuples. I would be okay with a dictionary, or some other type...

If I understand this correctly, it's rather like a `full outer join` in t-sql. I've written a custom linq operator to do this in the past; it's a bit of work but not too much. One question: can a single object on one side potentially match more than one object on the other side? Your example doesn't show this. — phoog, Aug 20 '14 at 17:17
@Bart this is a full outer join, not a left join. Note the last element in the example results. — phoog, Aug 20 '14 at 17:21

score 2 · Accepted Answer · answered Aug 20 '14 at 17:18

2

You can first create a list of all of the values from both of the lists by concatting them together and getting the distinct values, and then you can join that set with each of the two sets.

var values = A.Concat(B)
    .Select(record => record.Value)
    .Distinct();

var query = from value in values
    join a in A
    on value equals a.Value
    into aMatches
    join b in B
    on value equals b.Value
    into bMatches
    select Tuple.Create(aMatches.FirstOrDefault(), 
        bMatches.FirstOrDefault());

answered Aug 20 '14 at 17:18

Servy

202,030
26
332
449

+1 nice. The custom full-outer-join operator I suggested in a comment would avoid iterating both collections twice, but in most cases the performance hit for iterating twice would be negligible. The cost/benefit analysis would therefore favor this approach in those cases. – phoog Aug 20 '14 at 17:20
@phoog Yeah, he specifically mentions that these are in memory collections, so iterating them twice isn't really a problem at all. It's only a problem if the `IEnumerable` objects represent expensive deferred operations or cause side effects. – Servy Aug 20 '14 at 17:22
or if the in-memory collection is huge, or if the query is run in a (long-running) loop. – phoog Aug 20 '14 at 17:23

Querying for nullable relationship set of two lists

1 Answers1