2

I am trying to "join" two lists together. Both lists are of the same type and each instance of that type has a unique key. If there are instances with the same key in both lists, then I would like to merge the two instances together with a custom merge function. The final list of items should contain the merged elements plus the instances that were only in one of the two lists in the first place.

This is similar to a Union and a Join, but seems to be subtly different to each of them. A union would give me the right list of keys, but has no facility to merge instances where they share the same key - it would just return one of the instances and ignore the other. A join allows me to merge the repeated instances by supplying a function, but it would only return the elements that were in both lists - not those that appear in just one of them.

Have I missed a good built-in way to do this?

Stephen Hewlett

2 Answers

4

This should be quite easy.

If I can assume that you have a merge function like this:

Func<T, T, T> merge = (a, b) => /* your result here */;

Then this should work:

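var intersects = listA.Join(listB, x => x.Id, x => x.Id, (a, b) => merge(a, b));

// Except compares whole instances (unless T overrides Equals), not Ids,
// so pick out the unmatched items by key instead.
var idsA = listA.ToLookup(x => x.Id);
var idsB = listB.ToLookup(x => x.Id);
var excepts = listA.Where(a => !idsB.Contains(a.Id))
    .Concat(listB.Where(b => !idsA.Contains(b.Id)));

var results = intersects.Concat(excepts);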

Let me know if this works.
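If enumerating the lists more than once is a concern, a dictionary keyed on the id can do the same job while reading each list exactly once. The following is only a sketch: `MergeByKey` and its parameter names are made up for illustration (they are not part of LINQ), and it assumes keys are unique within each list, as the question states.

```csharp
using System;
using System.Collections.Generic;

public static class MergeByKeyExtensions
{
    // Hypothetical helper, not a built-in LINQ operator.
    // Enumerates 'first' once and 'second' once.
    public static List<T> MergeByKey<T, TKey>(
        this IEnumerable<T> first,
        IEnumerable<T> second,
        Func<T, TKey> keySelector,
        Func<T, T, T> merge)
    {
        var positions = new Dictionary<TKey, int>(); // key -> index in result
        var result = new List<T>();

        foreach (var item in first)
        {
            // Add throws if 'first' contains the same key twice.
            positions.Add(keySelector(item), result.Count);
            result.Add(item);
        }

        foreach (var item in second)
        {
            var key = keySelector(item);
            if (positions.TryGetValue(key, out var position))
            {
                // Key already seen: merge into the existing slot.
                result[position] = merge(result[position], item);
            }
            else
            {
                // Key only seen in 'second' so far: append it.
                positions.Add(key, result.Count);
                result.Add(item);
            }
        }

        return result;
    }
}
```

With your types it would be called as `listA.MergeByKey(listB, x => x.Id, merge)`. Note that the output order differs from the query above: items keep the order of `listA`, followed by the items found only in `listB`.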

Enigmativity
  • That certainly looks like it will work, but it enumerates the lists several times. I'm wondering if there is a function that can do it all in one go, more efficiently. So far it looks like there isn't so I expect I will go with this approach. – Stephen Hewlett Aug 27 '12 at 02:44
  • @StephenHewlett - How big are your lists that enumerating them several times is going to make a difference? I just tested with two lists of 1,000,000 each and it took 1.863 seconds. With just 100,000 items each it took 97ms. – Enigmativity Aug 27 '12 at 04:19
2

Assuming there are no duplicate ids within a list, what you need is a full outer join. Here's an implementation... I make no guarantee of optimal performance:

public static class LinqEx
{
    public static IEnumerable<TResult> 
        LeftOuterJoin<TOuter, TInner, TKey, TResult>(
            this IEnumerable<TOuter> outer, 
            IEnumerable<TInner> inner, 
            Func<TOuter, TKey> outerKeySelector, 
            Func<TInner, TKey> innerKeySelector, 
            Func<TOuter, TInner, TResult> resultSelector)
    {
        return outer
            .GroupJoin(
                inner, 
                outerKeySelector, 
                innerKeySelector, 
                (a, b) => new
                {
                    a,
                    b
                })
            .SelectMany(
                x => x.b.DefaultIfEmpty(), 
                (x, b) => resultSelector(x.a, b));
    }

    public static IEnumerable<TResult> 
        FullOuterJoin<TSet1, TSet2, TKey, TResult>(
            this IEnumerable<TSet1> set1, 
            IEnumerable<TSet2> set2, 
            Func<TSet1, TKey> set1Selector, 
            Func<TSet2, TKey> set2Selector, 
            Func<TSet1, TSet2, TResult> resultSelector)
    {
        var leftJoin = set1
            .LeftOuterJoin(
                set2, 
                set1Selector, 
                set2Selector, 
                (s1, s2) => new {s1, s2});
        var rightJoin = set2
            .LeftOuterJoin(
                set1, 
                set2Selector, 
                set1Selector, 
                (s2, s1) => new {s1, s2});
        return leftJoin.Union(rightJoin)
            .Select(x => resultSelector(x.s1, x.s2));

    }
}

so:

list1.FullOuterJoin(
    list2, 
    list1Item => list1Item.Id,
    list2Item => list2Item.Id,
    (list1Item, list2Item) => {
      if(list1Item!=null && list2Item!=null)
      {
        return merge(list1Item, list2Item);
      }
      return list1Item ?? list2Item;
    });
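
For something more concrete, here is a small self-contained sample. Everything specific in it is an assumption made for illustration: the `Stock` class, the sample data, and the merge rule (add the quantities). The two join methods are a condensed private copy of the ones above, so the sample compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical item type, for illustration only.
public sealed class Stock
{
    public int Id { get; set; }
    public int Quantity { get; set; }
}

public static class FullOuterJoinExample
{
    // Condensed copy of LeftOuterJoin from above.
    private static IEnumerable<TResult> LeftOuterJoin<TOuter, TInner, TKey, TResult>(
        this IEnumerable<TOuter> outer, IEnumerable<TInner> inner,
        Func<TOuter, TKey> outerKey, Func<TInner, TKey> innerKey,
        Func<TOuter, TInner, TResult> resultSelector)
    {
        return outer
            .GroupJoin(inner, outerKey, innerKey, (a, b) => new { a, b })
            .SelectMany(x => x.b.DefaultIfEmpty(), (x, b) => resultSelector(x.a, b));
    }

    // Condensed copy of FullOuterJoin from above.
    private static IEnumerable<TResult> FullOuterJoin<T1, T2, TKey, TResult>(
        this IEnumerable<T1> set1, IEnumerable<T2> set2,
        Func<T1, TKey> key1, Func<T2, TKey> key2,
        Func<T1, T2, TResult> resultSelector)
    {
        var left = set1.LeftOuterJoin(set2, key1, key2, (s1, s2) => new { s1, s2 });
        var right = set2.LeftOuterJoin(set1, key2, key1, (s2, s1) => new { s1, s2 });
        return left.Union(right).Select(x => resultSelector(x.s1, x.s2));
    }

    public static List<Stock> Run()
    {
        var list1 = new List<Stock>
        {
            new Stock { Id = 1, Quantity = 5 },
            new Stock { Id = 2, Quantity = 3 }
        };
        var list2 = new List<Stock>
        {
            new Stock { Id = 2, Quantity = 4 },
            new Stock { Id = 3, Quantity = 7 }
        };

        // Assumed merge rule: keep the id, add the quantities.
        Func<Stock, Stock, Stock> merge =
            (a, b) => new Stock { Id = a.Id, Quantity = a.Quantity + b.Quantity };

        return list1.FullOuterJoin(
                list2,
                x => x.Id,
                x => x.Id,
                // One side is null when the id exists in only one list.
                (a, b) => a != null && b != null ? merge(a, b) : (a ?? b))
            .ToList();
    }
}
```

`FullOuterJoinExample.Run()` returns three items: id 1 with quantity 5 (only in `list1`), id 2 with quantity 7 (merged), and id 3 with quantity 7 (only in `list2`). This relies on the items being reference types, so that a missing side comes back as null.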

spender
  • Can you give a real-life example of what you have under the "so:"? – shubniggurath Oct 20 '14 at 19:01
  • I found that this didn't work because the FullOuterJoin unions "set1 joined to set2" and "set2 joined to set1", all of the items that are joined are duplicated. – Lee Oades May 08 '20 at 12:38