-2

Presume these two ordered sequences (the elements are unique):

var outer = new char[] { 'a', 'b', 'c', 'other' };
var inner = new char[] { 'a', 'b', 'c', 'altro' };

The desired result is the Full Outer Join:

{ 'a', 'a' }
{ 'b', 'b' }
{ 'c', 'c' }
{ 'other', '' }
{ '', 'altro' }

The Full Outer Join can be achieved in LINQ in this way (it's an answer on SO): LINQ - Full Outer Join

But this solution doesn't takes in account that the elements from both sequences are unique. How can the algorithm be optimized?

Revious
  • 7,816
  • 31
  • 98
  • 147
  • 2
    More efficiently than what? Also, what is the unique key? – Tim Schmelter Sep 06 '18 at 14:55
  • 1
    Than the normal Linq implementation which doesn't know the keys are unique. If I'm wrong please tell me. – Revious Sep 06 '18 at 14:56
  • 2
    Show your "normal" LINQ implementation please – Tim Schmelter Sep 06 '18 at 14:56
  • Why JOIN? Just get unique list of two lists. – jdweng Sep 06 '18 at 15:07
  • 1
    @Revious; So the fourth column is the key column and it's always unique? What is the desired result? – Tim Schmelter Sep 06 '18 at 15:16
  • @TimSchmelter: I've improved the question. However probably you are right.. but I'm sorry that this community has an extremely rigid format which doesn't allow for constructive questions.. I think a lot of work has to be done still. – Revious Sep 06 '18 at 15:17
  • @TimSchmelter: done. I get one list from a database and it's made of many records. The other is just one, from a "template". – Revious Sep 06 '18 at 15:19
  • 1
    Easy. Just change all `ToLookup()`'s into `ToDictionary()`'s. Also skip creation of the `HashSet` of keys and `UnionWith()` the `Select()`'ed keys directly instead. – Good Night Nerd Pride Sep 06 '18 at 15:25
  • 1
    If the two sequences are ordered by the unique keys, then full outer join is simple merge ordered sets operation - linear time and no additional space. – Ivan Stoev Sep 06 '18 at 15:38
  • 1
    If you are doing a search by keys of a database from two lists, combining the keys first and then doing a search of the database is more efficient then doing two joins. I'm not looking at the simple issue of just combining the keys. – jdweng Sep 06 '18 at 15:40

1 Answers1

1

Based on the referenced answer I tried something that doesn't create Lookup's but Dictionary's instead.

IMHO that's the only thing you can cut that will actually save some time.

Skipping creation of the HashSet (as proposed in my rushed comment) is not an option as this will lead to the duplication of all joined pairs.

public static class Extensions {
    public static IEnumerable<TResult> FullOuterJoin<TA, TB, TKey, TResult>(
        this IEnumerable<TA> a,
        IEnumerable<TB> b,
        Func<TA, TKey> selectKeyA,
        Func<TB, TKey> selectKeyB,
        Func<TA, TB, TKey, TResult> projection,
        TA defaultA = default(TA),
        TB defaultB = default(TB),
        IEqualityComparer<TKey> cmp = null) {

        cmp = cmp ?? EqualityComparer<TKey>.Default;
        var adict = a.ToDictionary(selectKeyA, cmp);
        var bdict = b.ToDictionary(selectKeyB, cmp);

        var keys = new HashSet<TKey>(adict.Keys, cmp);
        keys.UnionWith(bdict.Keys);

        var join = from key in keys
                   let xa = adict.GetOrDefault(key, defaultA)
                   let xb = bdict.GetOrDefault(key, defaultB)
                   select projection(xa, xb, key);

        return join;
    }

    public static T GetOrDefault<K, T>(this IDictionary<K, T> d, K k, T def = default(T)) 
        => d.TryGetValue(k, out T value) ? value : def;
}
Good Night Nerd Pride
  • 8,245
  • 4
  • 49
  • 65