1

I'm trying to combine two lookups into one as the example below

enter image description here

Currently, I'm getting the first table into a list using linq then looping on the second table and adding new row if does not exist or updating Table2Value if ID exists, but its performance is too slow due two huge amount of data used.

Any way to achieve this using only linq?

Current Code:

        IQueryable<KeyValuePair<string, KeyValuePair<int, string>>> englishResources = _localizationService.GetAllResourceValues(1).AsQueryable();
        IQueryable<KeyValuePair<string, KeyValuePair<int, string>>> arabicResources = _localizationService.GetAllResourceValues(2).AsQueryable();

        List<LanguageResourceModel> languagesResources = englishResources.Select(c => new LanguageResourceModel()
        {
            Name = c.Key,
            EnglishValue = c.Value.Value,
        }).ToList();

        foreach (var item in arabicResources)
        {
            if (languagesResources.Any(c => c.Name.ToLower() == item.Key.ToLower()))
            {
                languagesResources.Where(c => c.Name.ToLower() == item.Key.ToLower()).FirstOrDefault().ArabicValue = item.Value.Value;
            }
            else
            {
                languagesResources.Add(new LanguageResourceModel
                {
                    Name = item.Key,
                    ArabicValue = item.Value.Value,
                });
            }
        }
Mohamed Usama
  • 99
  • 2
  • 13

2 Answers2

1

First of all, I assume you're using Nop.Services.Localization? If so, calling ToLower() when matching keys is overkill (because the keys are already in lowercase) and might degrade the performance as we then can't leverage the efficient nature of Dictionaries.

Secondly,what you're attempting to do is a Full Outer Join. Though you can do it in LINQ, it's far from efficient for a large dataset.

Since you'll be doing a lot of key matching, you don't need to look further than a simple Dictionary. It's optimised for key lookups.

You can play around with the full test code in Fiddle

// No need for Queryables. The function already returns Dictionary which is already an IEnumerable.
var englishResources = GetAllResourceValues(1);
var arabicResources = GetAllResourceValues(2);

// Start with all the english words.
// Use dictionary because it has efficient key matching.
var merged = englishResources.ToDictionary( 
    i => i.Key, 
    i => new LanguageResourceModel { 
            Name = i.Key, 
            EnglishValue = i.Value.Value 
        });

// Now merge the arabic ones. 
// You could LINQ-ify the whole thing, but it's not gonna make it more efficient.
LanguageResourceModel found;
foreach (var item in arabicResources)
{
    // If value already exists, update it
    if (merged.TryGetValue(item.Key, out found))
        found.ArabicValue = item.Value.Value;
    else // Otherwise, add a new one
        merged[item.Key] = new LanguageResourceModel { 
            Name = item.Key, 
            ArabicValue = item.Value.Value 
        };
}
NPras
  • 3,135
  • 15
  • 29
1

What you want is a faster function for a full outer join, preferably in a LINQ statement.

I'll first write a solution specific for your problem. After that I'll write a generic solution that can be used for all collections where you want a full outer join.

If you are not familiar with (inner) join, group join, left outer join, full outer join, etc see: A visual explanation about joins

The reason why your join is so slow is because for every element of collection A, you check every element of collection B to find matching Ids. This is a waste of time.

You need a fast lookup: given an Id, which element has this Id? This is typically a case to use a dictionary.

The function that speeds up your full outer join, I assume your table1 is a sequence of T1 objects, table2 is a sequence of T2 objects, and your result table is a sequence of TResult objects.

IEnumerable<TResult> FullOuterJoin(IEnumerable<T1> table1, IEnumerable<T2> table2)
{
    // put table1 elements in a dictionary with Id as key
    // do the same for table2 elements
    Dictionary<int, T1> lookup1 = table1.ToDictionary(t1 => t1.Id);
    Dictionary<int, T2> lookup2 = table2.ToDictionary(t2 => t2.Id);

    // create a sequence of all Ids used in table1 and/or table2
    // remove duplicates using Distinct
    IEnumerable<int> allIdsInT1 = table1.Select(t1 => t1.Id);
    IEnumerable<int> allIdsInT2 = table2.Select(t2 => t2.Id);
    IEnumerable<int> allUsedIds = allIdsInT1
        .Concat(allIdsInT2)
        .Distinct();

    // now enumerate over all elements in allUsedIds.
    // find the matching element in lookup1
    // find the matching element in lookup2
    // if no match found: use null
    foreach (int id in allUsedIds)
    {
        // find the element with Id in lookup1; use null if there is no such element
        T1 found1;
        bool t1Found = lookup1.TryGetValue(id, out found1);
        if (!t1Found) found1 = null;

        // find the element with Id in lookup2; use null if there is no such element
        T2 found2;
        bool t2Found = lookup2.TryGetValue(id, out found2
        if (!t2Found) found2 = null;

        TResult result = new TResult()
        {
            Id = id,
            Table1Value = found1,
            Table2Value = found2,
        };
        yield return result;
    }

This will solve your efficiency problem.

Note: I was able to use a Dictionary, because I assumed that your Ids are unique. If not, use a Lookup table.. I'll do that in the LINQ example below

If you'll need functionality like this quite often, consider creating an extension function for Enumerable that does the same for every two collections and for every kind of comparison key and for every type of equality.

For extension methods see: Extension Methods Demystified

Create a function, that takes two sequences. You specify which property of sequence A is used and which property of sequence B is used to find the common value on which we'll join. You also specify an equality comparer for the keys. If not, the default comparer is used.

After your full outer join, you've got a key, a sequence of elements from A that matches this key, and a sequence of elements from B that matches this key. You specify what to do with those three to create a result.

The extension function:

public static IEnumerable<TResult> FullOuterJoin<TA, TB, TKey, TResult>(
   IEnumerable<TA> sourceA,       // The first collection
   IEnumerable<TB> sourceB,       // The second collection
   Func<TA, TKey> keySelectorA,   //which property from A is the key to join on?
   Func<TB, TKey> keySelectorB,   //which property from B is the key to join on?
   Funct<TA, TB, TKey, TResult> resultSelector,
   // says what to return with the matching elements and the key

   TA defaultA = default(TA),     // use this value if no matching A is found
   TA defaultB = default(TB),     // use this value if no matching B is found
   IEqualityComparer<TKey> cmp = null)
   // the equality comparer used to check if key A equals key B)
  {
      // TODO implement
  }

The last three parameters have defaults. If you don't specify them, commonly used defaults will be used.

In your case you would use it as follows:

IEnumerable<T1> table1 = ...
IEnumerable<T2> table2 = ...

// Full Outer Join table1 and table2
IEnumerable<MyResult> ResultTable = table1.FullOuterJoin(table2,
   t1 => t1.Id,            // from every element of table1 take the Id
   t2 => t2.Id,            // from every element of table2 also take the Id

   // if you have a match t1, t2, (or default) with key, create a MyResult:
   (t1, t2, key, MyResult) => new MyResult()
   {
       Id = key,
       Table1Value = t1,
       Table2Value = t2
   });

As I use defaults for the last three parameters, not found elements will be null, and a default integer comparer is used.

Implementation is similar as the one above. As I might have several same values that match key, I'll use a lookup table.

// If no EqualityComparer is provided, use the default one:
cmp = cmp ?? EqualityComparer<TKey>.Default;

// create the two lookup tables:
ILookup<TKey, TA> alookup = sourceA.ToLookup(keySelectorA, cmp);
ILookup<TKey, TB> blookup = sourceB.ToLookup(keySelectorB, cmp);

// get a collection of all keys used in sourceA and/or sourceB.
// Remove duplicates using the equalityComparer
IEnumerable<TKey> allKeysUsedInA = sourceA.Select(a => keySelectorA(a));
IEnumerable<TKey> allKeysUsedInB = sourceB.Select(b => keySelectorB(b));
IEnumerable<TKey> allUsedKeys = allKeysUsedInA
    .Concat(allKeysUsedInB)
    .Distinct(cmp);

// now enumerate over all keys, get the matching elements from sourceA / sourceB 
// use defaults if not available
foreach (TKey key in allUsedKeys)
{
    // get all A elements with TKey, use the default value if the key is not found
    IEnumerable<TA> foundAs = alookup[key].DefaultIfEmpty(defaultA);
    foreach (TA foundA in foundAs)
    {
        // get all B elements with TKey, use the default value if the key is not found
        IEnumerable<TB> foundBs = blookup[key].DefaultIfEmpty(defaultB);
        foreach (TB foundB in foundBs)
        {
            TResult result = resultSelector(foundA, foundB, key);
            yield return result;
        }
    }
}

Usage, you have a sequence of Cities, each with an ID and a Name; I also have a sequence of Streets, each with an ID and a Name. Every street belongs to one City with a foreign key CityId. Obviously a Street also has a Name

From every City I want the city name and the names of the Streets in the city.

The full outer join would be:

var result = Cities.FullOuterJoin(Streets,
    city => city.Id,                    // from every city take the Id
    street => street.CityId             // from every street take the CityId
    (city, street, matchingId) => new   // when they match create a new object
    {                                   // using the matching city, street and  matching Id
        CityName = city.Name,           // with the name of the city
        StreetName = street.Name,       // and the name of the street
    }                                   // we don't use the matching id
Harald Coppoolse
  • 28,834
  • 7
  • 67
  • 116