What you want is a faster function for a full outer join, preferably in a LINQ statement.
I'll first write a solution specific for your problem. After that I'll write a generic solution that can be used for all collections where you want a full outer join.
If you are not familiar with (inner) join, group join, left outer join, full outer join, etc see: A visual explanation about joins
The reason why your join is so slow is because for every element of collection A, you check every element of collection B to find matching Ids. This is a waste of time.
You need a fast lookup: given an Id, which element has this Id? This is typically a case to use a dictionary.
The function that speeds up your full outer join, I assume your table1 is a sequence of T1 objects, table2 is a sequence of T2 objects, and your result table is a sequence of TResult objects.
IEnumerable<TResult> FullOuterJoin(IEnumerable<T1> table1, IEnumerable<T2> table2)
{
// put table1 elements in a dictionary with Id as key
// do the same for table2 elements
Dictionary<int, T1> lookup1 = table1.ToDictionary(t1 => t1.Id);
Dictionary<int, T2> lookup2 = table2.ToDictionary(t2 => t2.Id);
// create a sequence of all Ids used in table1 and/or table2
// remove duplicates using Distinct
IEnumerable<int> allIdsInT1 = table1.Select(t1 => t1.Id);
IEnumerable<int> allIdsInT2 = table2.Select(t2 => t2.Id);
IEnumerable<int> allUsedIds = allIdsInT1
.Concat(allIdsInT2)
.Distinct();
// now enumerate over all elements in allUsedIds.
// find the matching element in lookup1
// find the matching element in lookup2
// if no match found: use null
foreach (int id in allUsedIds)
{
// find the element with Id in lookup1; use null if there is no such element
T1 found1;
bool t1Found = lookup1.TryGetValue(id, out found1);
if (!t1Found) found1 = null;
// find the element with Id in lookup2; use null if there is no such element
T2 found2;
bool t2Found = lookup2.TryGetValue(id, out found2
if (!t2Found) found2 = null;
TResult result = new TResult()
{
Id = id,
Table1Value = found1,
Table2Value = found2,
};
yield return result;
}
This will solve your efficiency problem.
Note: I was able to use a Dictionary, because I assumed that your Ids are unique. If not, use a Lookup table.. I'll do that in the LINQ example below
If you'll need functionality like this quite often, consider creating an extension function for Enumerable that does the same for every two collections and for every kind of comparison key and for every type of equality.
For extension methods see: Extension Methods Demystified
Create a function, that takes two sequences. You specify which property of sequence A is used and which property of sequence B is used to find the common value on which we'll join. You also specify an equality comparer for the keys. If not, the default comparer is used.
After your full outer join, you've got a key, a sequence of elements from A that matches this key,
and a sequence of elements from B that matches this key. You specify what to do with those three to create a result.
The extension function:
public static IEnumerable<TResult> FullOuterJoin<TA, TB, TKey, TResult>(
IEnumerable<TA> sourceA, // The first collection
IEnumerable<TB> sourceB, // The second collection
Func<TA, TKey> keySelectorA, //which property from A is the key to join on?
Func<TB, TKey> keySelectorB, //which property from B is the key to join on?
Funct<TA, TB, TKey, TResult> resultSelector,
// says what to return with the matching elements and the key
TA defaultA = default(TA), // use this value if no matching A is found
TA defaultB = default(TB), // use this value if no matching B is found
IEqualityComparer<TKey> cmp = null)
// the equality comparer used to check if key A equals key B)
{
// TODO implement
}
The last three parameters have defaults. If you don't specify them, commonly used defaults will be used.
In your case you would use it as follows:
IEnumerable<T1> table1 = ...
IEnumerable<T2> table2 = ...
// Full Outer Join table1 and table2
IEnumerable<MyResult> ResultTable = table1.FullOuterJoin(table2,
t1 => t1.Id, // from every element of table1 take the Id
t2 => t2.Id, // from every element of table2 also take the Id
// if you have a match t1, t2, (or default) with key, create a MyResult:
(t1, t2, key, MyResult) => new MyResult()
{
Id = key,
Table1Value = t1,
Table2Value = t2
});
As I use defaults for the last three parameters, not found elements will be null, and a default integer comparer is used.
Implementation is similar as the one above. As I might have several same values that match key, I'll use a lookup table.
// If no EqualityComparer is provided, use the default one:
cmp = cmp ?? EqualityComparer<TKey>.Default;
// create the two lookup tables:
ILookup<TKey, TA> alookup = sourceA.ToLookup(keySelectorA, cmp);
ILookup<TKey, TB> blookup = sourceB.ToLookup(keySelectorB, cmp);
// get a collection of all keys used in sourceA and/or sourceB.
// Remove duplicates using the equalityComparer
IEnumerable<TKey> allKeysUsedInA = sourceA.Select(a => keySelectorA(a));
IEnumerable<TKey> allKeysUsedInB = sourceB.Select(b => keySelectorB(b));
IEnumerable<TKey> allUsedKeys = allKeysUsedInA
.Concat(allKeysUsedInB)
.Distinct(cmp);
// now enumerate over all keys, get the matching elements from sourceA / sourceB
// use defaults if not available
foreach (TKey key in allUsedKeys)
{
// get all A elements with TKey, use the default value if the key is not found
IEnumerable<TA> foundAs = alookup[key].DefaultIfEmpty(defaultA);
foreach (TA foundA in foundAs)
{
// get all B elements with TKey, use the default value if the key is not found
IEnumerable<TB> foundBs = blookup[key].DefaultIfEmpty(defaultB);
foreach (TB foundB in foundBs)
{
TResult result = resultSelector(foundA, foundB, key);
yield return result;
}
}
}
Usage, you have a sequence of Cities, each with an ID and a Name; I also have a sequence of Streets, each with an ID and a Name. Every street belongs to one City with a foreign key CityId. Obviously a Street also has a Name
From every City I want the city name and the names of the Streets in the city.
The full outer join would be:
var result = Cities.FullOuterJoin(Streets,
city => city.Id, // from every city take the Id
street => street.CityId // from every street take the CityId
(city, street, matchingId) => new // when they match create a new object
{ // using the matching city, street and matching Id
CityName = city.Name, // with the name of the city
StreetName = street.Name, // and the name of the street
} // we don't use the matching id