I'm pulling data from two sources that share a common ID. One data set has metadata and the other does not. I want to end up with a single list that combines the information from both.
public class Record
{
public string Id { get; set; }
public string Name { get; set; }
public string Title { get; set; }
public string MetaInfo1 { get; set; }
public string MetaInfo2 { get; set; }
}
List<Record> doc = new List<Record>(); //About 100k items, MetaInfo is null
List<Record> docWithMeta = new List<Record>(); //About 50k items, Name and Title Null
I've tried using Join, but the second data set doesn't always have a matching ID, so the result is a List that only contains the items that had a match. It's fine for the end result to contain records with missing metadata.
var joint = doc.Join(docWithMeta,
a => a.Id,
b => b.Id,
(a, b) => new Record
{
Id = a.Id,
Name = a.Name,
Title = a.Title,
MetaInfo1 = b.MetaInfo1,
MetaInfo2 = b.MetaInfo2,
}).ToList();
I tried using nested foreach loops to find a match and add the properties to a new list. That works, but the code was very slow.
List<Record> newDoc = new List<Record>();
foreach (Record rec in doc)
{
foreach (Record recMeta in docWithMeta)
{
if (rec.Id == recMeta.Id)
{
rec.MetaInfo1 = recMeta.MetaInfo1;
rec.MetaInfo2 = recMeta.MetaInfo2;
}
}
newDoc.Add(rec);
}
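For comparison, here is a minimal sketch of how the inner loop could be replaced with a dictionary lookup, so each record in doc costs one hash lookup instead of a scan over docWithMeta. The names RecordMerger and MergeWithLookup are hypothetical, not part of my code. The sketch assumes every Id is non-null, and for a duplicate Id in docWithMeta it keeps the last record, which is also what the nested loops end up doing.

```csharp
using System.Collections.Generic;

public class Record
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string Title { get; set; }
    public string MetaInfo1 { get; set; }
    public string MetaInfo2 { get; set; }
}

// Hypothetical helper class, shown only as an illustration.
public static class RecordMerger
{
    public static List<Record> MergeWithLookup(List<Record> doc, List<Record> docWithMeta)
    {
        // Index the metadata records by Id once.
        // Assumption: Ids are non-null; for a duplicate Id the last record wins.
        var metaById = new Dictionary<string, Record>();
        foreach (Record item in docWithMeta)
        {
            metaById[item.Id] = item;
        }

        var newDoc = new List<Record>(doc.Count);
        foreach (Record rec in doc)
        {
            // One hash lookup per record instead of a scan over docWithMeta.
            if (metaById.TryGetValue(rec.Id, out Record meta))
            {
                rec.MetaInfo1 = meta.MetaInfo1;
                rec.MetaInfo2 = meta.MetaInfo2;
            }
            // Records without a match are kept, with their MetaInfo left null.
            newDoc.Add(rec);
        }
        return newDoc;
    }
}
```

Like the nested loops, this modifies the records in doc in place and returns them in their original order.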
I also tried using GroupJoin, but I'm not exactly sure how to use it and I keep getting a NullReferenceException.
var results = doc.GroupJoin(docWithMeta,
a => a.Id,
b => b.Id,
(a, result) => new Record
{
Id = a.Id,
MetaInfo1 = result.FirstOrDefault().MetaInfo1 //null exception here
}).ToList();
UPDATE
Using some of the suggestions below I got an adequately performing method that works.
var results = doc.GroupJoin(docWithMeta,
a => a.Id,
b => b.Id,
(a, result) => new
{
Foo = a,
Bar = result
}).SelectMany(
x => x.Bar.DefaultIfEmpty(),
(x, y) => new Record
{
Id = x.Foo.Id,
Name = x.Foo.Name,
Title = x.Foo.Title,
MetaInfo1 = y == null ? null : y.MetaInfo1,
MetaInfo2 = y == null ? null : y.MetaInfo2
}).ToList();
I kept getting a NullReferenceException whenever the data set with metadata didn't have an Id that matched the first data set. For now I just use a ternary operator to check for null. There must be a better way.
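One option I'm considering, as a minimal sketch only: keep the GroupJoin but read the metadata through the null-conditional operator (?., which needs C# 6 or later), since FirstOrDefault() returns null when a group has no matches. The names RecordJoiner and LeftJoin are hypothetical. Unlike the SelectMany version, this produces exactly one result per item in doc and only uses the first matching metadata record, so it assumes duplicate Ids in docWithMeta don't matter.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Record
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string Title { get; set; }
    public string MetaInfo1 { get; set; }
    public string MetaInfo2 { get; set; }
}

// Hypothetical helper class, shown only as an illustration.
public static class RecordJoiner
{
    public static List<Record> LeftJoin(List<Record> doc, List<Record> docWithMeta)
    {
        return doc.GroupJoin(docWithMeta,
            a => a.Id,
            b => b.Id,
            (a, matches) =>
            {
                // FirstOrDefault() yields null when no metadata record shares this Id.
                Record meta = matches.FirstOrDefault();
                return new Record
                {
                    Id = a.Id,
                    Name = a.Name,
                    Title = a.Title,
                    // ?. evaluates to null instead of throwing when meta is null.
                    MetaInfo1 = meta?.MetaInfo1,
                    MetaInfo2 = meta?.MetaInfo2
                };
            }).ToList();
    }
}
```

This builds new Record objects rather than modifying the originals, the same as my Join attempt.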