I'm pulling data from two sources with a common ID. One set of data has metadata while the other does not. I want to end up with one list that has the common information.

public class Record
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string Title { get; set; }
    public string MetaInfo1 { get; set; }
    public string MetaInfo2 { get; set; }

}
List<Record> doc = new List<Record>(); //About 100k items, MetaInfo is null

List<Record> docWithMeta = new List<Record>(); //About 50k items, Name and Title Null

I've tried using Join, but the second dataset doesn't always have a matching ID, so the result is a list containing only the items that had a match. It's fine for the end result to include records with missing metadata.

var joint = doc.Join(docWithMeta,
            a => a.Id,
            b => b.Id,
            (a, b) => new Record
            {
                Id = a.Id,
                Name = a.Name,
                Title = a.Title,                        
                MetaInfo1 = b.MetaInfo1,
                MetaInfo2 = b.MetaInfo2,
            }).ToList();

I tried nested foreach loops to find a match and copy the properties into a new list. That works, but the code was very slow.

List<Record> newDoc = new List<Record>();

foreach (Record rec in doc)
{
   foreach (Record recMeta in docWithMeta)
   {
      if (rec.Id == recMeta.Id)
      {
         rec.MetaInfo1 = recMeta.MetaInfo1;
         rec.MetaInfo2 = recMeta.MetaInfo2;
      }
   }
   newDoc.Add(rec);
}
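
The nested loops compare every record in doc against every record in docWithMeta, which is roughly 100k × 50k comparisons. Below is a minimal sketch of one common alternative: index docWithMeta in a Dictionary once, then make a single pass over doc. The `RecordMerger` class and `MergeWithLookup` method names are hypothetical, and the sketch assumes that when an Id repeats in docWithMeta, keeping the first record is acceptable (the nested loops above keep the last one).

```csharp
using System;
using System.Collections.Generic;

public class Record
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string Title { get; set; }
    public string MetaInfo1 { get; set; }
    public string MetaInfo2 { get; set; }
}

public static class RecordMerger
{
    // Hypothetical helper: copies metadata onto each record in doc that has a match.
    public static List<Record> MergeWithLookup(List<Record> doc, List<Record> docWithMeta)
    {
        // Build the index once. Assumption: if an Id repeats, the first record wins.
        var metaById = new Dictionary<string, Record>();
        foreach (Record recMeta in docWithMeta)
        {
            if (recMeta.Id != null && !metaById.ContainsKey(recMeta.Id))
            {
                metaById.Add(recMeta.Id, recMeta);
            }
        }

        var newDoc = new List<Record>(doc.Count);
        foreach (Record rec in doc)
        {
            // Hash lookup instead of scanning all of docWithMeta for each record.
            if (rec.Id != null && metaById.TryGetValue(rec.Id, out Record match))
            {
                rec.MetaInfo1 = match.MetaInfo1;
                rec.MetaInfo2 = match.MetaInfo2;
            }
            newDoc.Add(rec); // unmatched records are kept with null metadata
        }
        return newDoc;
    }
}
```

Like the nested loops, this mutates the records in doc rather than creating copies.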

I also tried using GroupJoin, but I'm not sure how to use it correctly, and I keep getting a NullReferenceException.

var results = doc.GroupJoin(docWithMeta,
              a => a.Id,
              b => b.Id,
              (a, result) => new Record
              { 
                 Id = a.Id,
                 MetaInfo1 = result.FirstOrDefault().MetaInfo1 //null exception here
              }).ToList();

UPDATE

Using some of the suggestions below I got an adequately performing method that works.

var results = doc.GroupJoin(docWithMeta,
           a => a.Id,
           b => b.Id,
           (a, result) => new 
           { 
             Foo = a,
             Bar = result
           }).SelectMany(
              x => x.Bar.DefaultIfEmpty(),
              (x, y) => new Record
              {
                 Id = x.Foo.Id,
                 Name = x.Foo.Name,
                 Title = x.Foo.Title,
                 MetaInfo1 = y == null ? null : y.MetaInfo1,
                 MetaInfo2 = y == null ? null : y.MetaInfo2
              }).ToList();

I kept getting a NullReferenceException whenever the dataset with metadata didn't have an Id that matched the first data set. I just used a ternary operator to check for null. There must be a better way.
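
For reference, a minimal sketch of how the null check could be shortened with the null-conditional operator (`?.`, available from C# 6). It goes back to the plain GroupJoin with FirstOrDefault(), so it assumes at most one metadata record per Id matters; the SelectMany version above emits one row per match instead. The `LeftJoinSketch` class and `LeftJoin` method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Record
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string Title { get; set; }
    public string MetaInfo1 { get; set; }
    public string MetaInfo2 { get; set; }
}

public static class LeftJoinSketch
{
    // Hypothetical helper: every record in doc appears once in the result,
    // with metadata filled in only when a matching Id exists.
    public static List<Record> LeftJoin(List<Record> doc, List<Record> docWithMeta)
    {
        return doc.GroupJoin(docWithMeta,
            a => a.Id,
            b => b.Id,
            (a, matches) =>
            {
                // FirstOrDefault() returns null when no metadata matched this Id.
                Record meta = matches.FirstOrDefault();
                return new Record
                {
                    Id = a.Id,
                    Name = a.Name,
                    Title = a.Title,
                    // ?. evaluates to null instead of throwing when meta is null.
                    MetaInfo1 = meta?.MetaInfo1,
                    MetaInfo2 = meta?.MetaInfo2
                };
            }).ToList();
    }
}
```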

SharpBarb
  • I think this example will help. The key bit is the DefaultIfEmpty() call: https://stackoverflow.com/questions/584820/how-do-you-perform-a-left-outer-join-using-linq-extension-methods – Sam Apr 18 '19 at 17:18
  • Use a left outer join: https://code.msdn.microsoft.com/101-LINQ-Samples-3fb9811b – jdweng Apr 18 '19 at 17:19
  • Thanks, Sam. Using your example helped me move in the right direction, but I kept getting a NullReferenceException whenever there wasn't a matching record in the metadata set. – SharpBarb Apr 18 '19 at 20:38

1 Answer

I can't test this code right now, but I think it should work:

var results = doc.GroupJoin(
      docWithMeta,
      a => a.Id,
      b => b.Id,
      (a, b) => new { doc = a, meta = b })
  .SelectMany(
      ab => ab.meta.DefaultIfEmpty(), // yields a single null when there is no match
      (x, y) => new { doc = x.doc, meta = y })
  .Select(s => new Record
  {
      Id = s.doc.Id,
      Name = s.doc.Name,
      Title = s.doc.Title,
      // meta is null for unmatched records; fall back to an empty string
      MetaInfo1 = s.meta?.MetaInfo1 ?? "",
      MetaInfo2 = s.meta?.MetaInfo2 ?? "",
  }).ToList();
evilGenius