2

I have a list of AE_AlignedPartners items in the db, which I retrieve with:

List<AE_AlignedPartners> ae_alignedPartners_olds = ctx.AE_AlignedPartners.AsNoTracking().ToList();

Than, I got and serialize a new list (of the same object type) with JSON:

List<AE_AlignedPartners> ae_alignedPartners_news = GetJSONPartnersList();

Than I'm getting the intersections of both:

var IDSIntersections = (from itemNew in ae_alignedPartners_news
                        join itemOld in ae_alignedPartners_olds on itemNew.ObjectID equals itemOld.ObjectID
                        select itemNew).Select(p => p.ObjectID).ToList();

Now, due of these intersections, I need to create two new lists, with the added items (ae_alignedPartners_news - intersections) and the deleted ones (ae_alignedPartners_olds - interesections). Here's the code:

// to create
IList<AE_AlignedPartners> ae_alignedPartners_toCreate = ae_alignedPartners_news.Where(p => !IDSIntersections.Contains(p.ObjectID)).ToList();

// to delete
IList<AE_AlignedPartners> ae_alignedPartners_toDelete = ae_alignedPartners_olds.Where(p => !IDSIntersections.Contains(p.ObjectID)).ToList();

But with many records (~100k) it tooks too much time.

Is there a sort of Except<> specifing which key need to be compared? In my case its not p.ID (which is the Primary Key on the DB), but p.ObjectID.

Or any other faster way?

Ivan Stoev
  • 195,425
  • 15
  • 312
  • 343
markzzz
  • 47,390
  • 120
  • 299
  • 507
  • Use left outer Join which will give a null when left item is null but right item has value. Normal join only give the items where the left is not null.https://code.msdn.microsoft.com/101-LINQ-Samples-3fb9811b – jdweng Jan 31 '19 at 11:01
  • Store the new data in the database, in a staging/temporary table with proper indexes. 100K rows is no data for a database even if it's a lot of data for a single list – Panagiotis Kanavos Jan 31 '19 at 11:23
  • @Panagiotis Kanavos: the objects I got from JSON is not on DB, so can't use directly SQl. Do you mean load on SQL on temp table and do comparison? – markzzz Jan 31 '19 at 11:23
  • @markzzz yes. The database can use different join strategies based on the actual data and parallelize the query while you have to pick one without knowing what's actually going on. For example, you could load the data into `HashSet`s to perform fast set operations like `Except`. That's still a sequential operation though that can't be easily parallelized. – Panagiotis Kanavos Jan 31 '19 at 11:26
  • @markzzz you can try the HashSet option anyway. It will be a lot faster than `Except` on two lists without any indexing. BTW, what do you intend to do with the data? If it's used to update the database, you could store the data into a staging table and use outer joins to insert/delete rows that appear in only one side – Panagiotis Kanavos Jan 31 '19 at 11:30
  • @PanagiotisKanavos can you make an example with HasSet? – markzzz Jan 31 '19 at 11:31
  • Sounds like you need to to an full outer join and check for left or right for null. (insert or delete) I believe moreLinq has a full outer join extension. – Magnus Jan 31 '19 at 11:49
  • Perhaps my [full outer join](https://stackoverflow.com/a/43669055/2557128) for LINQ to SQL would help? There is another answer in the same question for my implementation for EF. – NetMage Jan 31 '19 at 18:48

2 Answers2

3

There is an Except function that you can use with a custom comparer:

    class PartnerComparer : IEqualityComparer<AE_AlignedPartners>
    {
        // Partners are equal if their ObjectID's are equal.
        public bool Equals(AE_AlignedPartners x, AE_AlignedPartners y)
        {         
            //Check whether the partner's ObjectID's are equal.
            return x.ObjectID == y.ObjectID;
        }

        public int GetHashCode(AE_AlignedPartners ap) {
            return ap.ObjectId.GetHashCode();
        }
    }

   var intersect = ae_alignedPartners_news.Intersect(ae_alignedPartners_olds);
   var creates = ae_alignedPartners_news.Except(intersect, new PartnerComparer);
   var deletes = ae_alignedPartners_old.Except(intersect, new PartnerComparer);

This should give you a reasonable boost in performance.

Murray Foxcroft
  • 12,785
  • 7
  • 58
  • 86
1

You don't need an inner join, you need a full outer join on primary key. LINQ does not know a full outer join, but it is easy to extend IEnumerable with a function.

from StackOverlow: LINQ full outer join, I took the solution that uses deferred execution. This solution only works if the KeySelector uses unique keys.

    public static IEnumerable<TResult> FullOuterJoin<TA, TB, TKey, TResult>(
        this IEnumerable<TA> sequenceA,
        IEnumerable<TB> sequenceB,
        Func<TA, TKey> keyASelector, 
        Func<TB, TKey> keyBSelector,
        Func<TKey, TA, TB, TResult> resultSelector,
        IEqualityComparer<TKey> comparer)
{
    if (comparer == null) comparer = EqualityComparer<TKey>.Default;

    // create two lookup tables:
    var alookup = a.ToLookup(selectKeyA, comparer);
    var blookup = b.ToLookup(selectKeyB, comparer);

    // all used keys:
    var aKeys = alookup.Select(p => p.Key);
    var bKeys = blookup.Select(p => p.Key);
    var allUsedKeys = aKeys.bKeys.Distinct(comparer);

    // for every used key:
    // get the values from A with this key, or default if it is not a key used by A
    // and the value from B with this key, or default if it is not a key used by B
    // put the key, and the fetched values in the ResultSelector
    foreach (TKey key in allUsedKeys)
    {
        TA fetchedA = aLookup[key].FirstOrDefault();
        TB fetchedB = bLookup[key].FirstOrDefault();
        TResult result = ResultSelector(key, fetchedA, fetchedB);
        yield result;
    }

I use this function to create three types:

  • Values in A but not in B: (A, null) => must be added
  • Values in B but not in A: (null, B) => must be removed
  • Values in A and in B: (A, B) => need further inspection to see if update is needed

.

IEnumerable<AlignedPartners> olds = ...
IEnumerable<AlignedPartners> news = ...

var joinResult = olds.FullOuterJoin(news, // join old and new
    oldItem => oldItem.Id,                // from every old take the Id
    newItem => newItem.Id,                // from every new take the Id
    (key, oldItem, newItem) => new        // when they match make one new object
    {                                     // containing the following properties
         OldItem = oldItem,
         NewItem = newItem,
    });

Note: until now nothing has been enumerated!

foreach (var joinedItem in joinResult)
{
    if (joinedItem.OldItem == null)
    {
        // we won't have both items null, so we know NewItem is not null
        AddItem(joinedItem.NewItem);
    }
    else if (joinedItem.NewItem == null)
    {   // old not null, new equals null
        DeleteItem(joinedItem.OldItem);
    }
    else
    {  // both old and new not null, if desired: check if update needed
        if (!comparer.Equals(old, new))
        {   // changed
            UpdateItems(old, new)
        }
    }
}
Harald Coppoolse
  • 28,834
  • 7
  • 67
  • 116