
I have a list of AE_AlignedPartners items in the db, which I retrieve with:

List<AE_AlignedPartners> ae_alignedPartners_olds = ctx.AE_AlignedPartners.AsNoTracking().ToList();

Then I get and deserialize a new list (of the same object type) from JSON:

List<AE_AlignedPartners> ae_alignedPartners_news = GetJSONPartnersList();

Then I get the intersection of both:

var IDSIntersections = (from itemNew in ae_alignedPartners_news
                        join itemOld in ae_alignedPartners_olds on itemNew.ObjectID equals itemOld.ObjectID
                        select itemNew).Select(p => p.ObjectID).ToList();

Now, for these intersections, I need to check whether any records have changed by comparing many fields, and then add them to a "monitoring" list of updates. Here's the code:

IList<AE_AlignedPartners> ae_alignedPartners_toUpdate = new List<AE_AlignedPartners>();
foreach (var item in IDSIntersections)
{
    var itemOld = ae_alignedPartners_olds.First(p => p.ObjectID == item);
    var itemNew = ae_alignedPartners_news.First(p => p.ObjectID == item);

    if (itemOld.Field1 != itemNew.Field1 ||
        itemOld.Field2 != itemNew.Field2 ||
        itemOld.Field3 != itemNew.Field3 ||
        itemOld.Field4 != itemNew.Field4 ||
        itemOld.Field5 != itemNew.Field5 ||
        itemOld.Field6 != itemNew.Field6 ||
        itemOld.Field7 != itemNew.Field7 ||
        itemOld.Field8 != itemNew.Field8 ||
        itemOld.Field9 != itemNew.Field9)
    {
        AE_AlignedPartners toUpdate = mapper.Map<AE_AlignedPartners, AE_AlignedPartners>(itemNew);
        toUpdate.ID = itemOld.ID;

        ae_alignedPartners_toUpdate.Add(toUpdate);
    }
}

This is extremely slow (~4 minutes in Release mode with ~70k records).

Recently I discovered IEqualityComparer here, which really speeds up the processing of comparisons.

Can I take advantage of it in this case? Or what would be a valid optimization?

I wouldn't use the suggested FullOuterJoin, since this will mean lots of refactoring right now (I'll do it in the next project, promise).

Any tips? Thanks

markzzz
  • The assumption that it's the equality comparisons specifically that cause the perf problem seems unwarranted. Have you profiled it to see what's actually taking the most time? It could be anything from the actual DB I/O to excessive memory allocations to the repeated `.First()` calls doing linear searches over large collections -- the comparisons would be way down on my list of likely candidates. – Jeroen Mostert Feb 01 '19 at 08:44
  • You've implemented *nested loops* `foreach (...) {..First(...)}`, so you have `O(N ** 2)` time complexity: `70k * 70k ~ 5e9` loops, which is slow. Let's turn `ae_alignedPartners_olds` and `ae_alignedPartners_news` into *dictionaries* with `ObjectID` as the key – Dmitry Bychenko Feb 01 '19 at 08:46
  • @DmitryBychenko: can you show an example? – markzzz Feb 01 '19 at 08:49
  • `ObjectID` is an `int`? – Tim Schmelter Feb 01 '19 at 08:51
  • @Rango: no, it's a string – markzzz Feb 01 '19 at 09:01

2 Answers


You have a nested-loops implementation:

// O(N) : Loop over IDSIntersections
foreach (var item in IDSIntersections)
{
    // O(N) : Again, loop over ae_alignedPartners_olds
    var itemOld = ae_alignedPartners_olds.First(p => p.ObjectID == item);
    var itemNew = ae_alignedPartners_news.First(p => p.ObjectID == item);
    ...

In the worst case you are going to have O(N) * O(N) = O(N**2) time complexity; billions of loop iterations: 70k * 70k ~ 5e9. Let's get rid of the inner loops with the help of dictionaries:

// O(N)
var dictOld = ae_alignedPartners_olds
  .GroupBy(p => p.ObjectID) // ObjectID should be an int, a string, or a type with a good GetHashCode()
  .ToDictionary(chunk => chunk.Key,
                chunk => chunk.First());

// O(N)
var dictNew = ae_alignedPartners_news
  .GroupBy(p => p.ObjectID) 
  .ToDictionary(chunk => chunk.Key,
                chunk => chunk.First());

// O(N)
foreach (var item in IDSIntersections)
{
   // O(1) : no loops when finding value by key in dictionary
   var itemOld = dictOld[item];      
   var itemNew = dictNew[item];
   ... 

Now we have roughly 3 * O(N) work in total: 3 * 70k ~ 2e5 operations.
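
For completeness, a minimal sketch of the full rewritten loop under these assumptions, reusing the Field1..Field9 properties, the mapper instance and the variable names from the question (Field3 through Field8 are elided the same way the question compares them):

// Build the lookups once: O(N) each, then O(1) per lookup.
var dictOld = ae_alignedPartners_olds
    .GroupBy(p => p.ObjectID)
    .ToDictionary(g => g.Key, g => g.First());

var dictNew = ae_alignedPartners_news
    .GroupBy(p => p.ObjectID)
    .ToDictionary(g => g.Key, g => g.First());

IList<AE_AlignedPartners> ae_alignedPartners_toUpdate = new List<AE_AlignedPartners>();

foreach (var id in IDSIntersections)
{
    var itemOld = dictOld[id]; // O(1) hash lookup instead of a linear First()
    var itemNew = dictNew[id];

    if (itemOld.Field1 != itemNew.Field1 ||
        itemOld.Field2 != itemNew.Field2 ||
        // ... Field3 through Field8 exactly as in the question ...
        itemOld.Field9 != itemNew.Field9)
    {
        var toUpdate = mapper.Map<AE_AlignedPartners, AE_AlignedPartners>(itemNew);
        toUpdate.ID = itemOld.ID;
        ae_alignedPartners_toUpdate.Add(toUpdate);
    }
}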

Dmitry Bychenko

A custom IEqualityComparer<AE_AlignedPartners> would be nice, but not because it improves performance; it has to do the same comparisons. Rather, it's nice because encapsulating the logic there makes the code more maintainable, readable and reusable, and you can pass the comparer to many LINQ methods.
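
For example, here is a sketch of such a comparer (the class name AE_AlignedPartnersComparer is my own, and the field names are taken from the question; I also include ObjectID in Equals so that hashing on ObjectID keeps the Equals/GetHashCode contract intact -- adjust to your real semantics):

public class AE_AlignedPartnersComparer : IEqualityComparer<AE_AlignedPartners>
{
    public bool Equals(AE_AlignedPartners x, AE_AlignedPartners y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;

        return x.ObjectID == y.ObjectID &&
               x.Field1 == y.Field1 &&
               x.Field2 == y.Field2 &&
               x.Field3 == y.Field3 &&
               x.Field4 == y.Field4 &&
               x.Field5 == y.Field5 &&
               x.Field6 == y.Field6 &&
               x.Field7 == y.Field7 &&
               x.Field8 == y.Field8 &&
               x.Field9 == y.Field9;
    }

    public int GetHashCode(AE_AlignedPartners obj)
    {
        // ObjectID is a string (see the comments on the question), so this is a cheap,
        // valid hash: items that are Equals always share an ObjectID and thus a hash code.
        return obj?.ObjectID?.GetHashCode() ?? 0;
    }
}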

What is slow is that you look up the old and the new item via ObjectID inside the foreach loop on every iteration.

You don't need to select only the common ObjectIDs: since you have already joined the old and new items, just store both whole instances in an anonymous type:

var intersections = from itemNew in ae_alignedPartners_news
                    join itemOld in ae_alignedPartners_olds on itemNew.ObjectID equals itemOld.ObjectID
                    select new { New = itemNew, Old = itemOld };

foreach(var x in intersections)
{
    var itemOld = x.Old;
    var itemNew = x.New;
    // ... 
}
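
With a comparer like the sketch above, the loop body then shrinks to a single call (again a sketch, reusing the mapper and the ae_alignedPartners_toUpdate list from the question):

var comparer = new AE_AlignedPartnersComparer();

foreach (var x in intersections)
{
    if (!comparer.Equals(x.Old, x.New))
    {
        var toUpdate = mapper.Map<AE_AlignedPartners, AE_AlignedPartners>(x.New);
        toUpdate.ID = x.Old.ID;
        ae_alignedPartners_toUpdate.Add(toUpdate);
    }
}
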
Tim Schmelter