6

so basically I have two large lists like following:

public class Items
{
 public string ItemID { get; set; }
}


var oldList = new List<Items>(); // oldList

var newList = new List<Items>(); // new list

Both lists are very large, and a simple double foreach wouldn't be sufficient due to poor execution time if they are both large (more than 30 seconds).

In previous question that I've asked on stackoverflow I got a reply on how to compare these two same lists and find out which items have different QuantitySold parameter, and then store it in a third list called "DifferentQuantityItems" like following:

var differentQuantityItems =
    (from newItem in newList
     join oldItem in oldList on newItem.ItemID equals oldItem.ItemID
     where newItem.QuantitySold != oldItem.QuantitySold
     select newItem).ToList();

Now what I would like to get from these two lists is following:

- A list of items that are present in newList, but not in oldList

- A list of items that are present in oldList, but not in newList

How can I achieve this ? Can someone help me out?

P.S. The way I would "know" that either item is missing from one of the lists is by property "ItemID"...

codenheim
  • 20,467
  • 1
  • 59
  • 80
User987
  • 3,663
  • 15
  • 54
  • 115
  • Anyone guys ? =) – User987 Mar 13 '19 at 13:28
  • you can try `intersection` and `union` of `LINQ` https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable.intersect?view=netframework-4.7.2 – Aarif Mar 13 '19 at 13:29
  • you can use `Equals` method with custom comparer class – Aarif Mar 13 '19 at 13:30
  • using `LINQ` won't promise any performance increase, take a look here https://stackoverflow.com/questions/11124797/c-sharp-how-to-handle-large-lists-of-data-quickly – Aarif Mar 13 '19 at 13:35
  • @Aarif ouh when I saw parallel loops mentioned in one of the replies I immediately ran away... I have bad experiences with parallel loops hahah =D – User987 Mar 13 '19 at 13:36
  • to get a clean looking code I'd suggest using `LINQ` (as mentioned earlier), you are facing problem with these lists taking too much time (not very clear from your question), you can spin up some parallel tasks to get it done faster or better write a cron job to do that. – Aarif Mar 13 '19 at 13:40
  • You didn't say whether the lists were sorted. That is important to know before choosing an approach. – codenheim Mar 13 '19 at 15:34

5 Answers5

4

Edited

Except will work much faster. Here you can read aboud it's perfomance

var missedOld = oldList.Except(newList, new ItemsEqualityComparer());
var oldList= oldList.Except(missedOld, new ItemsEqualityComparer());

Old answer

Two different lists with missing items

var missedOld = oldList.Where(x => !newList.Select(i => i.ItemID).Contains(x.ItemID)) 
var missedNew = newList.Where(x => !oldList.Select(i => i.ItemID).Contains(x.ItemID))

All missed items in one list:

oldList.Concat(newList).GroupBy(x => x.ItemID).Where(x => x.Count() < 2).Select(x => x.Value).ToList()
Akmal Salikhov
  • 818
  • 3
  • 12
  • 25
  • Shouldn't you create a new variable and store the new values in it ? – User987 Mar 13 '19 at 13:38
  • P.S. I'd like to have the missing items from both lists separated in two different lists... =d – User987 Mar 13 '19 at 13:39
  • @User987 Yes, of course you shoul put the value of this expression into new variable and use it than – Akmal Salikhov Mar 13 '19 at 13:40
  • does this gets items missing from old or new list ? :D – User987 Mar 13 '19 at 13:41
  • @User987 ok, then just use !Contains. I'll edit my answer – Akmal Salikhov Mar 13 '19 at 13:42
  • Akmal, looks good ,but how does the "Contains" works exactly? Consider that I have more than 1 property inside this class... How will it know based on what it will compare the item ? =/ (It should compare them based on the ItemID – User987 Mar 13 '19 at 13:46
  • 1
    @User987 You can compare by properties like that too: ```var missedOld = oldList.Where(x => !newList.Select(l => l.ItemID).Contains(x.ItemID))``` – Akmal Salikhov Mar 13 '19 at 13:50
  • Yes thats it :) ... Can you post that reply so that the others can see as well ? – User987 Mar 13 '19 at 13:54
  • I don't believe List.Contains() is any better than the nested loops. Its O(N) - so nested loops will be O(n^2) - the ICollection implementation of List isn't O(1) or O(log N) which is what you want in this case, else its the same solution he is trying to avoid. – codenheim Mar 13 '19 at 15:43
  • @codenheim You absolutely right! Thank you for your thoughtfulness. I will edit my answer. – Akmal Salikhov Mar 14 '19 at 10:33
2

Have you considered converting your lists to hashsets and using the Except method ?

See Difference between two lists

And:Is there a way to get the difference between two sets of objects in c#

Pierre
  • 163
  • 1
  • 9
  • Looks good, could you show me an example if possible please? :) – User987 Mar 13 '19 at 13:35
  • 1
    Agree with trying Except(), because it should be an O(1) or O(log N) complexity due to internal use of HashSet (although I am not 100%, but in testing it seems so) – codenheim Mar 13 '19 at 15:44
1
var items = new List<int>(oldList.Select(x => x.ItemID ));
var missingValues = newList.Where(x => !diffids.Contains(x.ItemID)).ToList();

You can also use except.

Kadir
  • 3,094
  • 4
  • 37
  • 57
  • Hm this one is a bit confusing to me... Can I extract the items in form of class ? Not just the integer value.. :D – User987 Mar 13 '19 at 13:33
0

If the lists are large enough that nested loops take 30 seconds, I recommend you put each list's items into a respective HashSet and use that to find the exceptions. Hash tables will scale at O(1) or O(log N) whereas comparing 2 unsorted lists is O(n^2).

That said, try using Linq Except()

var notinNewList = oldList.Except(newList);

If I am not mistaken, the internal implementation of .Except() relies on HashSets

Secondly, if the lists are sorted, or can be pre-sorted, then you can do a linear pass in 1 go without a nested loop probably faster than any approach.

I don't recommend using List.Contains() because it is a linear implementation, which will result in the same O(n^2) that you are trying to avoid, albeit it just looks prettier due to the Linq syntactical sugar.

codenheim
  • 20,467
  • 1
  • 59
  • 80
-1
var items = newList.Where(n => !oldlist.Any(o => o.ItemID == n.ItemID)).ToList();

Its more agile since you do not need to go to DB again and do not use the Contains which is like a like in SQL and it is also in a line..

Brian Tompsett - 汤莱恩
  • 5,753
  • 72
  • 57
  • 129
J4ime
  • 145
  • 6