3

As part of this question, it was pointed out repeatedly that I had an O(n^2) problem using code similar to this...

public class Foo
{
  public string IdentityValue {get;set;}  
  public string Prop1 {get;set;}
  public string Prop2 {get;set;}

}

List<Foo> itemSet1 = GenerateLargeItemSet(); //makes a large list, > 5000 items for example
List<Foo> itemSet2 = GenerateLargeItemSet();

foreach (var itemFromSet1 in itemSet1)
{

  //does a corresponding item exist in itemSet2?
  var itemSet2Item = itemSet2.FirstOrDefault(i => i.IdentityValue == itemFromSet1.IdentityValue);

  if (itemSet2Item != null)
  { 
    //do stuff to create item in the persistent store
  }
  else
  {
    //do stuff to update item in the persistent store
  }
}

Excusing the string comparison and parallelization considerations, is there a cheap and generic (objects may be type T, and the Identity property may be something else) way to reduce the O(n^2) nature of this?

Community
  • 1
  • 1
StingyJack
  • 19,041
  • 10
  • 63
  • 122

4 Answers4

6

One of solutions is to use Enumerable.Join method which have complextiy O(n)

List<Foo> itemSet1 = GenerateLargeItemSet(); //makes a large list, > 5000 items for example
List<Foo> itemSet2 = GenerateLargeItemSet();

// O(n)
var joinedSet = itemSet1.Join(itemSet2, s1 => s1.IdentityValue, s2 => s2.IdentityValue, (f1, f2) => f1).ToList();

// O(n)
foreach (var joinedItem in joinedSet)
{
    //do stuff to create item in the persistent store
}

// O(n)
var unjoinedSet = itemSet1.Except(joinedSet);

// O(n)
foreach (var unjoinedItem in unjoinedSet)
{
    //do stuff to update item in the persistent store
}
Community
  • 1
  • 1
Vadim Martynov
  • 8,602
  • 5
  • 31
  • 43
1

A well known way of improving the database queries speed is creating an index. The same principle can be applied here. But what is an index? It's just a data structure that allows fast searching. In BCL such structure is called Dictionary. So you can use something like this which will have O(N) time complexity

If the value is unique inside the set

var item2Index = itemSet2.ToDictionary(item => item.IdentityValue);

If not

var item2Index = itemSet2.GroupBy(e => e.IdentityValue)
    .ToDictionary(g => g.Key, g => g.First());

and then

foreach (var itemFromSet1 in itemSet1)
{
    //does a corresponding item exist in itemSet2?
    Foo itemSet2Item;
    if (!item2Index.TryGetValue(itemFromSet1.IdentityValue, out itemSet2Item))
    { 
        //do stuff to create item in the persistent store
    }
    else
    {
        //do stuff to update item in the persistent store
    }
}

If you want just to check for a duplicate item in the second set, but don't actually need the duplicate item, then you can use a simple HashSet (another BCL data structure for fast lookup)

var item2Keys = new HashSet<string>(itemSet2.Select(e => e.IdentityValue));
foreach (var itemFromSet1 in itemSet1)
{
    //does a corresponding item exist in itemSet2?
    if (!item2Keys.Contains(itemFromSet1.IdentityValue))
    { 
        //do stuff to create item in the persistent store
    }
    else
    {
        //do stuff to update item in the persistent store
    }
}
Ivan Stoev
  • 195,425
  • 15
  • 312
  • 343
1

You can create Dictionary<TKey, TValue> first and then use its fast lookups it provides almost O(1) :

 List<Foo> itemSet1 = GenerateLargeItemSet(); //makes a large list, > 5000 items for example
Dictionary<string, Foo> itemSet2 = GenerateLargeItemSet().ToDictionary(i => i.IdentityValue);

//O(N)
foreach (var itemFromSet1 in itemSet1)
{
  //O(1)   
  if (!itemSet2.ContainsKey(itemFromSet1.IdentityValue))
  { 
     //do stuff to create item in the persistent store
  }
  else
  {
    //do stuff to update item in the persistent store
  }
}
Fabjan
  • 13,506
  • 4
  • 25
  • 52
0

You may find a performance improvement with the use of a HashSet.

        List<Foo> itemSet1 = GenerateLargeItemSet(); //makes a large list, > 5000 items for example
        HashSet<Foo> itemSet2 = new HashSet<Foo>(GenerateLargeItemSet());

        foreach (var itemFromSet1 in itemSet1)
        {
            //does a corresponding item exist in itemSet2?
            if (itemSet2.Contains(itemFromSet1))
            {
                //do stuff to update item in the persistent store
            }

            //do stuff to create item in the persistent store
        }
aydjay
  • 858
  • 11
  • 25
  • I am not comparing the objects. I am comparing a property value on the objects to determine if they are equal. – StingyJack Dec 23 '15 at 14:16
  • @StingyJack That just means you need to provide a comparer to the set, if the object hasn't defined equality based on that property. – Servy Dec 28 '15 at 15:15
  • @StingyJack Alternatively, you can just store the result of the selector in the set and perform the selector on the object before searching the set. Either way, trivial adaptation. – Servy Dec 28 '15 at 15:39