1

I have two arrays of ArrayList.

public class ProductDetails
{
    public string id;
    public string description;
    public float rate;
}

ArrayList products1 = new ArrayList();
ArrayList products2 = new ArrayList();
ArrayList duplicateProducts = new ArrayList();

Now what I want is to get all the products (with all the fields of ProductDetails class) having duplicate description in both products1 and products2.

I can run two for/while loops as traditional way, but that would be very slow specially if I will be having over 10k elements in both arrays.

So probably something can be done with LINQ.

  • 2
    Use `List`, not `ArrayList`. – SLaks Sep 12 '16 at 19:14
  • 2
    You should probably use a database. – SLaks Sep 12 '16 at 19:15
  • 2
    I agree with @SLaks it would be much easier to do this using a SubQuery with the outter Select being `Select Distinct` and the `Inner Select doing your Group By Having Count(*) >= 2` – MethodMan Sep 12 '16 at 19:17
  • A "textbook" implementation is to sort both sequences (`O(n lg n)` if needing a comparison ordering) and then walk the sequences together. This is one of the several algorithms used by databases. – user2864740 Sep 12 '16 at 19:19
  • @user2864740: You mean `O(n lg n)`. Or make a dictionary and do it in `O(n)`. – SLaks Sep 12 '16 at 19:22
  • @SLaks Yeah, I just noticed/corrected the goof :} The intent of the comment was to stay within the realm of 'loops'. – user2864740 Sep 12 '16 at 19:22
  • I am already using SQLite. One list is in database and one list I have imported from Excel Spreadsheet. I am pretty new to programming, so just not getting it how to do it in a effective and quick way. @MethodMan can u pls guide me more with giving some example of the query. –  Sep 12 '16 at 19:22
  • I know I can use where clause in SQL query but how to do it if I am having so many multiple values for the where clause which is in the second array. –  Sep 12 '16 at 19:30

3 Answers3

1

If you want to use linQ, you need write your own EqualityComparer where you override both methods Equals and GetHashCode()

 public class ProductDetails
    { 
        public string id {get; set;}
        public string description {get; set;}
        public float rate {get; set;}
    }

public class ProductComparer : IEqualityComparer<ProductDetails>
{

    public bool Equals(ProductDetails x, ProductDetails y)
    {
        //Check whether the objects are the same object. 
        if (Object.ReferenceEquals(x, y)) return true;

        //Check whether the products' properties are equal. 
        return x != null && y != null && x.id.Equals(y.id) && x.description.Equals(y.description);
    }

    public int GetHashCode(ProductDetails obj)
    {
        //Get hash code for the description field if it is not null. 
        int hashProductDesc = obj.description == null ? 0 : obj.description.GetHashCode();

        //Get hash code for the idfield. 
        int hashProductId = obj.id.GetHashCode();

        //Calculate the hash code for the product. 
        return hashProductDesc ^ hashProductId ;
    }
}

Now, supposing you have this objects:

ProductDetails [] items1= { new ProductDetails { description= "aa", id= 9, rating=2.0f }, 
                       new ProductDetails { description= "b", id= 4, rating=2.0f} };

ProductDetails [] items= { new ProductDetails { description= "aa", id= 9, rating=1.0f }, 
                       new ProductDetails { description= "c", id= 12, rating=2.0f } };


IEnumerable<ProductDetails> duplicates =
    items1.Intersect(items2, new ProductComparer());
raven
  • 2,381
  • 2
  • 20
  • 44
  • Thanks for the so well commented code. I will surely try this code but truly speaking I am not getting about the `GetHashCode` stuff. What if I have couple of more fields in my custom class? Do I need to `^` it with same pattern? Like `hasProductDesc ^ hasProductId ^ hasProductRate ^ hasProductMake`? Also what if I only want to compare these two different objects/arraylist based on description, means other fields can be different. –  Sep 12 '16 at 19:44
  • Hi, If you have multiples properties I wouldn't do it because if you XOR many properties eventually they might converge to 0 causing lots of collisions. Hence, destroying the Equals method. Read this for further information. [GetHashCode override of object containing generic array](http://stackoverflow.com/questions/638761/gethashcode-override-of-object-containing-generic-array/639098#639098) – raven Sep 12 '16 at 19:51
  • In the code above, I assumed that the equality is defined by the id and description. If it is only by the description you can remove this `x.id.Equals(y.id) ` and this `^ hashProductId` and it should work fine. – raven Sep 12 '16 at 19:54
0

Consider overriding the System.Object.Equals method.

   public class ProductDetails
   {
     public string id;
     public string description;
     public float rate;

     public override bool Equals(object obj)
     {
       if(obj is ProductDetails == null)
          return false;

      if(ReferenceEquals(obj,this))
          return true;

       ProductDetails p = (ProductDetails)obj;
       return description == p.description;
    }
  }

Filtering would then be as simple as:

var result = products1.Where(product=>products2.Contains(product));

EDIT:

Do consider that this implementation is not optimal..

Moreover- it has been proposed in the comments to your question that you use a data base.
This way performance will be optimized - as per the database implementation
In any case- the overhead will not be yours.

However, you can optimize this code by using a Dictionary or a HashSet:
Overload the System.Object.GetHashCode method:

public override int GetHashCode()
{
  return description.GetHashCode();
}

You can now do this:

var hashSet = new HashSet<ProductDetails>(products1);
var result = products2.Where(product=>hashSet.Contains(product));

Which will boost your performance to an extent since lookup will be less costly.

Eyal Perry
  • 2,255
  • 18
  • 26
0

10k elements is nothing, however make sure you use proper collection types. ArrayList is long deprecated, use List<ProductDetails>.

Next step is implementing proper Equals and GetHashCode overrides for your class. The assumption here is that description is the key since that's what you care about from a duplication point of view:

public class ProductDetails
{
    public string id;
    public string description;
    public float rate;

    public override bool Equals(object obj)
    {
        var p = obj as ProductDetails;
        return ReferenceEquals(p, null) ? false : description == obj.description;
    }

    public override int GetHashCode() => description.GetHashCode();    
}

Now we have options. One easy and efficient way of doing this is using a hash set:

var set = new HashSet<ProductDetails>();
var products1 = new List<ProductDetails>();  // fill it
var products2 = new List<ProductDetails>();  // fill it

// shove everything in the first list in the set
foreach(var item in products1)
    set.Add(item);

// and simply test the elements in the second set
foreach(var item in products2)
    if(set.Contains(item))
    {
        // item.description was already used in products1, handle it here
    }

This gives you linear (O(n)) time-complexity, best you can get.

Blindy
  • 65,249
  • 10
  • 91
  • 131