
I'm looking to group a list by a list that each of its elements contains, given the following data structure:

public class AccDocumentItem
{
   public string AccountId {get;set;}
   public List<AccDocumentItemDetail> DocumentItemDetails {get;set;}
}

And

public class AccDocumentItemDetail
{
   public int LevelId {get;set;}
   public int DetailAccountId {get;set;}
}

I now have a List<AccDocumentItem> of 15 items, each with a variable number of AccDocumentItemDetails. Some AccDocumentItems may have identical AccDocumentItemDetails, so I need to group my List<AccDocumentItem> by its AccDocumentItemDetail list.

To make it clearer, suppose the first 3 (of the 15) elements within my List<AccDocumentItem> list are:

1:{
   AccountId: "7102",  
   DocumentItemDetails:[{4,40001},{5,40003}]
  }
2:{
   AccountId: "7102",
   DocumentItemDetails:[{4,40001},{6,83003},{7,23423}]
  }
3:{
   AccountId: "7102",
   DocumentItemDetails:[{4,40001},{5,40003}]
  }

How can I group my List<AccDocumentItem> by its DocumentItemDetails list so that rows 1 and 3 end up in one group and row 2 in another?

Thanks.

Mohammad Sepahvand

1 Answer


You could group by the comma-separated string of detail IDs:

var query = documentItemList
 .GroupBy(aci => new{ 
     aci.AccountId, 
     detailIDs = string.Join(",", aci.DocumentItemDetails
                                     .OrderBy(did => did.DetailAccountId)
                                     .Select(did => did.DetailAccountId))
 });
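
The key above is built from DetailAccountId alone, so two items whose details share IDs but differ in LevelId would fall into the same group. If that distinction matters, one possible variation is to fold both values into the key. The following is only a minimal sketch using the three sample rows from the question; the `Demo` class, the `DetailKey`, `Detail` and `Item` helpers, and the `level:id` key format are hypothetical choices made for illustration, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class AccDocumentItemDetail
{
    public int LevelId { get; set; }
    public int DetailAccountId { get; set; }
}

public class AccDocumentItem
{
    public string AccountId { get; set; }
    public List<AccDocumentItemDetail> DocumentItemDetails { get; set; }
}

public static class Demo
{
    // Hypothetical helper: builds an order-independent string key from
    // every (LevelId, DetailAccountId) pair of an item.
    public static string DetailKey(AccDocumentItem item)
    {
        return string.Join(",", item.DocumentItemDetails
            .OrderBy(d => d.LevelId)
            .ThenBy(d => d.DetailAccountId)
            .Select(d => d.LevelId + ":" + d.DetailAccountId));
    }

    private static AccDocumentItemDetail Detail(int levelId, int detailAccountId)
    {
        return new AccDocumentItemDetail { LevelId = levelId, DetailAccountId = detailAccountId };
    }

    private static AccDocumentItem Item(string accountId, params AccDocumentItemDetail[] details)
    {
        return new AccDocumentItem { AccountId = accountId, DocumentItemDetails = details.ToList() };
    }

    // The first three rows from the question.
    public static List<AccDocumentItem> SampleItems()
    {
        return new List<AccDocumentItem>
        {
            Item("7102", Detail(4, 40001), Detail(5, 40003)),
            Item("7102", Detail(4, 40001), Detail(6, 83003), Detail(7, 23423)),
            Item("7102", Detail(4, 40001), Detail(5, 40003)),
        };
    }

    public static void Main()
    {
        // Anonymous types compare by value, so equal keys form one group.
        var groups = SampleItems()
            .GroupBy(aci => new { aci.AccountId, Details = DetailKey(aci) });

        foreach (var g in groups)
            Console.WriteLine(g.Key.Details + " -> " + g.Count() + " item(s)");
        // 4:40001,5:40003 -> 2 item(s)
        // 4:40001,6:83003,7:23423 -> 1 item(s)
    }
}
```

Rows 1 and 3 share a key and so end up together, while row 2 forms its own group. Sorting before joining keeps the key independent of the order in which the details were added.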

Another, more elegant, efficient and maintainable approach is to create a custom IEqualityComparer<AccDocumentItem>:

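using System.Collections.Generic;
using System.Linq;

public class AccDocumentItemComparer : IEqualityComparer<AccDocumentItem>
{
    public bool Equals(AccDocumentItem x, AccDocumentItem y)
    {
        // The same instance (or two nulls) counts as equal.
        if (object.ReferenceEquals(x, y))
            return true;
        if (x == null || y == null)
            return false;
        if (x.AccountId != y.AccountId)
            return false;
        // Compare the detail IDs as sorted sequences so that order does not matter.
        return SortedDetailIds(x).SequenceEqual(SortedDetailIds(y));
    }

    public int GetHashCode(AccDocumentItem obj)
    {
        if (obj == null) return int.MinValue;
        unchecked // let the multiplication wrap around instead of overflowing
        {
            int hash = obj.AccountId == null ? 0 : obj.AccountId.GetHashCode();
            // Hash the IDs in the same sorted order that Equals uses; otherwise
            // two equal items could produce different hash codes.
            foreach (int detID in SortedDetailIds(obj))
                hash = hash * 23 + detID;
            return hash;
        }
    }

    // A null detail list is treated like an empty one.
    private static IEnumerable<int> SortedDetailIds(AccDocumentItem item)
    {
        return (item.DocumentItemDetails ?? Enumerable.Empty<AccDocumentItemDetail>())
            .Select(d => d.DetailAccountId)
            .OrderBy(id => id);
    }
}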

Now you can use it for GroupBy:

var query = documentItemList.GroupBy(aci => aci, new AccDocumentItemComparer());

You can also use it with many other LINQ extension methods, such as Enumerable.Join.

Tim Schmelter
  • maybe you need to sort `DetailAccountId`s? I think, order here doesn't matter at all. – Ilya Ivanov Sep 30 '13 at 08:00
  • @TimSchmelter how would the `GetHashCode()` differ if my `DetailAccountId` was a string? – Mohammad Sepahvand Sep 30 '13 at 08:20
  • @TimSchmelter My DetailAccountId is a string, so I changed the last line of your GetHashCode method to: `hash + obj.AccDocumentItemDetails.Sum(d => d.DetailAccountId.GetHashCode());` Is that correct? Thanks a million btw. – Mohammad Sepahvand Sep 30 '13 at 08:25
  • @MohammadSepahvand: I would suggest changing both your ID properties to type `int` instead. However, if you insist on strings, you could replace the `.Sum(d => 17 + d.DetailAccountId)` with `.Sum(d => 17 + d.DetailAccountId.GetHashCode())` (I've added `+ 17` to avoid unnecessary hash code collisions). – Tim Schmelter Sep 30 '13 at 08:26
  • @TimSchmelter, where did 17 come from? – Mohammad Sepahvand Sep 30 '13 at 08:27
  • @MohammadSepahvand: That's just an arbitrary prime to avoid collisions in GetHashCode. See http://stackoverflow.com/a/263416/284240 I've edited my answer again to provide a better `Equals` and a better hashcode. – Tim Schmelter Sep 30 '13 at 08:44
  • @EliArbel: `XOR` is not the best way to calculate a hash code based on multiple items. Have a look: http://stackoverflow.com/a/263416/284240 – Tim Schmelter Sep 30 '13 at 08:50
  • @TimSchmelter just one more question: my `DetailAccountId`s are strings and I have no choice but to stick with them. When I changed your `GetHashCode()` code to `.Sum(d => detailHash * 17 + d.DetailAccountId.GetHashCode());` I kept getting an arithmetic overflow exception. – Mohammad Sepahvand Sep 30 '13 at 08:51
  • @MohammadSepahvand: Forget the prime in `GetHashCode` if you don't want to take the order into account. Or order the details by ID before you start calculating the hash code: `foreach (var detID in obj.DocumentItemDetails.Select(d => d.DetailAccountId).OrderBy(i => i)) detailHash = 17 * detailHash + detID;`. Regarding your overflow exception, have you noticed my `unchecked`? – Tim Schmelter Sep 30 '13 at 08:57
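
The string-ID case discussed in these comments can be sketched roughly as follows. This is only an illustration, assuming `DetailAccountId` is changed to `string`; the `StringIdComparer` and `Program` names and the choice of ordinal ordering are hypothetical. The hash is accumulated in an explicit loop inside `unchecked` because `Enumerable.Sum` over `int` values throws an `OverflowException` when the total leaves the `int` range, and an `unchecked` block around the call does not change that.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumption: the question's classes, but with a string DetailAccountId.
public class AccDocumentItemDetail
{
    public int LevelId { get; set; }
    public string DetailAccountId { get; set; }
}

public class AccDocumentItem
{
    public string AccountId { get; set; }
    public List<AccDocumentItemDetail> DocumentItemDetails { get; set; }
}

// Hypothetical comparer for the string-ID variant.
public class StringIdComparer : IEqualityComparer<AccDocumentItem>
{
    // Sorted, so detail order influences neither equality nor the hash.
    private static IEnumerable<string> SortedIds(AccDocumentItem item)
    {
        return (item.DocumentItemDetails ?? new List<AccDocumentItemDetail>())
            .Select(d => d.DetailAccountId)
            .OrderBy(id => id, StringComparer.Ordinal);
    }

    public bool Equals(AccDocumentItem x, AccDocumentItem y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.AccountId == y.AccountId
            && SortedIds(x).SequenceEqual(SortedIds(y));
    }

    public int GetHashCode(AccDocumentItem obj)
    {
        if (obj == null) return 0;
        unchecked // let the multiplication wrap instead of throwing
        {
            int hash = obj.AccountId == null ? 0 : obj.AccountId.GetHashCode();
            foreach (string id in SortedIds(obj))
                hash = hash * 17 + (id == null ? 0 : id.GetHashCode());
            return hash;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var a = new AccDocumentItem
        {
            AccountId = "7102",
            DocumentItemDetails = new List<AccDocumentItemDetail>
            {
                new AccDocumentItemDetail { LevelId = 4, DetailAccountId = "40001" },
                new AccDocumentItemDetail { LevelId = 5, DetailAccountId = "40003" },
            }
        };
        // Same details as a, listed in reverse order.
        var b = new AccDocumentItem
        {
            AccountId = "7102",
            DocumentItemDetails = Enumerable.Reverse(a.DocumentItemDetails).ToList()
        };

        var comparer = new StringIdComparer();
        Console.WriteLine(comparer.Equals(a, b));                               // True
        Console.WriteLine(comparer.GetHashCode(a) == comparer.GetHashCode(b));  // True
    }
}
```

Sorting the IDs in both `Equals` and `GetHashCode` keeps the two methods consistent: items that compare equal always produce the same hash code, whatever order their details are stored in.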