2

I hope someone is able to help me with what is, at least to me, quite a tricky algorithm.

The Problem

I have a List (1 <= size <= 5, but size unknown until run-time) of Lists (1 <= size <= 2) that I need to combine. Here is an example of what I am looking at:-

ListOfLists = { {1}, {2,3}, {2,3}, {4}, {2,3} }

So, there are 2 stages to what I need to do:-

(1). I need to combine the inner lists in such a way that any combination has exactly ONE item from each list, that is, the possible combinations in the result set here would be:-

  • 1,2,2,4,2
  • 1,2,2,4,3
  • 1,2,3,4,2
  • 1,2,3,4,3
  • 1,3,2,4,2
  • 1,3,2,4,3
  • 1,3,3,4,2
  • 1,3,3,4,3

The Cartesian Product takes care of this, so stage 1 is done.....now, here comes the twist which I can't figure out - at least I can't figure out a LINQ way of doing it (I am still a LINQ noob).

(2). I now need to filter out any duplicate results from this Cartesian Product. A duplicate in this case constitutes any line in the result set with the same quantity of each distinct list element as another line, that is,

1,2,2,4,3 is the "same" as 1,3,2,4,2

because each distinct item within the first list occurs the same number of times in both lists (1 occurs once in each list, 2 appears twice in each list, ....

The final result set should therefore look like this...

  • 1,2,2,4,2
  • 1,2,2,4,3
  • --
  • 1,2,3,4,3
  • --
  • --
  • --
  • 1,3,3,4,3

Another example is the worst-case scenario (from a combination point of view) where the ListOfLists is {{2,3}, {2,3}, {2,3}, {2,3}, {2,3}}, i.e. a list containing inner lists of the maximum size - in this case there would obviously be 32 results in the Cartesian Product result-set, but the pruned result-set that I am trying to get at would just be:-

  • 2,2,2,2,2
  • 2,2,2,2,3 <-- all other results with four 2's and one 3 (in any order) are suppressed
  • 2,2,2,3,3 <-- all other results with three 2's and two 3's are suppressed, etc
  • 2,2,3,3,3
  • 2,3,3,3,3
  • 3,3,3,3,3

To any mathematically-minded folks out there - I hope you can help. I have actually got a working solution to part 2, but it is a total hack and is computationally-intensive, and I am looking for guidance in finding a more elegant, and efficient LINQ solution to the issue of pruning.

Thanks for reading.

pip

Some resources used so far (to get the Cartesian Product)

UPDATE - The Solution

Apologies for not posting this sooner...see below

Community
  • 1
  • 1
phil
  • 879
  • 1
  • 10
  • 20

3 Answers3

3

You should implement your own IEqualityComparer<IEnumerable<int>> and then use that in Distinct().

The choice of hash code in the IEqualityComparer depends on your actual data, but I think something like this should be adequate if your actual data resemble those in your examples:

class UnorderedQeuenceComparer : IEqualityComparer<IEnumerable<int>>
{
    public bool Equals(IEnumerable<int> x, IEnumerable<int> y)
    {
        return x.OrderBy(i => i).SequenceEqual(y.OrderBy(i => i));
    }

    public int GetHashCode(IEnumerable<int> obj)
    {
        return obj.Sum(i => i * i);
    }
}

The important part is that GetHashCode() should be O(N), sorting would be too slow.

svick
  • 236,525
  • 50
  • 385
  • 514
  • What do you mean? My code doesn't produce the right results? If you mean that it doesn't work with the `OrderBy()`s in `Equals()`, then yeah, that's why they're there. – svick Aug 04 '11 at 21:58
  • It produces the right results, it's ok. I actually didn't read the code paying attention, I just read your comment saying "sorting would be too slow" so I thought you deliberately removed it. My mistake, it's a bit late here. Btw I was thinking now on another solution instead of sorting the arrays, but I suppose it's not LINQ :) – Adrian Iftode Aug 04 '11 at 22:04
  • Thanks for the replies folks - as is the way in software development, I haven't been able to get back to working on my own code today as I have been bug-fixing on another piece of work with a higher business priority, so I am going to use the solution you outline over the weekend. – phil Aug 05 '11 at 17:51
  • @Adrian, sometimes, the best answer to “How to do X in LINQ?” is “Don't use LINQ.”. – svick Aug 05 '11 at 19:06
1

The finalised solution to the whole combining of multisets, then pruning the result-sets to remove duplicates problem ended up in a helper class as a static method. It takes svick's much appreciated answer and injects the IEqualityComparer dependency into the existing CartesianProduct answer I found at Eric Lipperts's blog here (I'd recommend reading his post as it explains the iterations in his thinking and why the linq implimentation is the best).

static IEnumerable<IEnumerable<T>> CartesianProduct<T>(IEnumerable<IEnumerable<T>> sequences,
                                                       IEqualityComparer<IEnumerable<T>> sequenceComparer)
{
    IEnumerable<IEnumerable<T>> emptyProduct = new[] { Enumerable.Empty<T>() };
    var resultsSet = sequences.Aggregate(emptyProduct, (accumulator, sequence) => from accseq in accumulator
                                                                                  from item in sequence
                                                                                  select accseq.Concat(new[] { item }));

    if (sequenceComparer != null)
        return resultsSet.Distinct(sequenceComparer);
    else
        return resultsSet;
}
phil
  • 879
  • 1
  • 10
  • 20
1
void Main()
{
    var query =     from a in new int[] { 1 }
                    from b in new int[] { 2, 3 }
                    from c in new int[] { 2, 3 }
                    from d in new int[] { 4 }                   
                    from e in new int[] { 2, 3 }
                    select new int[] { a, b, c, d, e }; 
    query.Distinct(new ArrayComparer());
        //.Dump();
}
 public class ArrayComparer : IEqualityComparer<int[]>
    {
        public bool Equals(int[] x, int[] y)
        {            
            if (x == null || y == null)
                return false;

            return x.OrderBy(i => i).SequenceEqual<int>(y.OrderBy(i => i));

        }

        public int GetHashCode(int[] obj)
        {
            if ( obj == null || obj.Length == 0)
                return 0;
            var hashcode = obj[0];
            for (int i = 1; i < obj.Length; i++)
            {
                hashcode ^= obj[i];
            }
            return hashcode;
        }
    }
Adrian Iftode
  • 15,465
  • 4
  • 48
  • 73
  • I was just digging up a similar equality comparer. It's quite similar, but I could not help pointing out that the actual comparison can be done by Linq's `SequenceEqual()` in stead of a `for` loop. – Gert Arnold Aug 04 '11 at 21:22
  • 1
    I don't think XOR is a good choice of hash code here. Of course you can never avoid collisions, but they shouldn't be as easy as `1,1`, `2,2`. Those two sequences shouldn't have the same hash code, because patterns like this could easily be in the input. – svick Aug 04 '11 at 21:31
  • Thanks for the replies folks - as is the way in software development, I haven't been able to get back to working on my own code today as I have been bug-fixing on another piece of work with a higher business priority, so I am going to use the solution you outline over the weekend, with a slightly more robust hash code as svick suggests. – phil Aug 05 '11 at 17:50
  • Implemented, tested, perfick! – phil Aug 05 '11 at 22:10
  • Thanks again guys...it took just a few minutes to put together the previously solved Cartesian Product solution and this pruning one...especially doing it in LinqPad (to all the uninitiated, who are looking to learn more about LINQ, you should definitely check out this great (free) tool at http://www.linqpad.net/) – phil Aug 05 '11 at 22:16