0

Looking for an efficient way to determine all unique subsets of a list of integers.

Say I have a List of integers containing 340 integers. I want a List of all possible subsets (of 5 elements each). All supplied integers will be unique, and the result should not duplicate any element in it's subset. An example given an input of 1,2,3,4,5,6,7,8,9 I am looking for an output of

  • 1,2,3,4,5
  • 1,2,3,4,6
  • 1,2,3,4,7
  • 1,2,3,4,8
  • 1,2,3,4,9
  • 1,2,3,5,6
  • 1,2,3,5,7
  • ...

I must do this in CSharp. can this be done in LINQ?

Walter Zydhek
  • 293
  • 4
  • 10
  • 1
    You sure you want **list** - we are talking about 2,052,469,935,880 combinations, you can't store so many elements in a list. – Ivan Stoev Jul 17 '16 at 18:47
  • That would be a worse case scenario. Normally I would only be working with about 25-30 integers. And I CAN store that many combinations in a list, as I am working in 64-bit. – Walter Zydhek Jul 17 '16 at 18:50
  • See also e.g. https://stackoverflow.com/questions/5023081/find-all-subsets-of-a-list and https://stackoverflow.com/questions/29089992/how-can-i-get-all-subsets-of-a-set-that-respect-the-order (not necessarily exact duplicates, but surely contain enough information for you to make some forward progress on your own, rather than just asking someone else to do all the work for you). – Peter Duniho Jul 17 '16 at 20:59
  • @IvanStoev Well, the [combinations](http://www.calculatorsoup.com/calculators/discretemathematics/combinations.php) would be 340! / (5! (340 - 5)!) that are 36,760,655,568 of int but you'll right in saying that they won't fit (let's say 64 bits) the physical memory [limit](https://msdn.microsoft.com/en-us/library/windows/desktop/aa366778(v=vs.85).aspx) of a Windows Server 2012 Datacenter LOL –  Jul 23 '16 at 08:52
  • @PeterDuniho You could have said it's a dupicate of this nice community [wiki](http://stackoverflow.com/a/1898744) but let me just add that here the difficult part is to find an efficient (fast) impementation, meaning porting something from Linq to Array mgmt... –  Jul 23 '16 at 08:55

2 Answers2

2

I have answered a several combinatorial questions, and everywhere I use a variation of a non recursive allocation free algorithm. For this case, it looks like this:

public static class Algorithms
{
    public static IEnumerable<T[]> GetCombinations<T>(this T[] input, int N)
    {
        var result = new T[N];
        var indices = new int[N];
        for (int pos = 0, index = 0; ;)
        {
            for (; pos < N; pos++, index++)
            {
                indices[pos] = index;
                result[pos] = input[index];
            }
            yield return result;
            do
            {
                if (pos == 0) yield break;
                index = indices[--pos] + 1;
            }
            while (index > input.Length - N + pos);
        }
    }
}

As in the other implementations, the method yields one and the same internal buffer, which is useful when you need just to iterate and process the resulting set once. If you need to store the combinations, then you need to clone the returned array before storing it. Here is a sample usage like in your scenario:

var input = Enumerable.Range(1, 20);
var result = input
    .Distinct()
    .ToArray()
    .GetCombinations(5)
    .Select(c => (int[])c.Clone())
    .ToList();

UPDATE: The GetCombinations method basically emulates N nested loops like this (in pseudo code):

for (int i0 = 0; i0 <= input.Length - N; i0++)
for (int i1 = i0 + 1; i1 <= input.Length - N + 1; i1++)
for (int i2 = i1 + 1; i2 <= input.Length - N + 2; i2++)
...
for (int iN-1 = iN-2 + 1; iN-1 <= input.Length - 1; iN-1++)
yield { input[i0], input[i1], input[i2], ..., input[iN-1] }

Ivan Stoev
  • 195,425
  • 15
  • 312
  • 343
  • Can you add some explanation of what your code is actually doing? I'm finding it hard to follow, probably not least because of those rather strange for loops! :) – Chris Jul 17 '16 at 22:36
  • 1
    lol :) All they do is simulating nested loops like `for (int i0 = 0; i0 <= M - N; i0++) for (int i1 = i0 + 1; i1 <= M - N + 1; i1++) for (int i2 = i1 + 1; i2 <= M - N + 2; i2++)...` up to N - 1. – Ivan Stoev Jul 18 '16 at 05:17
0

In case of a tractable set of 9 elements (or max 25-30) and subset of 5, the code could be based on a recursive function

   static void Main(string[] args)
    {
        foreach (var item in ListPerm())
        {
            Console.WriteLine(String.Join(",", item));
        }
        Console.Read();
    }


    private static List<List<int>> ListPerm(HashSet<int> mySet = null, int deep = 5)
    {
        if (mySet == null)
        {
            mySet = initSet(8);
        }
        if (deep <= 0)
        {
            return Enumerable.Empty<List<int>>().ToList();
        }
        List<List<int>> all = new List<List<int>>();
        for (int i = 0; i < mySet.Count - deep + 1; i++)
        {

            if (deep == 1)
            {
                var list = new List<int>() { mySet.ElementAt(i) };
                all.Add(list);
            }
            foreach (var item in ListPerm(new HashSet<int>(mySet.Skip(i+1)), deep - 1))
            {
                var list = new List<int>() { mySet.ElementAt(i) };
                list.AddRange(item);
                all.Add(list);
            }
        }
        return all;
    }

    private static HashSet<int> initSet(int lenght)
    {
        HashSet<int> ret = new HashSet<int>();
        for (int i = 0; i < lenght; i++)
        {
            ret.Add(i * 1  +  1); // just an example...
        };
        return ret;
    }

Reengineering

Now, let me optimize the above code into a more performant function, that takes 3.2 seconds to get the combinations of 8 integers out of 30 on my standard laptop.

private static int[][] ListPerm(int[] mySet, int deep)
{
    var all = new List<int[]>();
    if (deep == 1)
    {
        return mySet.Select(x => new int[] { x }).ToArray(); 
    }
    else
    {
        var mySubSet = new int[mySet.Length - 1];
        Array.Copy(mySet, 1, mySubSet, 0, mySubSet.Length);
        var perm1st = ListPerm(mySubSet, deep - 1);
        for (int i = 0; i < mySet.Length - deep + 1; i++)
        {
            var permn = perm1st.Select(x =>
                {
                    var z = new int[x.Length + 1];
                    z[0] = mySet[i];
                    x.CopyTo(z, 1);
                    return z;
                }
            );
            all.AddRange(permn);
            int start = Array.FindIndex(perm1st, item => item[0] != mySet[i + 1]);
                if (start > 0)
                {
                    var temp_cpy = new int[perm1st.Length - start][];
                    Array.Copy(perm1st, start, temp_cpy, 0, temp_cpy.Length);
                    perm1st = temp_cpy;
                }
        }
    }
    return all.ToArray();
}

Benchmark

Here it is a comparison of Ivan's, my and the community wiki algorithms for the combinations of 5 ints in 20.

Results

wiki perm: 00:00:00.0950055

writing wiki perm: 00:00:00.0460026

Ivan perm: 00:00:00.0400023

writing Ivan perm: 00:00:00.0260015

my perm: 00:00:00.0110006

writing my perm: 00:00:00.0300017

Test Code

    var input = Enumerable.Range(1, 20);
    int deep = 5;
    var start = DateTime.Now;


    var wiki =  Algorithms.Combinations(input, deep).ToArray();
    Console.WriteLine("wiki perm: {0}", DateTime.Now - start);
    start = DateTime.Now;
    StreamWriter sw0 = new StreamWriter(@"C:\dev\SO\Algo\perm0.txt", false);
    foreach (var item in wiki)
    {
        sw0.WriteLine(String.Join(",", item));
    }
    sw0.Close();
    Console.WriteLine("writing wiki perm: {0}", DateTime.Now - start);
    start = DateTime.Now;

    start = DateTime.Now;
    var result = input
        .Distinct()
        .ToArray()
        .GetCombinations(deep)
        .Select(c => (int[])c.Clone())
        .ToList();
    Console.WriteLine("Ivan perm: {0}", DateTime.Now - start);
    start = DateTime.Now;
    StreamWriter sw1 = new StreamWriter(@"C:\dev\SO\Algo\perm1.txt", false);
    foreach (var item in result)
    {
        sw1.WriteLine(String.Join(",", item));
    }
    sw1.Close();
    Console.WriteLine("writing Ivan perm: {0}", DateTime.Now - start);
    start = DateTime.Now;


    var myPerm = ListPermSO(input.ToArray(), deep);
    Console.WriteLine("my perm: {0}", DateTime.Now - start);
    start = DateTime.Now;
    StreamWriter sw2 = new StreamWriter(@"C:\dev\SO\Algo\perm2.txt", false);
    foreach (var item in myPerm)
    {
        sw2.WriteLine(String.Join(",", item));
    }
    sw2.Close();
    Console.WriteLine("writing my perm: {0}", DateTime.Now - start);
    Console.Read();
Community
  • 1
  • 1
  • After the reengineering, I think this recursive function is now slightly faster than Ivan Stoev's solution. –  Jul 22 '16 at 23:30