
Consider this List<string>:

List<string> data = new List<string>();
data.Add("Text1");
data.Add("Text2");
data.Add("Text3");
data.Add("Text4");

The problem I had was: how can I get every combination of a subset of the list? Kinda like this:

#Subset Dimension 4
Text1;Text2;Text3;Text4

#Subset Dimension 3
Text1;Text2;Text3;
Text1;Text2;Text4;
Text1;Text3;Text4;
Text2;Text3;Text4;

#Subset Dimension 2
Text1;Text2;
Text1;Text3;
Text1;Text4;
Text2;Text3;
Text2;Text4;
Text3;Text4;

#Subset Dimension 1
Text1;
Text2;
Text3;
Text4;

I came up with a decent solution which I think is worth sharing here.

Abaco

4 Answers

Similar logic as Abaco's answer, different implementation....

foreach (var ss in data.SubSets_LB())
{
    Console.WriteLine(String.Join("; ",ss));
}

public static class SO_EXTENSIONS
{
    public static IEnumerable<IEnumerable<T>> SubSets_LB<T>(
      this IEnumerable<T> enumerable)
    {
        List<T> list = enumerable.ToList();
        ulong upper = (ulong)1 << list.Count;

        for (ulong i = 0; i < upper; i++)
        {
            List<T> l = new List<T>(list.Count);
            for (int j = 0; j < sizeof(ulong) * 8; j++)
            {
                if (((ulong)1 << j) >= upper) break;

                if (((i >> j) & 1) == 1)
                {
                    l.Add(list[j]);
                }
            }

            yield return l;
        }
    }
}
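
To get the "Subset Dimension" layout from the question out of a bit-mask enumeration like the one above, one option is to group the subsets by their size. This is only a sketch: `SubsetDemo`, `AllSubsets` and `PrintBySize` are hypothetical names that do not appear in any of the answers, and it assumes fewer than 64 items.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SubsetDemo
{
    // Compact bit-mask enumeration so the sketch is self-contained.
    // Assumption: items.Count < 64, so that 1UL << items.Count does not wrap.
    public static IEnumerable<List<T>> AllSubsets<T>(IList<T> items)
    {
        ulong upper = 1UL << items.Count;
        for (ulong mask = 0; mask < upper; mask++)
        {
            var subset = new List<T>();
            for (int j = 0; j < items.Count; j++)
            {
                // Bit j of the mask decides whether items[j] is included.
                if (((mask >> j) & 1UL) == 1UL) subset.Add(items[j]);
            }
            yield return subset;
        }
    }

    // Prints the non-empty subsets grouped by size, largest size first.
    public static void PrintBySize<T>(IList<T> items)
    {
        var groups = AllSubsets(items)
            .Where(s => s.Count > 0)
            .GroupBy(s => s.Count)
            .OrderByDescending(g => g.Key);

        foreach (var group in groups)
        {
            Console.WriteLine("#Subset Dimension " + group.Key);
            foreach (var subset in group)
            {
                Console.WriteLine(string.Join(";", subset));
            }
            Console.WriteLine();
        }
    }
}
```

Calling `SubsetDemo.PrintBySize(data)` with the four-element list from the question prints one block per subset size. Within each block the subsets appear in mask order, which is not necessarily the order shown in the question.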
L.B

I think the answers to this question need some performance tests. I'll give it a go. This is a community wiki, so feel free to update it.

void PerfTest()
{
    var list = Enumerable.Range(0, 21).ToList();

    var t1 = GetDurationInMs(list.SubSets_LB);
    var t2 = GetDurationInMs(list.SubSets_Jodrell2);
    var t3 = GetDurationInMs(() => list.CalcCombinations(20));

    Console.WriteLine("{0}\n{1}\n{2}", t1, t2, t3);
}

long GetDurationInMs(Func<IEnumerable<IEnumerable<int>>> fxn)
{
    fxn(); //JIT???
    var count = 0;

    var sw = Stopwatch.StartNew();
    foreach (var ss in fxn())
    {
        count = ss.Sum();
    }
    return sw.ElapsedMilliseconds;
}

OUTPUT:

1281
1604 (_Jodrell not _Jodrell2)
6817

Jodrell's Update

I built in release mode, i.e. with optimizations on. When I run via Visual Studio I don't get a consistent bias between 1 and 2, but after repeated runs L.B's answer wins. I get results approaching something like:

1190
1260
more

If I run the test harness from the command line instead of via Visual Studio, I get results more like this:

987
879
still more
Jodrell
L.B
  • Upvoted all previous posts. Thanks for your precious contribution. Sure I have some rework to do :) – Abaco Dec 10 '12 at 10:55
  • @Abaco, as stated in my extended answer, I amalgamated to produce (in my testing) the best performance yet. http://stackoverflow.com/a/13768100/659190 – Jodrell Dec 10 '12 at 12:21
  • Considering the usefulness of all the answers I decided to accept the wiki as a sort of summary. – Abaco Dec 13 '12 at 09:20

EDIT

I've accepted the performance gauntlet; what follows is my amalgamation that takes the best of all the answers. In my testing, it seems to have the best performance yet.

public static IEnumerable<IEnumerable<T>> SubSets_Jodrell2<T>(
    this IEnumerable<T> source)
{
    var list = source.ToList();
    // Shift a ulong (1UL), not an int, so lists with more than 30 items don't overflow.
    var limit = 1UL << list.Count;

    for (var i = limit; i > 0; i--)
    {
        yield return list.SubSet(i);
    }
}

private static IEnumerable<T> SubSet<T>(
    this IList<T> source, ulong bits)
{
    for (var i = 0; i < source.Count; i++)
    {
        if (((bits >> i) & 1) == 1)
        {
            yield return source[i];
        }
    }
}

Same idea again, almost the same as L.B's answer but my own interpretation.

I avoid the use of an internal List and Math.Pow.

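public static IEnumerable<IEnumerable<T>> SubSets_Jodrell<T>(
    this IEnumerable<T> source)
{
    var count = source.Count();

    // 1UL << 64 would wrap around, so the mask can cover at most 63 elements.
    if (count > 63)
    {
        throw new OverflowException("Not Supported ...");
    }

    // Shift a ulong (1UL), not an int, so counts above 30 don't overflow.
    // Starting at 2^count - 2 skips both the full set and the empty set.
    var limit = (1UL << count) - 2;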

    for (var i = limit; i > 0; i--)
    {
        yield return source.SubSet(i);
    }
}

private static IEnumerable<T> SubSet<T>(
    this IEnumerable<T> source,
    ulong bits)
{
    var check = (ulong)1;
    foreach (var t in source)
    {
        if ((bits & check) > 0)
        {
            yield return t;
        }

        check <<= 1;
    }
}

You'll note that these methods don't work with more than 63 elements in the initial set (the 2^n loop bound has to fit in a ulong), but enumerating that many subsets would take far too long anyhow.
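
The exact size limit depends on how the 2^n bound is computed, because C# masks the shift count to the width of the left operand (5 bits for int, 6 bits for ulong). A small sketch of the difference follows; `ShiftDemo` and its method names are hypothetical and only serve as an illustration.

```csharp
using System;

public static class ShiftDemo
{
    // Shifts an int and casts afterwards: the shift count is masked to 5 bits,
    // so for n = 40 this computes 1 << 8.
    public static ulong IntShift(int n)
    {
        return (ulong)(1 << n);
    }

    // Shifts a ulong: the shift count is masked to 6 bits,
    // so this is correct for n up to 63 and wraps around at n = 64.
    public static ulong ULongShift(int n)
    {
        return 1UL << n;
    }

    public static void Main()
    {
        Console.WriteLine(IntShift(40));    // 256
        Console.WriteLine(ULongShift(40));  // 1099511627776
        Console.WriteLine(ULongShift(64));  // 1
    }
}
```

So a bound built from an int shift is only reliable for up to 30 elements, while a ulong shift is reliable for up to 63.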

Jodrell
  • Jodrell, nice piece of code. But in terms of performance, my test results say otherwise (I used the `PerfTest` code below, or above :) ). – L.B Dec 10 '12 at 12:51
  • @L.B, my test code must have been wrong, I've retested and amended the wiki. – Jodrell Dec 10 '12 at 15:05

I developed a simple extension method for lists:

    /// <summary>
    /// Obtains all the combinations of a given size from the elements of a list.
    /// </summary>
    /// <param name="subsetDimension">Subset dimension (number of elements in each subset)</param>
    /// <returns>IEnumerable containing all the different subsets</returns>
    public static IEnumerable<List<T>> CalcCombinations<T>(this List<T> list, int subsetDimension)
    {
        //First of all we build a binary matrix. Each row needs a 0 or a 1 for
        //every single element of the list, so the row length is list.Count.
        //The matrix is populated with the first 2^list.Count binary numbers,
        //so rowDimension holds the number of rows.
        int rowDimension = Convert.ToInt32(Math.Pow(2, list.Count));

        //Now we start counting! We fill our matrix with every number from 1
        //(0 is meaningless) up to rowDimension - 1.
        //Each row is a binary mask, hence the name.
        List<int[]> combinationMasks = new List<int[]>();
        for (int i = 1; i < rowDimension; i++)
        {
            //Grab the binary representation of the number
            string binaryString = Convert.ToString(i, 2);

            //Initialize an array of the appropriate dimension
            int[] mask = new int[list.Count];

            //Convert the string into an array of 0s and 1s, then copy it into
            //the mask (which has the appropriate dimension). Reverse() puts the
            //least significant bit at index 0, so CopyTo() leaves the unused
            //high positions as 0.
            binaryString.Select(x => x == '0' ? 0 : 1).Reverse().ToArray().CopyTo(mask, 0);

            //Only keep the masks whose number of 1s matches the requested
            //subset dimension.
            if (mask.Sum() == subsetDimension) combinationMasks.Add(mask);
        }

        //And now we apply the matrix to our list
        foreach (int[] mask in combinationMasks)
        {
            List<T> temporaryList = new List<T>(list);

            //Run the loop in reverse order so that removing an item does not
            //shift the indexes that are still to be processed
            for (int iter = mask.Length - 1; iter >= 0; iter--)
            {
                //Whenever a 0 is found the corresponding item is removed from the list
                if (mask[iter] == 0)
                    temporaryList.RemoveAt(iter);
            }
            yield return temporaryList;
        }
    }
}

So, considering the example in the question with a subset dimension of 3:

# Row Dimension of 4 (list.Count)
Binary Numbers to 2^4

# Binary Matrix
0 0 0 1 => skip
0 0 1 0 => skip
[...]
0 1 1 1 => added // Text2;Text3;Text4
[...]
1 0 1 1 => added // Text1;Text3;Text4
1 1 0 0 => skip
1 1 0 1 => added // Text1;Text2;Text4
1 1 1 0 => added // Text1;Text2;Text3
1 1 1 1 => skip

Hope this can help someone :)

If you need clarification or want to contribute, feel free to add answers or comments (whichever is more appropriate).
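
The method above walks through all 2^list.Count masks and discards those with the wrong number of 1s. One possible alternative, which is not taken from any of the answers here, is to step directly from one mask with k set bits to the next using the bit trick commonly known as Gosper's hack. The following is a sketch under stated assumptions: `FixedSizeSubsets` and `Combinations` are hypothetical names, and the list is assumed to hold at most 62 items.

```csharp
using System;
using System.Collections.Generic;

public static class FixedSizeSubsets
{
    // Yields every subset of exactly k items without visiting the other masks.
    public static IEnumerable<List<T>> Combinations<T>(IList<T> items, int k)
    {
        int n = items.Count;
        if (n > 62) throw new ArgumentException("This sketch supports at most 62 items.");
        if (k < 0 || k > n) yield break;
        if (k == 0) { yield return new List<T>(); yield break; }

        ulong limit = 1UL << n;        // first mask that is out of range
        ulong mask = (1UL << k) - 1;   // smallest mask with k bits set

        while (mask < limit)
        {
            var subset = new List<T>(k);
            for (int j = 0; j < n; j++)
            {
                if (((mask >> j) & 1UL) == 1UL) subset.Add(items[j]);
            }
            yield return subset;

            // Gosper's hack: next larger number with the same count of set bits.
            ulong lowest = mask & (~mask + 1);   // isolate the lowest set bit
            ulong ripple = mask + lowest;        // carry through the lowest run of 1s
            mask = (((ripple ^ mask) >> 2) / lowest) | ripple;
        }
    }
}
```

For the four-element list in the question, `FixedSizeSubsets.Combinations(data, 3)` would yield the same four subsets as `data.CalcCombinations(3)`, though possibly in a different order.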

Abaco