1

I'm writing a small function for finding all the subsets of a list of numbers S, the output is a list of lists.

def subsets(S):
    if S is None or len(S) == 0:
        return [[]]

    output_list = []
    sub = [[[], [S[0]]]]
    for i in xrange(1, len(S)):
        without_ith = sub[i - 1]
        with_ith = [element + [S[i]] for element in without_ith]

        # convert to set of tuples, for set union
        without_set = set(tuple(element) for element in without_ith)
        with_set = set(tuple(element) for element in with_ith)
        new_set = without_set | with_set

        # convert back to list of lists
        new = list(list(element) for element in new_set)
        sub.append(new)

    # sort each sublist into non-descending order
    # output_list = [sorted(element) for element in sub[-1]]
    for element in sub[-1]:
        output_list.append(sorted(element))
    return output_list

The algorithm is described in the accepted answer of this post: Finding all the subsets of a set

The thing that annoys me is the conversion from list of lists to set of tuples, and then perform the union of two sets of tuples, and convert back to list of lists. All these happens in every iteration. The reason is that in Python, sets must contain immutable objects, which are hashable, in order to perform set operations with other sets. But lists and sets are mutable and unhashable, tuples or frozensets are required as element of such sets. For my code, I mutate the element lists first and convert them to tuples for the union, and then convert back to lists. I wonder if there is a work-around? It looks not very clean and efficient.

(And a minor doubt is the list comprehension I commented out # output_list = [sorted(element) for element in sub[-1]]. I'm using PyCharm and it suggests replacing the list comprehension with a for loop. Any reason? I thought list comprehension is always better.)

Community
  • 1
  • 1
Logan Yang
  • 2,364
  • 6
  • 27
  • 43
  • @Logan_Yang This does not answer your question, but in that particular case I would implement it using an iterator via yield which is an "on-demand" approach. That would also remove the last part as you won't need to append items to the output list. – avenet Jan 03 '15 at 01:47
  • 1
    Can't you make `sub` a list of tuples instead of a list of lists? – BrenBarn Jan 03 '15 at 01:50
  • 1
    Why not using itertools for that? Here is one way how to implement it efficiently: http://stackoverflow.com/questions/7988695/getting-the-subsets-of-a-set-in-python –  Jan 03 '15 at 01:51
  • @BrenBarn Yes I can, but I have to convert the tuple back to list in the step `with_ith = [element + [S[i]] for element in without_ith]` because I need to mutate it. – Logan Yang Jan 03 '15 at 01:53
  • @SebastianRaschka Yes indeed, I know there is itertools and it's great but I just tried to implement this algorithm to practice, then using itertools lost the point of practicing. Thanks anyway! ;) – Logan Yang Jan 03 '15 at 01:56
  • @LoganYang: Where do you mutate it? It looks like the only things you mutate (by appending) are`sub` itself and `output_list`, not the other things inside `sub`. – BrenBarn Jan 03 '15 at 01:59
  • @BrenBarn OH yes you are right. I didn't mutate it, what I did was making a copy of it and adding a new element, then performing union on the copies. They are not mutated. Thanks for pointing that out. It can be a little cleaner now. – Logan Yang Jan 03 '15 at 02:04

3 Answers3

1

I like a "counting" approach to such tasks as "returning all subsets". Assuming S is a list of numbers without duplicates:

def subsets(S):   # S is a list of `whatever`
    result = []
    S = sorted(S)  # iff S can't be assumed to be sorted to start
    # S = sorted(set(S)) if duplicates are possible and must be pruned
    for i in range(2**len(S)):
        asubset = []
        for j, x in enumerate(S):
            if i & 1<<j: asubset.append(x)
        result.append(asubset)
    return result

Essentially, this exploits the 1-1 correspondence between subsets of N things and the binary forms of integers from 0 to 2**N - 1.

Alex Martelli
  • 854,459
  • 170
  • 1,222
  • 1,395
  • It's a great method! But I chose to accept the other answer because it's more relevant to my code. Thanks anyway for showing me the great thinking! – Logan Yang Jan 04 '15 at 05:39
1

There are no duplicated items between your without_ith and with_ith lists, since the lists in the former never contain S[i] and those in the latter always do. This means that there is no need to use set objects when you combine them, just concatenate one list onto the other and you'll be good! Or, you could use a single list variable and extend it with a list comprehension:

def subsets(S):
    results = [[]]
    for x in S:
        results.extend([item + [x] for item in results])
    return results

If your input list is sorted, all of the subsets will be too. If the input is not always going to be in order and you need the output to be, loop on sorted(S) instead of S directly. The items in the subsets will always be appear in the same order they are iterated over.

Note that it is important that you use a list comprehension in the extend call, rather than a generator expression. A generator would continue iterating over the newly added items, resulting in an infinite loop (until your system runs out of memory to expand the list).

Blckknght
  • 100,903
  • 11
  • 120
  • 169
  • Thank you, your answer is the most helpful! You directly pointed out my mistakes both in thinking and in writing Python, and showed me an elegant answer. I read the description of the algorithm in that old post and didn't carefully think it through when I was writing it. Bravo and Thank you for your detailed explanation! – Logan Yang Jan 04 '15 at 05:37
0

Looks like list comprehension is faster than appending items because using it does not need to load the list append function into memory. Check this great article on a deep list comprehension vs append comparison.

So, for your particular problem, I guess list comprehension is faster.

avenet
  • 2,894
  • 1
  • 19
  • 26
  • 1
    Your link is a link to the documentation for `itertools.combinations`. Did you mean to post a different link? – BrenBarn Jan 03 '15 at 02:08