
I am generating the list of partitions of a list of elements (i.e. set partitions). For each of these partitions I need to assign a random number indicating its value, so that I can later run some computations on the output data, which consists of partition/value pairs.

A sample would be a csv with sample entries as below:

p,v   
"[[1, 2, 3, 4]]",0.3999960625186746
"[[1], [2, 3, 4]]",0.49159520559753156
"[[1, 2], [3, 4]]",0.12658202037597555
"[[1, 3, 4], [2]]",0.11670775560336522
"[[1], [2], [3, 4]]",0.006059031164368345

Here is the code I have put together for this:

from collections import defaultdict
import random
import csv

partitions = []

elements = input('Please specify number of elements: ')
size = int(elements)
fileheader = str(size)

# simple menu
if size == 1:
    partitionlist = range(1,size+1)
    print ('A one-element list has 1 partition')
elif size < 28:
    partitionlist = range(1,size+1)
elif size >= 28:
    partitionlist = [0]
    print ("Invalid number. Try again...")

# generate all partitions
def partition(elements):
    if len(elements) == 1:
        yield [ elements ]
        return

    first = elements[0]
    for smaller in partition(elements[1:]):
        # insert `first` in each of the subpartition's subsets
        for n, subset in enumerate(smaller):
            yield smaller[:n] + [[ first ] + subset]  + smaller[n+1:]
        # put `first` in its own subset 
        yield [ [ first ] ] + smaller

for p in partition(partitionlist):
    partitions.append([sorted(p)] + [random.uniform(0,1)])

# write the generated input to CSV file
data = partitions

def partition_value_data(size):
    with open( size+'-elem-normaldist.csv','w') as out:
        csv_out=csv.writer(out)
        csv_out.writerow(['p','v'])
        for row in data:
            csv_out.writerow(row)

partition_value_data(fileheader)

The problem I'm facing is that when the number of elements goes over 13, I get a memory error. Is it due to my computer's memory or a limit within Python itself? I'm using Python 2.7.12.

For a list with 15 elements the number of partitions is approx. 1382958545.

I'm trying to generate the partitions of a list of up to 30 elements, where the number of partitions would be approx. 545717047947902329359.
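For reference, the number of partitions of an n-element set is the Bell number B(n), so these counts can be checked without generating any partitions. Here is a small sketch (not part of the script above) that computes them via the Bell triangle:

# Bell triangle: each row starts with the last entry of the previous row,
# and each later entry is the previous entry plus the entry above it.
def bell_number(n):
    row = [1]
    for _ in range(n - 1):
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[-1]

print (bell_number(15))   # 1382958545, matching the figure above
print (bell_number(30))   # far too many partitions to hold in memory at once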

Any advice is really appreciated. Thank you.

  • It looks like you're using Alexis's partition code from [this answer](http://stackoverflow.com/a/30134039/4014959). It's a Good Idea to give the attribution of code written by others. – PM 2Ring Jul 14 '16 at 13:09
  • I assume you **don't** intend to generate _all_ the partitions of a set of 30 elements, that would take _quite_ a while. :) BTW, you can improve the speed of the `partition` function by converting it to use tuples instead of lists. In my timeit tests, the tuple version runs at around 60-70% of the time of the list version; the difference is greater in Python 2 than in Python 3. – PM 2Ring Jul 17 '16 at 08:49
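To illustrate PM 2Ring's tuple suggestion above, here is a minimal sketch of what that variant might look like (an interpretation of the comment, not code from the original post):

# Same recursion as `partition`, but built entirely from tuples instead of lists,
# which PM 2Ring reports runs in roughly 60-70% of the time of the list version.
def partition_tuples(elements):
    if len(elements) == 1:
        yield (elements,)
        return

    first = elements[0]
    for smaller in partition_tuples(elements[1:]):
        for n, subset in enumerate(smaller):
            yield smaller[:n] + ((first,) + subset,) + smaller[n+1:]
        yield ((first,),) + smaller

# call it with a tuple, e.g. partition_tuples(tuple(range(1, size + 1)))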

1 Answer

Your issue here is that you're building a generator and then immediately turning its entire output into a list, which negates any benefit of using a generator in the first place.

Instead, you should write rows out directly from the generator as they are produced.

from collections import defaultdict
import random
import csv

elements = input('Please specify number of elements: ')
size = int(elements)
fileheader = str(size)

# simple menu
if size == 1:
    partitionlist = range(1,size+1)
    print ('A one-element list has 1 partition')
elif size < 28:
    partitionlist = range(1,size+1)
elif size >= 28:
    partitionlist = [0]
    print ("Invalid number. Try again...")

# generate all partitions
def partition(elements):
    if len(elements) == 1:
        yield [ elements ]
        return

    first = elements[0]
    for smaller in partition(elements[1:]):
        # insert `first` in each of the subpartition's subsets
        for n, subset in enumerate(smaller):
            yield smaller[:n] + [[ first ] + subset]  + smaller[n+1:]
        # put `first` in its own subset 
        yield [ [ first ] ] + smaller


def partition_value_data(size):
    with open( size+'-elem-normaldist.csv','w') as out:
        csv_out=csv.writer(out)
        csv_out.writerow(['p','v'])

        for row in partition(partitionlist):
            csv_out.writerow([sorted(row)] + [random.uniform(0,1)])

partition_value_data(fileheader)
Wayne Werner
  • Hi Wayne, thank you for the heads up. Initially I put the generator output into a list so that I could use len() on it to check whether I was generating the right number of partitions. I've just realised I can do that while writing directly to the file, by counting how many rows pass through csv_out. I'm giving this a try to see how it goes. Thank you for your help. – Valorian85 Jul 14 '16 at 13:00
  • Correct - I probably would add a `count += 1` inside the `for row in partition` (and of course set it to `0` before my loop). If this works for you, please remember to mark it as the accepted answer :) I would also recommend posting this code over at http://codereview.stackexchange.com/ - there are some other improvements I think you could make – Wayne Werner Jul 14 '16 at 14:25
  • To the left side under the up and down arrows there is a check mark. If you click the check mark it will mark this as the accepted answer and turn green. – Wayne Werner Jul 25 '16 at 13:28
  • @WayneWerner Instead of initializing and updating a counter manually one could use the `enumerate()` function to produce a number for each `row`. – BlackJack Aug 10 '16 at 16:06
  • @BlackJack definitely. Of course that's why I recommended a code review, because this code has many opportunities for improvement :) – Wayne Werner Aug 10 '16 at 16:12
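Tying the comment thread together, the row-counting variant of the answer's write function could look something like this (a sketch only; the `total` name and the start value of 1 passed to `enumerate()` are choices made here, not part of the original answer):

def partition_value_data(size):
    total = 0
    with open(size + '-elem-normaldist.csv', 'w') as out:
        csv_out = csv.writer(out)
        csv_out.writerow(['p', 'v'])
        # enumerate() numbers the rows as they stream out, so nothing extra is held in memory
        for total, row in enumerate(partition(partitionlist), 1):
            csv_out.writerow([sorted(row)] + [random.uniform(0, 1)])
    print ('%d partitions written' % total)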