
I am generating the list of partitions of a list of elements (i.e. set partitions). For each of these partitions I need to assign a random number indicating its value, so that I can later run some computations on the output data, which consists of partition/value pairs.

A sample would be a csv with sample entries as below:

p,v   
"[[1, 2, 3, 4]]",0.3999960625186746
"[[1], [2, 3, 4]]",0.49159520559753156
"[[1, 2], [3, 4]]",0.12658202037597555
"[[1, 3, 4], [2]]",0.11670775560336522
"[[1], [2], [3, 4]]",0.006059031164368345

Here is the code I have put together for this:

from collections import defaultdict
import random
import csv

partitions = []

elements = input('Please specify number of elements: ')
size = int(elements)
fileheader = str(size)

# simple menu
if size == 1:
    partitionlist = range(1,size+1)
    print ('A one-element list has 1 partition')
elif size < 28:
    partitionlist = range(1,size+1)
elif size >= 28:
    partitionlist = [0]
    print ("Invalid number. Try again...")

# generate all partitions
def partition(elements):
    if len(elements) == 1:
        yield [ elements ]
        return

    first = elements[0]
    for smaller in partition(elements[1:]):
        # insert `first` in each of the subpartition's subsets
        for n, subset in enumerate(smaller):
            yield smaller[:n] + [[ first ] + subset]  + smaller[n+1:]
        # put `first` in its own subset 
        yield [ [ first ] ] + smaller

for p in partition(partitionlist):
    partitions.append([sorted(p)] + [random.uniform(0,1)])

# write the generated input to CSV file
data = partitions

def partition_value_data(size):
    with open( size+'-elem-normaldist.csv','w') as out:
        csv_out=csv.writer(out)
        csv_out.writerow(['p','v'])
        for row in data:
            csv_out.writerow(row)

partition_value_data(fileheader)

The problem I'm facing is that when the number of elements goes over 13, I get a memory error. Is it due to my computer's memory or a limit within Python itself? I'm using Python 2.7.12.

For a list with 15 elements the number of partitions is approx. 1382958545.

I'm trying to generate the partitions of a list of up to 30 elements, where the number of partitions would be approx. 545717047947902329359.
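For reference, the number of partitions of an n-element set is the Bell number B(n), so these counts can be checked without generating any partitions. Here is a small sketch (not part of the script above) that computes them via the Bell triangle:

# Bell triangle: each row starts with the last entry of the previous row,
# and each later entry is the previous entry plus the entry above it.
def bell_number(n):
    row = [1]
    for _ in range(n - 1):
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[-1]

print (bell_number(15))   # 1382958545, matching the figure above
print (bell_number(30))   # far too many partitions to hold in memory at once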

Any advice is really appreciated. Thank you.

  • It looks like you're using Alexis's partition code from [this answer](http://stackoverflow.com/a/30134039/4014959). It's a Good Idea to give the attribution of code written by others. – PM 2Ring Jul 14 '16 at 13:09
  • I assume you **don't** intend to generate _all_ the partitions of a set of 30 elements, that would take _quite_ a while. :) BTW, you can improve the speed of the `partition` function by converting it to use tuples instead of lists. In my timeit tests, the tuple version runs at around 60-70% of the time of the list version; the difference is greater in Python 2 than in Python 3. – PM 2Ring Jul 17 '16 at 08:49
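To illustrate PM 2Ring's tuple suggestion above, here is a minimal sketch of what that variant might look like (an interpretation of the comment, not code from the original post):

# Same recursion as `partition`, but built entirely from tuples instead of lists,
# which PM 2Ring reports runs in roughly 60-70% of the time of the list version.
def partition_tuples(elements):
    if len(elements) == 1:
        yield (elements,)
        return

    first = elements[0]
    for smaller in partition_tuples(elements[1:]):
        for n, subset in enumerate(smaller):
            yield smaller[:n] + ((first,) + subset,) + smaller[n+1:]
        yield ((first,),) + smaller

# call it with a tuple, e.g. partition_tuples(tuple(range(1, size + 1)))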

1 Answer

Your issue here is that you're building a generator and then immediately turning its entire output into a list, which negates any benefit of using a generator in the first place.

Instead, you should write rows out directly from the generator as they are produced.

from collections import defaultdict
import random
import csv

elements = input('Please specify number of elements: ')
size = int(elements)
fileheader = str(size)

# simple menu
if size == 1:
    partitionlist = range(1,size+1)
    print ('A one-element list has 1 partition')
elif size < 28:
    partitionlist = range(1,size+1)
elif size >= 28:
    partitionlist = [0]
    print ("Invalid number. Try again...")

# generate all partitions
def partition(elements):
    if len(elements) == 1:
        yield [ elements ]
        return

    first = elements[0]
    for smaller in partition(elements[1:]):
        # insert `first` in each of the subpartition's subsets
        for n, subset in enumerate(smaller):
            yield smaller[:n] + [[ first ] + subset]  + smaller[n+1:]
        # put `first` in its own subset 
        yield [ [ first ] ] + smaller


def partition_value_data(size):
    with open( size+'-elem-normaldist.csv','w') as out:
        csv_out=csv.writer(out)
        csv_out.writerow(['p','v'])

        for row in partition(partitionlist):
            csv_out.writerow([sorted(row)] + [random.uniform(0,1)])

partition_value_data(fileheader)
Wayne Werner
  • Hi Wayne, thank you for the heads up. Initially I put the generator output into a list so that I could use len() on it to check whether I was generating the right number of partitions. I've just realised I can do that while writing directly to the file, by counting how many rows pass through csv_out. I'm giving this a try to see how it goes. Thank you for your help. – Valorian85 Jul 14 '16 at 13:00
  • Correct - I probably would add a `count += 1` inside the `for row in partition` (and of course set it to `0` before my loop). If this works for you, please remember to mark it as the accepted answer :) I would also recommend posting this code over at http://codereview.stackexchange.com/ - there are some other improvements I think you could make – Wayne Werner Jul 14 '16 at 14:25
  • To the left side under the up and down arrows there is a check mark. If you click the check mark it will mark this as the accepted answer and turn green. – Wayne Werner Jul 25 '16 at 13:28
  • @WayneWerner Instead of initializing and updating a counter manually one could use the `enumerate()` function to produce a number for each `row`. – BlackJack Aug 10 '16 at 16:06
  • @BlackJack definitely. Of course that's why I recommended a code review, because this code has many opportunities for improvement :) – Wayne Werner Aug 10 '16 at 16:12
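Tying the comment thread together, the row-counting variant of the answer's write function could look something like this (a sketch only; the `total` name and the start value of 1 passed to `enumerate()` are choices made here, not part of the original answer):

def partition_value_data(size):
    total = 0
    with open(size + '-elem-normaldist.csv', 'w') as out:
        csv_out = csv.writer(out)
        csv_out.writerow(['p', 'v'])
        # enumerate() numbers the rows as they stream out, so nothing extra is held in memory
        for total, row in enumerate(partition(partitionlist), 1):
            csv_out.writerow([sorted(row)] + [random.uniform(0, 1)])
    print ('%d partitions written' % total)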