6

I've been using the random_element() function provided by SAGE to generate random integer partitions for a given integer (N) that are a particular length (S). I'm trying to generate unbiased random samples from the set of all partitions for given values of N and S. SAGE's function quickly returns random partitions for N (i.e. Partitions(N).random_element()).

However, it slows immensely when adding S (i.e. Partitions(N,length=S).random_element()). Likewise, filtering out random partitions of N that are of length S is incredibly slow.

However, and I hope this helps someone, I've found that in the case when the function returns a partition of N not matching the length S, that the conjugate partition is often of length S. That is:

S = 10
N = 100
part = list(Partitions(N).random_element())
    if len(part) != S:
        SAD = list(Partition(part).conjugate())
        if len(SAD) != S:
            continue

This increases the rate at which partitions of length S are found and appears to produce unbiased samples (I've examined the results against entire sets of partitions for various values of N and S).

However, I'm using values of N (e.g. 10,000) and S (e.g. 300) that make even this approach impractically slow. The comment associated with SAGE's random_element() function admits there is plenty of room for optimization. So, is there a way to more quickly generate unbiased (i.e. random uniform) samples of integer partitions matching given values of N and S, perhaps, by not generating partitions that do not match S? Additionally, using conjugate partitions works well in many cases to produce unbiased samples, but I can't say that I precisely understand why.

Zsolt Botykai
  • 50,406
  • 14
  • 85
  • 110
klocey
  • 314
  • 5
  • 12

3 Answers3

7

Finally, I have a definitively unbiased method that has a zero rejection rate. Of course, I've tested it to make sure the results are representative samples of entire feasible sets. It's very fast and totally unbiased. Enjoy.

from sage.all import *
import random

First, a function to find the smallest maximum addend for a partition of n with s parts

def min_max(n,s):

    _min = int(floor(float(n)/float(s)))
    if int(n%s) > 0:
        _min +=1

    return _min

Next, A function that uses a cache and memoiziation to find the number of partitions of n with s parts having x as the largest part. This is fast, but I think there's a more elegant solution to be had. e.g., Often: P(N,S,max=K) = P(N-K,S-1) Thanks to ante (https://stackoverflow.com/users/494076/ante) for helping me with this: Finding the number of integer partitions given a total, a number of parts, and a maximum summand

D = {}
def P(n,s,x):
    if n > s*x or x <= 0: return 0
    if n == s*x: return 1
    if (n,s,x) not in D:
        D[(n,s,x)] = sum(P(n-i*x, s-i, x-1) for i in xrange(s))
    return D[(n,s,x)]

Finally, a function to find uniform random partitions of n with s parts, with no rejection rate! Each randomly chosen number codes for a specific partition of n having s parts.

def random_partition(n,s):
    S = s
    partition = []
    _min = min_max(n,S)
    _max = n-S+1

    total = number_of_partitions(n,S)
    which = random.randrange(1,total+1) # random number

    while n:
        for k in range(_min,_max+1):
            count = P(n,S,k)
            if count >= which:
                count = P(n,S,k-1)
                break

        partition.append(k)
        n -= k
        if n == 0: break
        S -= 1
        which -= count
        _min = min_max(n,S)
        _max = k

    return partition
Community
  • 1
  • 1
klocey
  • 314
  • 5
  • 12
0

Simple approach: randomly assign the integers:

def random_partition(n, s):
    partition = [0] * s
    for x in range(n):
        partition[random.randrange(s)] += 1
    return partition
Winston Ewert
  • 44,070
  • 10
  • 68
  • 83
  • Thanks for the response, but I don't see how this function yields partitions based on uniform random sampling. – klocey Apr 24 '12 at 01:20
  • @klocey, I missed the fact that you are generating random elements from the sequence, sorry. – Winston Ewert Apr 24 '12 at 01:40
  • I implemented this function and compared random samples generated by it to full sets of partitions for several combinations of N and S. Comparisons were made using kernel density curves generated from variances of partitions. Like every other sampling strategy I've tried, this function yields biased samples (partitions of lower than expected variance). Apparently, it is just really difficult to generate an unbiased random sample from the set of all partitions for a given total N and length S. The SAGE function is the closest I've come, but it's far from optimal. – klocey Apr 24 '12 at 06:38
0

I ran into a similar problem when I was trying to calculate the probability of the strong birthday problem.

First off, the partition function explodes when given only modest amount of numbers. You'll be returning a LOT of information. No matter which method you're using N = 10000 and S = 300 will generate ridiculous amounts of data. It will be slow. Chances are any pure python implementation you use will be equally slow or slower. Look to making a CModule.

If you want to try python the approach I took as a combination of itertools and generators to keep memory usage down. I don't seem to have my code handy anymore, but here's a good impementation:

http://wordaligned.org/articles/partitioning-with-python

EDIT:

Found my code:

def partition(a, b=-1, limit=365):
  if (b == -1):
    b = a
  if (a == 2 or a == 3):
    if (b >= a and limit):
      yield [a]
    else:
      return
  elif (a > 3):
    if (a <= b):
      yield [a]
    c = 0
    if b > a-2:
      c = a-2
    else:
      c = b
    for i in xrange(c, 1, -1):
      if (limit):
        for j in partition(a-i, i, limit-1):
          yield [i] + j
OmnipotentEntity
  • 16,531
  • 6
  • 62
  • 96
  • Yes, combinatorial explosion is toughie. However, I generate random partitions one at a time and only keep a small random sample for comparative analysis. I'm trying to obtain a small unbiased random sample of partitions for a given total N of a given length S. SAGE's functions run in Cython, so do my own scripts, so efficient speed isn't as much a problem as finding an algorithm or a way to tweak SAGE's function that avoids generating unnecessary partitions (i.e. those not of length S). I'll take a look at your implementation and the 'strong birthday problem'. Thanks. – klocey Apr 24 '12 at 01:32
  • Found my code, it's a generator, and it find partitions that are size 2 or larger up to a max of a given number, you can remove the logic that prevents partitions smaller than two. But I doubt that it will be much faster. – OmnipotentEntity Apr 24 '12 at 02:05