28

Is there an efficient way to generate a random combination of N integers such that—

  • each integer is in the interval [min, max],
  • the integers have a sum of sum,
  • the integers can appear in any order (e.g., random order), and
  • the combination is chosen uniformly at random from among all combinations that meet the other requirements?

Is there a similar algorithm for random combinations in which the integers must appear in sorted order by their values (rather than in any order)?

(Choosing an appropriate combination with a mean of mean is a special case, if sum = N * mean. This problem is equivalent to generating a uniform random partition of sum into N parts that are each in the interval [min, max] and appear in any order or in sorted order by their values, as the case may be.)

I am aware that this problem can be solved in the following way for combinations that appear in random order (EDIT [Apr. 27]: Algorithm modified.):

  1. If N * max < sum or N * min > sum, there is no solution.

  2. If N * max == sum, there is only one solution, in which all N numbers are equal to max. If N * min == sum, there is only one solution, in which all N numbers are equal to min.

  3. Use the algorithm given in Smith and Tromble ("Sampling from the Unit Simplex", 2004) to generate N random non-negative integers with the sum sum - N * min.

  4. Add min to each number generated this way.

  5. If any number is greater than max, go to step 3.

However, this algorithm is slow if max is much less than sum. For example, according to my tests (with an implementation of the special case above involving mean), the algorithm rejects, on average—

  • about 1.6 samples if N = 7, min = 3, max = 10, sum = 42, but
  • about 30.6 samples if N = 20, min = 3, max = 10, sum = 120.

Is there a way to modify this algorithm to be efficient for large N while still meeting the requirements above?

EDIT:

As an alternative suggested in the comments, an efficient way of producing a valid random combination (that satisfies all but the last requirement) is:

  1. Calculate X, the number of valid combinations possible given sum, min, and max.
  2. Choose Y, a uniform random integer in [0, X).
  3. Convert ("unrank") Y to a valid combination.

However, is there a formula for calculating the number of valid combinations (or permutations), and is there a way to convert an integer to a valid combination? [EDIT (Apr. 28): Same for permutations rather than combinations].

EDIT (Apr. 27):

After reading Devroye's Non-Uniform Random Variate Generation (1986), I can confirm that this is a problem of generating a random partition. Also, Exercise 2 (especially part E) on page 661 is relevant to this question.

EDIT (Apr. 28):

As it turned out the algorithm I gave is uniform where the integers involved are given in random order, as opposed to sorted order by their values. Since both problems are of general interest, I have modified this question to seek a canonical answer for both problems.

The following Ruby code can be used to verify potential solutions for uniformity (where algorithm(...) is the candidate algorithm):

combos={}
permus={}
mn=0
mx=6
sum=12
for x in mn..mx
  for y in mn..mx
    for z in mn..mx
      if x+y+z==sum
        permus[[x,y,z]]=0
      end
      if x+y+z==sum and x<=y and y<=z
        combos[[x,y,z]]=0
      end
    end
  end
end

3000.times {|x|
 f=algorithm(3,sum,mn,mx)
 combos[f.sort]+=1
 permus[f]+=1
}
p combos
p permus

EDIT (Apr. 29): Re-added Ruby code of current implementation.

The following code example is given in Ruby, but my question is independent of programming language:

def posintwithsum(n, total)
    raise if n <= 0 or total <=0
    ls = [0]
    ret = []
    while ls.length < n
      c = 1+rand(total-1)
      found = false
      for j in 1...ls.length
        if ls[j] == c
          found = true
          break
        end
      end
      if found == false;ls.push(c);end
    end
    ls.sort!
    ls.push(total)
    for i in 1...ls.length
       ret.push(ls[i] - ls[i - 1])
    end
    return ret
end

def integersWithSum(n, total)
 raise if n <= 0 or total <=0
 ret = posintwithsum(n, total + n)
 for i in 0...ret.length
    ret[i] = ret[i] - 1
 end
 return ret
end

# Generate 100 valid samples
mn=3
mx=10
sum=42
n=7
100.times {
 while true
    pp=integersWithSum(n,sum-n*mn).map{|x| x+mn }
    if !pp.find{|x| x>mx }
      p pp; break # Output the sample and break
    end
 end
}

Peter O.
  • 32,158
  • 14
  • 82
  • 96
  • Could you clarify your third requirement? Do you need a uniformity among _all possible_ combinations (including those with the wrong mean), or among _all valid_ combinations (i.e. those with the correct mean)? – user58697 Apr 23 '20 at 23:29
  • All valid combinations, that is, all combinations that meet the other requirements. – Peter O. Apr 23 '20 at 23:36
  • If we had a way to count and unrank partitions of a sum restricted to N integers in [min, max], would choosing one of those partitions at random and unranking represent a uniform distribution, and would that be more efficient than your current method? How large can the sum and N be? – גלעד ברקן Apr 26 '20 at 13:44
  • I don't know what you mean by "unranking partitions of a sum", and I am not aware of a proof that doing so results in a uniform distribution within the meaning of this question. For this question, both `sum` and `N` are effectively unlimited (within reason). I am seeking a canonical answer because the underlying problem pops up in many questions asked on Stack Overflow, including [this one](https://stackoverflow.com/questions/61387626) and [this one](https://stackoverflow.com/questions/39435481). @גלעדברקן – Peter O. Apr 26 '20 at 17:51
  • If we give each possible combination a "rank" (or index) in an ordered arrangement of all of them, "unranking," would mean generating the combination, given its rank (and N, min, and max, of course). Why wouldn't such a choice of one out of all possible combinations not conform to a uniform distribution? – גלעד ברקן Apr 26 '20 at 17:53
  • If it is possible to list all the valid combinations (that satisfy the first two requirements) in an efficient way, then choosing one of them uniformly at random will satisfy the third requirement. However, that is only viable if the number of valid combinations is small, and I am not aware of a formula for finding out that number. Alternatively, an algorithm to convert an index ("rank") to a valid combination may be efficient if that formula is available. – Peter O. Apr 26 '20 at 17:59
  • What do you mean by the term *"sorted order"*? Are the integers sorted **by index** (i.e. permutations with repetition) or sorted **by value** (i.e. combinations with repetition)? – John McClane Apr 28 '20 at 18:12
  • @JohnMcClane: In sorted order by value. – Peter O. Apr 28 '20 at 18:43
  • @JohnMcClane, I think we are talking about lexicographical order. – Joseph Wood Apr 28 '20 at 21:25
  • https://www.mathpages.com/home/kmath337/kmath337.htm – Dave Apr 28 '20 at 23:10
  • Can you give a sense of how big the parameters are? I.e., How efficient is good enough? – Dave Apr 28 '20 at 23:30
  • There's a DP approach that's O(N * (SUM-MIN * N) * (MAX-MIN)). I.e. O(bins * balls * capacity) where the ints are the bins, the excess after allocating the min to each int are the balls, and the capacity is max-min. I'm guessing this is too slow for you. – Dave Apr 28 '20 at 23:36
  • @Dave: For purposes of this question, an algorithm is efficient if it is at least as efficient as the algorithm I give in the question, but has a running time that does not depend considerably on the difference between `sum` and `max`. – Peter O. Apr 29 '20 at 00:05

6 Answers6

9

Here's my solution in Java. It is fully functional and contains two generators: PermutationPartitionGenerator for unsorted partitions and CombinationPartitionGenerator for sorted partitions. Your generator also implemented in the class SmithTromblePartitionGenerator for comparison. The class SequentialEnumerator enumerates all possible partitions (unsorted or sorted, depending on the parameter) in sequential order. I have added thorough tests (including your test cases) for all of these generators. The implementation is self-explainable for the most part. If you have any questions, I will answer them in couple of days.

import java.util.Random;
import java.util.function.Supplier;

public abstract class PartitionGenerator implements Supplier<int[]>{
    public static final Random rand = new Random();
    protected final int numberCount;
    protected final int min;
    protected final int range;
    protected final int sum; // shifted sum
    protected final boolean sorted;

    protected PartitionGenerator(int numberCount, int min, int max, int sum, boolean sorted) {
        if (numberCount <= 0)
            throw new IllegalArgumentException("Number count should be positive");
        this.numberCount = numberCount;
        this.min = min;
        range = max - min;
        if (range < 0)
            throw new IllegalArgumentException("min > max");
        sum -= numberCount * min;
        if (sum < 0)
            throw new IllegalArgumentException("Sum is too small");
        if (numberCount * range < sum)
            throw new IllegalArgumentException("Sum is too large");
        this.sum = sum;
        this.sorted = sorted;
    }

    // Whether this generator returns sorted arrays (i.e. combinations)
    public final boolean isSorted() {
        return sorted;
    }

    public interface GeneratorFactory {
        PartitionGenerator create(int numberCount, int min, int max, int sum);
    }
}

import java.math.BigInteger;

// Permutations with repetition (i.e. unsorted vectors) with given sum
public class PermutationPartitionGenerator extends PartitionGenerator {
    private final double[][] distributionTable;

    public PermutationPartitionGenerator(int numberCount, int min, int max, int sum) {
        super(numberCount, min, max, sum, false);
        distributionTable = calculateSolutionCountTable();
    }

    private double[][] calculateSolutionCountTable() {
        double[][] table = new double[numberCount + 1][sum + 1];
        BigInteger[] a = new BigInteger[sum + 1];
        BigInteger[] b = new BigInteger[sum + 1];
        for (int i = 1; i <= sum; i++)
            a[i] = BigInteger.ZERO;
        a[0] = BigInteger.ONE;
        table[0][0] = 1.0;
        for (int n = 1; n <= numberCount; n++) {
            double[] t = table[n];
            for (int s = 0; s <= sum; s++) {
                BigInteger z = BigInteger.ZERO;
                for (int i = Math.max(0, s - range); i <= s; i++)
                    z = z.add(a[i]);
                b[s] = z;
                t[s] = z.doubleValue();
            }
            // swap a and b
            BigInteger[] c = b;
            b = a;
            a = c;
        }
        return table;
    }

    @Override
    public int[] get() {
        int[] p = new int[numberCount];
        int s = sum; // current sum
        for (int i = numberCount - 1; i >= 0; i--) {
            double t = rand.nextDouble() * distributionTable[i + 1][s];
            double[] tableRow = distributionTable[i];
            int oldSum = s;
            // lowerBound is introduced only for safety, it shouldn't be crossed 
            int lowerBound = s - range;
            if (lowerBound < 0)
                lowerBound = 0;
            s++;
            do
                t -= tableRow[--s];
            // s can be equal to lowerBound here with t > 0 only due to imprecise subtraction
            while (t > 0 && s > lowerBound);
            p[i] = min + (oldSum - s);
        }
        assert s == 0;
        return p;
    }

    public static final GeneratorFactory factory = (numberCount, min, max,sum) ->
        new PermutationPartitionGenerator(numberCount, min, max, sum);
}

import java.math.BigInteger;

// Combinations with repetition (i.e. sorted vectors) with given sum 
public class CombinationPartitionGenerator extends PartitionGenerator {
    private final double[][][] distributionTable;

    public CombinationPartitionGenerator(int numberCount, int min, int max, int sum) {
        super(numberCount, min, max, sum, true);
        distributionTable = calculateSolutionCountTable();
    }

    private double[][][] calculateSolutionCountTable() {
        double[][][] table = new double[numberCount + 1][range + 1][sum + 1];
        BigInteger[][] a = new BigInteger[range + 1][sum + 1];
        BigInteger[][] b = new BigInteger[range + 1][sum + 1];
        double[][] t = table[0];
        for (int m = 0; m <= range; m++) {
            a[m][0] = BigInteger.ONE;
            t[m][0] = 1.0;
            for (int s = 1; s <= sum; s++) {
                a[m][s] = BigInteger.ZERO;
                t[m][s] = 0.0;
            }
        }
        for (int n = 1; n <= numberCount; n++) {
            t = table[n];
            for (int m = 0; m <= range; m++)
                for (int s = 0; s <= sum; s++) {
                    BigInteger z;
                    if (m == 0)
                        z = a[0][s];
                    else {
                        z = b[m - 1][s];
                        if (m <= s)
                            z = z.add(a[m][s - m]);
                    }
                    b[m][s] = z;
                    t[m][s] = z.doubleValue();
                }
            // swap a and b
            BigInteger[][] c = b;
            b = a;
            a = c;
        }
        return table;
    }

    @Override
    public int[] get() {
        int[] p = new int[numberCount];
        int m = range; // current max
        int s = sum; // current sum
        for (int i = numberCount - 1; i >= 0; i--) {
            double t = rand.nextDouble() * distributionTable[i + 1][m][s];
            double[][] tableCut = distributionTable[i];
            if (s < m)
                m = s;
            s -= m;
            while (true) {
                t -= tableCut[m][s];
                // m can be 0 here with t > 0 only due to imprecise subtraction
                if (t <= 0 || m == 0)
                    break;
                m--;
                s++;
            }
            p[i] = min + m;
        }
        assert s == 0;
        return p;
    }

    public static final GeneratorFactory factory = (numberCount, min, max, sum) ->
        new CombinationPartitionGenerator(numberCount, min, max, sum);
}

import java.util.*;

public class SmithTromblePartitionGenerator extends PartitionGenerator {
    public SmithTromblePartitionGenerator(int numberCount, int min, int max, int sum) {
        super(numberCount, min, max, sum, false);
    }

    @Override
    public int[] get() {
        List<Integer> ls = new ArrayList<>(numberCount + 1);
        int[] ret = new int[numberCount];
        int increasedSum = sum + numberCount;
        while (true) {
            ls.add(0);
            while (ls.size() < numberCount) {
                int c = 1 + rand.nextInt(increasedSum - 1);
                if (!ls.contains(c))
                    ls.add(c);
            }
            Collections.sort(ls);
            ls.add(increasedSum);
            boolean good = true;
            for (int i = 0; i < numberCount; i++) {
                int x = ls.get(i + 1) - ls.get(i) - 1;
                if (x > range) {
                    good = false;
                    break;
                }
                ret[i] = x;
            }
            if (good) {
                for (int i = 0; i < numberCount; i++)
                    ret[i] += min;
                return ret;
            }
            ls.clear();
        }
    }

    public static final GeneratorFactory factory = (numberCount, min, max, sum) ->
        new SmithTromblePartitionGenerator(numberCount, min, max, sum);
}

import java.util.Arrays;

// Enumerates all partitions with given parameters
public class SequentialEnumerator extends PartitionGenerator {
    private final int max;
    private final int[] p;
    private boolean finished;

    public SequentialEnumerator(int numberCount, int min, int max, int sum, boolean sorted) {
        super(numberCount, min, max, sum, sorted);
        this.max = max;
        p = new int[numberCount];
        startOver();
    }

    private void startOver() {
        finished = false;
        int unshiftedSum = sum + numberCount * min;
        fillMinimal(0, Math.max(min, unshiftedSum - (numberCount - 1) * max), unshiftedSum);
    }

    private void fillMinimal(int beginIndex, int minValue, int fillSum) {
        int fillRange = max - minValue;
        if (fillRange == 0)
            Arrays.fill(p, beginIndex, numberCount, max);
        else {
            int fillCount = numberCount - beginIndex;
            fillSum -= fillCount * minValue;
            int maxCount = fillSum / fillRange;
            int maxStartIndex = numberCount - maxCount;
            Arrays.fill(p, maxStartIndex, numberCount, max);
            fillSum -= maxCount * fillRange;
            Arrays.fill(p, beginIndex, maxStartIndex, minValue);
            if (fillSum != 0)
                p[maxStartIndex - 1] = minValue + fillSum;
        }
    }

    @Override
    public int[] get() { // returns null when there is no more partition, then starts over
        if (finished) {
            startOver();
            return null;
        }
        int[] pCopy = p.clone();
        if (numberCount > 1) {
            int i = numberCount;
            int s = p[--i];
            while (i > 0) {
                int x = p[--i];
                if (x == max) {
                    s += x;
                    continue;
                }
                x++;
                s--;
                int minRest = sorted ? x : min;
                if (s < minRest * (numberCount - i - 1)) {
                    s += x;
                    continue;
                }
                p[i++]++;
                fillMinimal(i, minRest, s);
                return pCopy;
            }
        }
        finished = true;
        return pCopy;
    }

    public static final GeneratorFactory permutationFactory = (numberCount, min, max, sum) ->
        new SequentialEnumerator(numberCount, min, max, sum, false);
    public static final GeneratorFactory combinationFactory = (numberCount, min, max, sum) ->
        new SequentialEnumerator(numberCount, min, max, sum, true);
}

import java.util.*;
import java.util.function.BiConsumer;
import PartitionGenerator.GeneratorFactory;

public class Test {
    private final int numberCount;
    private final int min;
    private final int max;
    private final int sum;
    private final int repeatCount;
    private final BiConsumer<PartitionGenerator, Test> procedure;

    public Test(int numberCount, int min, int max, int sum, int repeatCount,
            BiConsumer<PartitionGenerator, Test> procedure) {
        this.numberCount = numberCount;
        this.min = min;
        this.max = max;
        this.sum = sum;
        this.repeatCount = repeatCount;
        this.procedure = procedure;
    }

    @Override
    public String toString() {
        return String.format("=== %d numbers from [%d, %d] with sum %d, %d iterations ===",
                numberCount, min, max, sum, repeatCount);
    }

    private static class GeneratedVector {
        final int[] v;

        GeneratedVector(int[] vect) {
            v = vect;
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(v);
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj)
                return true;
            return Arrays.equals(v, ((GeneratedVector)obj).v);
        }

        @Override
        public String toString() {
            return Arrays.toString(v);
        }
    }

    private static final Comparator<Map.Entry<GeneratedVector, Integer>> lexicographical = (e1, e2) -> {
        int[] v1 = e1.getKey().v;
        int[] v2 = e2.getKey().v;
        int len = v1.length;
        int d = len - v2.length;
        if (d != 0)
            return d;
        for (int i = 0; i < len; i++) {
            d = v1[i] - v2[i];
            if (d != 0)
                return d;
        }
        return 0;
    };

    private static final Comparator<Map.Entry<GeneratedVector, Integer>> byCount =
            Comparator.<Map.Entry<GeneratedVector, Integer>>comparingInt(Map.Entry::getValue)
            .thenComparing(lexicographical);

    public static int SHOW_MISSING_LIMIT = 10;

    private static void checkMissingPartitions(Map<GeneratedVector, Integer> map, PartitionGenerator reference) {
        int missingCount = 0;
        while (true) {
            int[] v = reference.get();
            if (v == null)
                break;
            GeneratedVector gv = new GeneratedVector(v);
            if (!map.containsKey(gv)) {
                if (missingCount == 0)
                    System.out.println(" Missing:");
                if (++missingCount > SHOW_MISSING_LIMIT) {
                    System.out.println("  . . .");
                    break;
                }
                System.out.println(gv);
            }
        }
    }

    public static final BiConsumer<PartitionGenerator, Test> distributionTest(boolean sortByCount) {
        return (PartitionGenerator gen, Test test) -> {
            System.out.print("\n" + getName(gen) + "\n\n");
            Map<GeneratedVector, Integer> combos = new HashMap<>();
            // There's no point of checking permus for sorted generators
            // because they are the same as combos for them
            Map<GeneratedVector, Integer> permus = gen.isSorted() ? null : new HashMap<>();
            for (int i = 0; i < test.repeatCount; i++) {
                int[] v = gen.get();
                if (v == null && gen instanceof SequentialEnumerator)
                    break;
                if (permus != null) {
                    permus.merge(new GeneratedVector(v), 1, Integer::sum);
                    v = v.clone();
                    Arrays.sort(v);
                }
                combos.merge(new GeneratedVector(v), 1, Integer::sum);
            }
            Set<Map.Entry<GeneratedVector, Integer>> sortedEntries = new TreeSet<>(
                    sortByCount ? byCount : lexicographical);
            System.out.println("Combos" + (gen.isSorted() ? ":" : " (don't have to be uniform):"));
            sortedEntries.addAll(combos.entrySet());
            for (Map.Entry<GeneratedVector, Integer> e : sortedEntries)
                System.out.println(e);
            checkMissingPartitions(combos, test.getGenerator(SequentialEnumerator.combinationFactory));
            if (permus != null) {
                System.out.println("\nPermus:");
                sortedEntries.clear();
                sortedEntries.addAll(permus.entrySet());
                for (Map.Entry<GeneratedVector, Integer> e : sortedEntries)
                    System.out.println(e);
                checkMissingPartitions(permus, test.getGenerator(SequentialEnumerator.permutationFactory));
            }
        };
    }

    public static final BiConsumer<PartitionGenerator, Test> correctnessTest =
        (PartitionGenerator gen, Test test) -> {
        String genName = getName(gen);
        for (int i = 0; i < test.repeatCount; i++) {
            int[] v = gen.get();
            if (v == null && gen instanceof SequentialEnumerator)
                v = gen.get();
            if (v.length != test.numberCount)
                throw new RuntimeException(genName + ": array of wrong length");
            int s = 0;
            if (gen.isSorted()) {
                if (v[0] < test.min || v[v.length - 1] > test.max)
                    throw new RuntimeException(genName + ": generated number is out of range");
                int prev = test.min;
                for (int x : v) {
                    if (x < prev)
                        throw new RuntimeException(genName + ": unsorted array");
                    s += x;
                    prev = x;
                }
            } else
                for (int x : v) {
                    if (x < test.min || x > test.max)
                        throw new RuntimeException(genName + ": generated number is out of range");
                    s += x;
                }
            if (s != test.sum)
                throw new RuntimeException(genName + ": wrong sum");
        }
        System.out.format("%30s :   correctness test passed%n", genName);
    };

    public static final BiConsumer<PartitionGenerator, Test> performanceTest =
        (PartitionGenerator gen, Test test) -> {
        long time = System.nanoTime();
        for (int i = 0; i < test.repeatCount; i++)
            gen.get();
        time = System.nanoTime() - time;
        System.out.format("%30s : %8.3f s %10.0f ns/test%n", getName(gen), time * 1e-9, time * 1.0 / test.repeatCount);
    };

    public PartitionGenerator getGenerator(GeneratorFactory factory) {
        return factory.create(numberCount, min, max, sum);
    }

    public static String getName(PartitionGenerator gen) {
        String name = gen.getClass().getSimpleName();
        if (gen instanceof SequentialEnumerator)
            return (gen.isSorted() ? "Sorted " : "Unsorted ") + name;
        else
            return name;
    }

    public static GeneratorFactory[] factories = { SmithTromblePartitionGenerator.factory,
            PermutationPartitionGenerator.factory, CombinationPartitionGenerator.factory,
            SequentialEnumerator.permutationFactory, SequentialEnumerator.combinationFactory };

    public static void main(String[] args) {
        Test[] tests = {
                            new Test(3, 0, 3, 5, 3_000, distributionTest(false)),
                            new Test(3, 0, 6, 12, 3_000, distributionTest(true)),
                            new Test(50, -10, 20, 70, 2_000, correctnessTest),
                            new Test(7, 3, 10, 42, 1_000_000, performanceTest),
                            new Test(20, 3, 10, 120, 100_000, performanceTest)
                       };
        for (Test t : tests) {
            System.out.println(t);
            for (GeneratorFactory factory : factories) {
                PartitionGenerator candidate = t.getGenerator(factory);
                t.procedure.accept(candidate, t);
            }
            System.out.println();
        }
    }
}

You can try this on Ideone.

John McClane
  • 3,498
  • 3
  • 12
  • 33
  • Thanks for your answer; it works well. I have described the permutation generator in another answer here; [answered another question](https://stackoverflow.com/a/61520472/815724) with your help; and will soon include your algorithm in Python sample code for my article on random generation methods. – Peter O. Apr 30 '20 at 15:44
  • Just to be clear. Does this algorithm rely on generating _**all**_ possible partitions/compositions in order to sample? – Joseph Wood May 01 '20 at 16:08
  • @JosephWood No, it relies on **counting** all of them. This is done only one time at the generator initialization and is rather effective because it utilizes the dynamic programming approach. – John McClane May 01 '20 at 22:35
  • How can dynamic programming solve the related problem of choosing a uniform random partition of 'sum' into N integers chosen at random _with replacement_ from a list ([example](https://stackoverflow.com/questions/60679090)) or _without replacement_ ([example](https://stackoverflow.com/questions/59805786/random-numbers-picked-from-array-that-sums-up-to-a-float)), or how can that problem otherwise be solved? – Peter O. May 02 '20 at 04:37
  • @PeterO. You need to count all possible partitions via the same method as in my algorithm, but this time you need to subtract only **allowable** numbers from the sum. This is too long to comment, you can ask a separate question. I suspect that one can solve four different problems via the same approach. Suppose you have a list of distinct integers to choose from (this is just a continuous range in this question). Then you can generate random arrays of a given length consisting of numbers from this list with the given sum if the arrays should be sorted/unsorted and allow/disallow a repetition. – John McClane May 02 '20 at 13:51
  • @JohnMcClane would it be possible to give an explanation at a high level what you're trying to do here? I've read Peter O.'s description in a separate answer but it is quite a low level explanation of what the algorithm does. What do the elements of `decisionTable` in `PermutationPartitionGenerator` represent and how is sample conducted from this table? – Will Dec 18 '20 at 15:07
  • 1
    @Will Did you mean `distributionTable`? It is a table pre-calculated at the constructor and then used in the `get()` method to generate random partitions. `d.t.[n][s]` counts how many sequences of n numbers from 0 to range = max - min, inclusive, have the sum s. To generate the i-th term after we've already found the terms with higher indices, we multiply `d.t.[i + 1][s]` (which is the sum of `d.t.[i][s]` for s in some interval) by a random number unif. distributed in [0,1) and then look for the highest s (new sum of terms) such that the product t is less than accumulated sum of `d.t.[i][s]`. – John McClane Dec 18 '20 at 16:56
6

Here is the algorithm from John McClane's PermutationPartitionGenerator, in another answer on this page. It has two phases, namely a setup phase and a sampling phase, and generates n random variates in [min, max] with the sum sum, where the numbers are listed in random order.

Setup phase: First, a solution table is built using the following formulas (t(y, x) where y is in [0, n] and x is in [0, sum - n * min]):

  • t(0, j) = 1 if j == 0; 0 otherwise
  • t(i, j) = t(i-1, j) + t(i-1, j-1) + ... + t(i-1, j-(max-min))

Here, t(y, x) stores the relative probability that the sum of y numbers (in the appropriate range) will equal x. This probability is relative to all t(y, x) with the same y.

Sampling phase: Here we generate a sample of n numbers. Set s to sum - n * min, then for each position i, starting with n - 1 and working backwards to 0:

  • Set v to a uniform random integer in [0, t(i+1, s)).
  • Set r to min.
  • Subtract t(i, s) from v.
  • While v remains 0 or greater, subtract t(i, s-1) from v, add 1 to r, and subtract 1 from s.
  • The number at position i in the sample is set to r.

EDIT:

It appears that with trivial changes to the algorithm above, it's possible to have each random variate use a separate range rather than use the same range for all of them:

Each random variate at positions i ∈ [0, n) has a minimum value min(i) and a maximum value max(i).

Let adjsum = sum - ∑min(i).

Setup phase: First, a solution table is built using the following formulas (t(y, x) where y is in [0, n] and x is in [0, adjsum]):

  • t(0, j) = 1 if j == 0; 0 otherwise
  • t(i, j) = t(i-1, j) + t(i-1, j-1) + ... + t(i-1, j-(max(i-1)-min(i-1)))

The sampling phase is then exactly the same as before, except we set s to adjsum (rather than sum - n * min) and set r to min(i) (rather than min).


EDIT:

For John McClane's CombinationPartitionGenerator, the setup and sampling phases are as follows.

Setup phase: First, a solution table is built using the following formulas (t(z, y, x) where z is in [0, n], y is in [0, max - min], and x is in [0, sum - n * min]):

  • t(0, j, k) = 1 if k == 0; 0 otherwise
  • t(i, 0, k) = t(i - 1, 0, k)
  • t(i, j, k) = t(i, j-1, k) + t(i - 1, j, k - j)

Sampling phase: Here we generate a sample of n numbers. Set s to sum - n * min and mrange to max - min, then for each position i, starting with n - 1 and working backwards to 0:

  • Set v to a uniform random integer in [0, t(i+1, mrange, s)).
  • Set mrange to min(mrange, s)
  • Subtract mrange from s.
  • Set r to min + mrange.
  • Subtract t(i, mrange, s) from v.
  • While v remains 0 or greater, add 1 to s, subtract 1 from r and 1 from mrange, then subtract t(i, mrange, s) from v.
  • The number at position i in the sample is set to r.
Peter O.
  • 32,158
  • 14
  • 82
  • 96
3

I have not tested this, so it is not really an answer, just something to try which is too long to fit into a comment. Start with an array which meets the first two criteria and play with it so it still meets the first two, but is a lot more random.

If the mean is an integer, then your initial array can be [4, 4, 4, ... 4] or maybe [3, 4, 5, 3, 4, 5, ... 5, 8, 0] or something simple like that. For a mean of 4.5, try [4, 5, 4, 5, ... 4, 5].

Next pick a pair of numbers, num1 and num2, in the array. Probably the first number should be taken in order, as with the Fisher-Yates shuffle, the second number should be picked at random. Taking the first number in order ensures that every number is picked at least once.

Now calculate max-num1 and num2-min. Those are the distances from the two numbers to the max and min boundaries. Set limit to the smaller of the two distances. That is the maximum change allowed which will not put one or other of the numbers outside the allowed limits. If limit is zero then skip this pair.

Pick a random integer in the range [1, limit]: call it change. I omit 0 from the pickable range as it has no effect. Testing may show that you get better randomness by including it; I'm not sure.

Now set num1 <- num1 + change and num2 <- num2 - change. That will not affect the mean value and all elements of the array are still within the required boundaries.

You will need to run through the whole array at least once. Testing should show if you need to run through it more than once to get something sufficiently random.

ETA: include pseudocode

// Set up the array.
resultAry <- new array size N
for (i <- 0 to N-1)
  // More complex initial setup schemes are possible here.
  resultAry[i] <- mean
rof

// Munge the array entries.
for (ix1 <- 0 to N-1)  // ix1 steps through the array in order.

  // Pick second entry different from first.
  repeat
    ix2 <- random(0, N-1)
  until (ix2 != ix1)

  // Calculate size of allowed change.
  hiLimit <- max - resultAry[ix1]
  loLimit <- resultAry[ix2] - min
  limit <- minimum(hiLimit, loLimit)
  if (limit == 0)
    // No change possible so skip.
    continue loop with next ix1
  fi

  // Change the two entries keeping same mean.
  change <- random(1, limit)  // Or (0, limit) possibly.
  resultAry[ix1] <- resultAry[ix1] + change
  resultAry[ix2] <- resultAry[ix2] - change

rof

// Check array has been sufficiently munged.
if (resultAry not random enough)
  munge the array again
fi
rossum
  • 15,344
  • 1
  • 24
  • 38
3

As the OP points out, the ability to efficiently unrank is very powerful. If we are able to do so, generating a uniform distribution of partitions can be done in three steps (restating what the OP has laid out in the question):

  1. Calculate the total number, M, of partitions of length N of the number sum such that the parts are in the range [min, max].
  2. Generate a uniform distribution of integers from [1, M].
  3. Unrank each integer from step 2 into its respective partition.

Below, we only focus on generating the nth partition as there is a copious amount of information on generating a uniform distribution of integer in a given range. Here is a simple C++ unranking algorithm which should be easy to translate to other languages (N.B. I have not figured out how to unrank the composition case yet (i.e. order matters)).

std::vector<int> unRank(int n, int m, int myMax, int nth) {

    std::vector<int> z(m, 0);
    int count = 0;
    int j = 0;

    for (int i = 0; i < z.size(); ++i) {
        int temp = pCount(n - 1, m - 1, myMax);

        for (int r = n - m, k = myMax - 1;
             (count + temp) < nth && r > 0 && k; r -= m, --k) {

            count += temp;
            n = r;
            myMax = k;
            ++j;
            temp = pCount(n - 1, m - 1, myMax);
        }

        --m;
        --n;
        z[i] = j;
    }

    return z;
}

The workhorse pCount function is given by:

int pCount(int n, int m, int myMax) {

    if (myMax * m < n) return 0;
    if (myMax * m == n) return 1;

    if (m < 2) return m;
    if (n < m) return 0;
    if (n <= m + 1) return 1;

    int niter = n / m;
    int count = 0;

    for (; niter--; n -= m, --myMax) {
        count += pCount(n - 1, m - 1, myMax);
    }

    return count;
}

This function is based off of the excellent answer to Is there an efficient algorithm for integer partitioning with restricted number of parts? by user @m69_snarky_and_unwelcoming. The one given above is a slight modification of the simple algorithm (the one without memoization). This can easily be modified to incorporate memoization for greater efficiency. We will leave this off for now and focus on the unranking portion.

Explanation of unRank

We first note there is a one-to-one mapping from the partitions of length N of the number sum such that the parts are in the range [min, max] to the restricted partitions of length N of the number sum - N * (min - 1) with parts in [1, max - (min - 1)].

As a small example, consider the partitions of 50 of length 4 such that the min = 10 and the max = 15. This will have the same structure as the restricted partitions of 50 - 4 * (10 - 1) = 14 of length 4 with the maximum part equal to 15 - (10 - 1) = 6.

10   10   15   15   --->>    1    1    6    6
10   11   14   15   --->>    1    2    5    6
10   12   13   15   --->>    1    3    4    6
10   12   14   14   --->>    1    3    5    5
10   13   13   14   --->>    1    4    4    5
11   11   13   15   --->>    2    2    4    6
11   11   14   14   --->>    2    2    5    5
11   12   12   15   --->>    2    3    3    6
11   12   13   14   --->>    2    3    4    5
11   13   13   13   --->>    2    4    4    4
12   12   12   14   --->>    3    3    3    5
12   12   13   13   --->>    3    3    4    4

With this in mind, in order to easily count, we could add a step 1a to translate the problem to the "unit" case if you will.

Now, we simply have a counting problem. As @m69 brilliantly displays, counting partitions can be easily achieved by breaking the problem into smaller problems. The function @m69 provides gets us 90% of the way, we just have to figure out what to do with the added restriction that there is a cap. This is where we get:

int pCount(int n, int m, int myMax) {

    if (myMax * m < n) return 0;
    if (myMax * m == n) return 1;

We also have to keep in mind that myMax will decrease as we move along. This makes sense if we look at the 6th partition above:

2   2   4   6

In order to count the number of partitions from here on out, we must keep applying the translation to the "unit" case. This looks like:

1   1   3   5

Where as the step before, we had a max of 6, now we only consider a max of 5.

With this in mind, unranking the partition is no different than unranking a standard permutation or combination. We must be able to count the number of partitions in a given section. For example, to count the number of partitions that start with 10 above, all we do is remove the 10 in the first column:

10   10   15   15
10   11   14   15
10   12   13   15
10   12   14   14
10   13   13   14

10   15   15
11   14   15
12   13   15
12   14   14
13   13   14

Translate to the unit case:

1   6   6
2   5   6
3   4   6
3   5   5
4   4   5

and call pCount:

pCount(13, 3, 6) = 5

Given a random integer to unrank, we continue calculating the number of partitions in smaller and smaller sections (as we did above) until we have filled our index vector.

Examples

Given min = 3, max = 10, n = 7, and sum = 42, here is an ideone demo that generates 20 random partitions. The output is below:

42: 3 3 6 7 7 8 8 
123: 4 4 6 6 6 7 9 
2: 3 3 3 4 9 10 10 
125: 4 4 6 6 7 7 8 
104: 4 4 4 6 6 8 10 
74: 3 4 6 7 7 7 8 
47: 3 4 4 5 6 10 10 
146: 5 5 5 5 6 7 9 
70: 3 4 6 6 6 7 10 
134: 4 5 5 6 6 7 9 
136: 4 5 5 6 7 7 8 
81: 3 5 5 5 8 8 8 
122: 4 4 6 6 6 6 10 
112: 4 4 5 5 6 8 10 
147: 5 5 5 5 6 8 8 
142: 4 6 6 6 6 7 7 
37: 3 3 6 6 6 9 9 
67: 3 4 5 6 8 8 8 
45: 3 4 4 4 8 9 10 
44: 3 4 4 4 7 10 10

The lexicographical index is on the left and the unranked partition in on the right.

Peter O.
  • 32,158
  • 14
  • 82
  • 96
Joseph Wood
  • 7,077
  • 2
  • 30
  • 65
0

If you generate 0≤a≤1 of the random values in the range [l, x-1] uniformly, and 1-a of the random values in the range [x, h] uniformly, the expected mean would be:

m = ((l+x-1)/2)*a + ((x+h)/2)*(1-a)

So if you want a specific m, you can play with a and x.

For example, if you set x = m: a = (h-m)/(h-l+1).

To ensure closer-to-uniform probability for different combinations, choose a or x randomly from the set of valid solutions to the equation above. (x must be in the range [l, h] and should be (close to) an integer; N*a should be (close to) an integer as well.

Lior Kogan
  • 19,919
  • 6
  • 53
  • 85
0

I implemented the (unsorted) algorithm for Python-numpy with the separate range [min, max] for each random number. Maybe it can be useful for people using Python as primary programming language.

import numpy as np


def randint_sum_equal_to(sum_value: int, 
                         n: int, 
                         lower: (int, list) = 0, 
                         upper: (int,list) = None):

# Control on input
if isinstance(lower, (list, np.ndarray)):
    assert len(lower) == n
else:
    lower = lower * np.ones(n)
if isinstance(upper, (list, np.ndarray)):
    assert len(upper) == n
elif upper is None:
    upper = sum_value * np.ones(n)
else:
    upper = upper * np.ones(n)

# Trivial solutions
if np.sum(upper) < sum_value:
    raise ValueError('No solution can be found: sum(upper_bound) < sum_value')
elif np.sum(lower) > sum_value:
    raise ValueError('No solution can be found: sum(lower_bound) > sum_value')
elif np.sum(upper) == sum_value:
    return upper
elif np.sum(lower) == sum_value:
    return lower

# Setup phase
# I generate the table t(y,x) storing the relative probability that the sum of y numbers
# (in the appropriate range) is equal x.
t = np.zeros((n + 1, sum_value))
t[0, 0] = 1
for i in np.arange(1, n + 1):
    # Build the k indexes which are taken for each j following k from 0 to min(u(i-1)-l(i-1), j).
    # This can be obtained creating a repetition matrix of from t[i] multiplied by the triangular matrix
    # tri_mask and then sum each row
    tri_mask = np.tri(sum_value, k=0) - np.tri(sum_value, k=-(upper[i-1] - lower[i-1]))
    t[i] = np.sum(np.repeat(t[i-1][np.newaxis], sum_value, 0)*tri_mask, axis=1)

# Sampling phase
values = np.zeros(n)
s = (sum_value - np.sum(lower)).astype(int)
for i in np.arange(n)[::-1]:
    # The basic algorithm is the one commented:
    # v = np.round(np.random.rand() * t[i+1, s])
    # r = lower[i]
    # v -= t[i, s]
    # while (v >= 0) and (s > 0):
    #     s -= 1
    #     v -= t[i, s]
    #     r += 1
    # values[i] = r
    # ---------------------------------------------------- #
    # To speed up the convergence I use some numpy tricks.
    # The idea is the same of the Setup phase:
    # - I build a repeat matrix of t[i, s:1];
    # - I take only the lower triangular part, multiplying by a np.tri(s)
    # - I sum over rows, so each element of sum_t contains the cumulative sum of t[i, s - k]
    # - I subtract v - sum_t and count the element greater of equal zero,
    #   which are used to set the output and update s
    v = np.round(np.random.rand() * t[i+1, s])
    values[i] = lower[i]
    sum_t = np.sum(np.repeat(t[i, np.arange(1, s + 1)[::-1]][np.newaxis], s, 0) * np.tri(s), axis=1)
    vt_difference_nonzero = np.sum(np.repeat(v, s) - sum_t >= 0)
    values[i] += vt_difference_nonzero
    s -= vt_difference_nonzero
return values.astype(int)
Peter O.
  • 32,158
  • 14
  • 82
  • 96