0

I'm working on a project for fun and I need an algorithm to do as follows: Generate a list of numbers of Length n which add up to x

I would settle for list of integers, but ideally, I would like to be left with a set of floating point numbers.

I would be very surprised if this problem wasn't heavily studied, but I'm not sure what to look for.

I've tackled similar problems in the past, but this one is decidedly different in nature. Before I've generated different combinations of a list of numbers that will add up to x. I'm sure that I could simply bruteforce this problem but that hardly seems like the ideal solution.

Anyone have any idea what this may be called, or how to approach it? Thanks all!

Edit: To clarify, I mean that the list should be length N while the numbers themselves can be of any size.

edit2: Sorry for my improper use of 'set', I was using it as a catch all term for a list or an array. I understand that it was causing confusion, my apologies.

danem
  • 1,495
  • 5
  • 19
  • 35
  • 1
    To be clear, by "length `n`" you mean that the decimal representations of the integers, without leading zeroes, should be `n` digits long, right? – jwodder Aug 16 '11 at 19:26
  • 4
    Do you require integers? If not, just generate `n` random numbers, calculate their sum and scale them down or up to the required sum. – hammar Aug 16 '11 at 19:27
  • @jwodder I'm sorry for the ambiguity, I edited my question to clarify – danem Aug 16 '11 at 19:30
  • Do you want to sample uniformly from the set of solutions? – Rob Neuhaus Aug 16 '11 at 19:30
  • @hammar While I suppose I would settle for integer, ideally the numbers would be floating point. – danem Aug 16 '11 at 19:30
  • Members can't be duplicates - it's a true set, correct? – Ed Staub Aug 16 '11 at 19:31
  • @Pete As a mathematician, I must tell you that arbitrary sets do not have length — they have *size* or *cardinality*. – jwodder Aug 16 '11 at 19:31
  • What's the "length" of a set? – Kerrek SB Aug 16 '11 at 19:32
  • @rrenaud Haha, I apologize, but I'm not sure what you are asking. I'm not very knowledgable in math. – danem Aug 16 '11 at 19:32
  • If you want a uniform distribution, you could calculate the number of possibilities, generate an integer from 1 to that number and then figure out how to map from the number to the unique set of integers. Are duplicates allowed? EDIT- Nevermind, I see that you don't want integers. – 101100 Aug 16 '11 at 19:32
  • I assumed that "length n" means of "size n" - the size of the set - not the size of the members. – Ed Staub Aug 16 '11 at 19:33
  • @Ed Staub, Sorry, but it appears my use of the word 'set' was very misleading, no, this does not have to be a true set. It very well could, but that is not a requirement. – danem Aug 16 '11 at 19:37
  • Should your numbers be positive? Otherwise, you can't do it in a defined manner, you need to specify some sort of probability distribution... – Per Alexandersson Aug 16 '11 at 21:10
  • 2
    possible duplicate of [Getting N random numbers that the sum is M](http://stackoverflow.com/questions/2640053/getting-n-random-numbers-that-the-sum-is-m) – Jason S Aug 16 '11 at 23:05
  • 2
    also http://stackoverflow.com/questions/3959021/non-biased-return-a-list-of-n-random-positive-numbers-0-so-that-their-sum/3960993#3960993 which discusses some of the subtleties – Jason S Aug 16 '11 at 23:06

6 Answers6

6

This is how to do it in Python

import random

def random_values_with_prescribed_sum(n, total):
    x = [random.random() for i in range(n)]
    k = total / sum(x)
    return [v * k for v in x]

Basically you pick n random numbers, compute their sum and compute a scale factor so that the sum will be what you want it to be.

Note that this approach will not produce "uniform" slices, i.e. the distribution you will get will tend to be more "egalitarian" than it should be if it was picked at random among all distribution with the given sum.

To see the reason you can just picture what the algorithm does in the case of two numbers with a prescribed sum (e.g. 1):

enter image description here

The point P is a generic point obtained by picking two random numbers and it will be uniform inside the square [0,1]x[0,1]. The point Q is the point obtained by scaling P so that the sum is required to be 1. As it's clear from the picture the points close to the center of the have an higher probability; for example the exact center of the squares will be found by projecting any point on the diagonal (0,0)-(1,1), while the point (0, 1) will be found projecting only points from (0,0)-(0,1)... the diagonal length is sqrt(2)=1.4142... while the square side is only 1.0.

6502
  • 112,025
  • 15
  • 165
  • 265
  • 1
    This will work except for if the desired total is zero (will give a not-so-random output), and if the sum of the random numbers happens to be very close to zero (will blow up or produce enormous numbers). Still the best solution though, but may need a tweak to be general. – Anders Forsgren Aug 16 '11 at 19:44
  • Thanks a lot, surprisingly simple. I've just implemented it in javascript with no difficulty. – danem Aug 16 '11 at 19:54
  • Be warned that this distribution is not uniform over all n-tuples that sum to x (i.e.: not all n-tuples are equally likely). Odds are, the asker might not care that much so long as the outputs look sort of random. But, if they do care, please see [this blog post](http://geomblog.blogspot.com/2005/10/sampling-from-simplex.html) or page 568 of [Devroye's book](http://cg.scs.carleton.ca/~luc/rnbookindex.html). – mhum Aug 16 '11 at 21:42
  • @mhum: Good point. I don't think this is often a bad problem because slices tend actually to be "more equal" than with a uniform approach. – 6502 Aug 17 '11 at 09:47
4

Actually, you need to generate a partition of x into n parts. This is usually done the in following way: The partition of x into n non-negative parts can be represented in the following way: reserve n + x free places, put n borders to some arbitrary places, and stones to the rest. The stone groups add up to x, thus the number of possible partitions is the binomial coefficient (n + x \atop n).

So your algorithm could be as follows: choose an arbitrary n-subset of (n + x)-set, it determines uniquely a partition of x into n parts.

In Knuth's TAOCP the chapter 3.4.2 discusses random sampling. See Algortihm S there.


Algorithm S: (choose n arbitrary records from total of N)

  1. t = 0, m = 0;
  2. u = random, uniformly distributed on (0, 1)
  3. if (N - t)*u >= n - m, skip t-th record and increase t by 1; otherwise include t-th record in the sample, increase m and t by 1
  4. if M < n, return to 2, otherwise, algorithm finished

The solution for non-integers is algorithmically trivial: you just select arbitrary n numbers that don't sum up to 0, and norm them by their sum.

Vlad
  • 35,022
  • 6
  • 77
  • 199
  • I'm not clear as to what you mean by placing stones and borders. But thank you for the very in depth answer, I appreciate you pointing me towards the mathematics rather than just a code snippet. – danem Aug 16 '11 at 19:53
  • @Pete: look: if you put n borders and x stones in a row, they form a partition of x into n addends this way: if there are k stones between the first and second border, they make first addend k; if there are m stones between 2nd and 3rd border, they make the second addend m, etc. – Vlad Aug 16 '11 at 19:57
  • @Pete: you will get exactly n addends, summing up to x – Vlad Aug 16 '11 at 19:58
4

If you want to sample uniformly in the region of N-1-dimensional space defined by x1 + x2 + ... + xN = x, then you're looking at a special case of sampling from a Dirichlet distribution. The sampling procedure is a little more involved than generating uniform deviates for the xi. Here's one way to do it, in Python:

xs = [random.gammavariate(1,1) for a in range(N)]
xs = [x*v/sum(xs) for v in xs]

If you don't care too much about the sampling properties of your results, you can just generate uniform deviates and correct their sum afterwards.

Ian Ross
  • 987
  • 8
  • 14
1

Here is a version of the above algorithm in Javascript

    function getRandomArbitrary(min, max) {
        return Math.random() * (max - min) + min;
    };
    function getRandomArray(min, max, n) {
        var arr = [];
        for (var i = 0, l = n; i < l; i++) {
            arr.push(getRandomArbitrary(min, max))
        };
        return arr;
    };
    function randomValuesPrescribedSum(min, max, n, total) {
        var arr = getRandomArray(min, max, n);
        var sum = arr.reduce(function(pv, cv) { return pv + cv; }, 0);
        var k = total/sum;
        var delays = arr.map(function(x) { return k*x; })
        return delays;
    };

You can call it with

var myarray = randomValuesPrescribedSum(0,1,3,3);

And then check it with

var sum = myarray.reduce(function(pv, cv) { return pv + cv;},0);
Riley
  • 1,198
  • 1
  • 11
  • 9
0

In python:

a: create a list of (random #'s 0 to 1) times total; append 0 and total to the list

b: sort the list, measure the distance between each element

c: round the list elements

import random
import time

TOTAL       = 15
PARTS       = 4
PLACES      = 3

def random_sum_split(parts, total, places):

    a = [0, total] + [random.random()*total for i in range(parts-1)]
    a.sort()
    b = [(a[i] - a[i-1]) for i in range(1, (parts+1))]
    if places == None:
        return b
    else:    
        b.pop()
        c = [round(x, places) for x in b]  
        c.append(round(total-sum(c), places))
        return c

def tick():

    if info.tick == 1:

        start = time.time()
        alpha = random_sum_split(PARTS, TOTAL, PLACES)
        end = time.time()  

        log('alpha: %s' % alpha)
        log('total: %.7f' % sum(alpha))
        log('parts: %s' % PARTS)
        log('places: %s' % PLACES)
        log('elapsed: %.7f' % (end-start))

yields:

[2014-06-13 01:00:00] alpha: [0.154, 3.617, 6.075, 5.154]
[2014-06-13 01:00:00] total: 15.0000000
[2014-06-13 01:00:00] parts: 4
[2014-06-13 01:00:00] places: 3
[2014-06-13 01:00:00] elapsed: 0.0005839

to the best of my knowledge this distribution is uniform

litepresence
  • 3,109
  • 1
  • 27
  • 35
0

This code does a reasonable job. I think it produces a different distribution than 6502's answer, but I am not sure which is better or more natural. Certainly his code is clearer/nicer.

import random

def parts(total_sum, num_parts):
  points = [random.random() for i in range(num_parts-1)]
  points.append(0)
  points.append(1)
  points.sort()

  ret = []
  for i in range(1, len(points)):
    ret.append((points[i] - points[i-1]) * total_sum)
  return ret

def test(total_sum, num_parts):
  ans = parts(total_sum, num_parts)
  assert abs(sum(ans) - total_sum) < 1e-7
  print ans

test(5.5, 3)
test(10, 1)
test(10, 5)
Rob Neuhaus
  • 9,190
  • 3
  • 28
  • 37