3

Based on this reply: Random numbers that add to 100: Matlab I tried to apply the suggested method to randomly divide an integer into a fixed number of integers whose sum is equal to the integer. Although that method seems to result in a uniformly distributed set of points when the values are not integers, in the case of integers, the resulting tuples are not obtained with equal probability. This is shown by the following implementation in R, where a simple case is tested with 3 divisors and with the integer to be divided equal to 5:

# Randomly divide an integer into a defined number of integers 
# Goal: obtain with equal probability any combination of variable values, with the condition that sum(variables) = dividend.

# install.packages(rgl)  # Install rgl package if not yet installed. This allows to use the plot3d function to create a 3D scatterplot.
library(rgl)

n_draws = 10000
n_variables = 3  # Number of divisors. These need to be randomly calculated. Their value must be in the interval [0:dividend] and their sum must be equal to the dividend. Two variables can have the same value.
dividend = 5  # Number that needs to be divided.
rand_variables = matrix(nrow = n_draws, ncol = n_variables)  # This matrix contains the final values for each variable (one column per variable).
rand_samples = matrix(nrow = n_draws, ncol = n_variables-1)  # This matrix contains the intermediate values that are used to randomly divide the dividend.

for (k in 1:n_draws){
    rand_samples[k,] = sample(x = c(0:dividend), size = n_variables-1, replace = TRUE)  # Randomly select (n_variables - 1) values within the range 0:dividend. The values in rand_samples are uniformly distributed.
    midpoints = sort(rand_samples[k,])
    rand_variables[k,] = sample(diff(c(0, midpoints, dividend)), n_variables)  # Calculate the values of each variable such that their sum is equal to the dividend.
}

plot3d(rand_variables)  # Create a 3D scatterplot showing the values of rand_variables. This plot does not show how frequently each combination of values of the n_variables is obtained, only which combinations of values are possible.

table(data.frame(rand_variables))  # This prints out the count of each combination of values of n_variables. It shows that the combinations of values in the corners (e.g. (5,0,0)) are obtained less frequently than other combinations (e.g. (1,2,2)).

The last line gives the following output, which shows how many times were obtained each combination of values of (X1, X2, X3) that respect the condition X1 + X2 + X3 = 5:

, , X3 = 0

   X2
X1    0   1   2   3   4   5
  0   0   0   0   0   0 397
  1   0   0   0   0 471   0
  2   0   0   0 469   0   0
  3   0   0 446   0   0   0
  4   0 456   0   0   0   0
  5 358   0   0   0   0   0

, , X3 = 1

   X2
X1    0   1   2   3   4   5
  0   0   0   0   0 450   0
  1   0   0   0 539   0   0
  2   0   0 560   0   0   0
  3   0 588   0   0   0   0
  4 426   0   0   0   0   0
  5   0   0   0   0   0   0

, , X3 = 2

   X2
X1    0   1   2   3   4   5
  0   0   0   0 428   0   0
  1   0   0 603   0   0   0
  2   0 549   0   0   0   0
  3 461   0   0   0   0   0
  4   0   0   0   0   0   0
  5   0   0   0   0   0   0

, , X3 = 3

   X2
X1    0   1   2   3   4   5
  0   0   0 500   0   0   0
  1   0 549   0   0   0   0
  2 455   0   0   0   0   0
  3   0   0   0   0   0   0
  4   0   0   0   0   0   0
  5   0   0   0   0   0   0

, , X3 = 4

   X2
X1    0   1   2   3   4   5
  0   0 465   0   0   0   0
  1 458   0   0   0   0   0
  2   0   0   0   0   0   0
  3   0   0   0   0   0   0
  4   0   0   0   0   0   0
  5   0   0   0   0   0   0

, , X3 = 5

   X2
X1    0   1   2   3   4   5
  0 372   0   0   0   0   0
  1   0   0   0   0   0   0
  2   0   0   0   0   0   0
  3   0   0   0   0   0   0
  4   0   0   0   0   0   0
  5   0   0   0   0   0   0

As the output shows, the combinations of values in the corners of the plane (e.g. (5,0,0)) are obtained less frequently than other tuples.

How can I obtain any integer tuple with the same probability?

I'm looking for a solution that is applicable for any positive integer and for any number of divisors.

Community
  • 1
  • 1
Dave
  • 33
  • 4
  • Does the order make a difference in your tuples? i.e. do you want the probability of (5,0,0) to be the same as the probability of (0,2,3), or do you want the probability of: (5,0,0) OR (0,5,0) OR (0,0,5) to be the same as the probability of: (0,2,3) OR (0,3,2) OR (2,0,3) OR (2,3,0) OR (3,0,2) OR (3,2,0) – Jacob Socolar Nov 23 '15 at 19:22
  • Yes the order makes a difference. Thanks for the question. In the example shown above, there are a total of 21 distinct possible tuples that meet the conditions. Each should have the same probability of being generated (i.e. a probability of 1/21). – Dave Nov 23 '15 at 19:39
  • You might be interested in [this question](http://stackoverflow.com/q/3589214/270986), which is essentially the same problem (but for Python). – Mark Dickinson Nov 23 '15 at 20:51
  • Thanks for the link Mark. However, TheTime's solution is what I was looking for. – Dave Nov 23 '15 at 21:22

1 Answers1

2

I think trying to make these combinations/permutations manually is reinventing the wheel. There are efficient algorithms to do this implemented in partitions. For example,

library(partitions)                            # compositions, parts, restrictedparts may be of interest
sample_size <- 1000
pool <- compositions(5, 3)                     # pool of possible tuples
samp <- sample(ncol(pool), sample_size, TRUE)  # sample uniformly

## These are you sampled tuples, each column
z <- matrix(pool[,samp], 3)

Side note: don't use a data.frame, use a matrix to store a set of integers. data.frames will be entirely copied every time you modify something ([.data.frame is not a primitive), whereas matrices will modify in place.

Rorschach
  • 31,301
  • 5
  • 78
  • 129