3

I would like to randomly assign positive integers to G groups, such that they sum up to V.

For example, if G = 3 and V = 21, valid results may be (7, 7, 7), (10, 6, 5), etc.

Is there a straightforward way to do this?


Editor's notice (from 李哲源):

If values are not restricted to integers, the problem is simple and has been addressed in Choosing n numbers with fixed sum.

For integers, there is a previous Q & A: Generate N random integers that sum to M in R but it appears more complicated and is hard to follow. The loop based solution over there is also not satisfying.

Zheyuan Li
  • 71,365
  • 17
  • 180
  • 248
compbiostats
  • 909
  • 7
  • 22

3 Answers3

5

non-negative integers

Let n be sample size:

x <- rmultinom(n, V, rep.int(1 / G, G))

is a G x n matrix, where each column is a multinomial sample that sums up to V.

By passing rep.int(1 / G, G) to argument prob I assume that each group has equal probability of "success".


positive integers

As Gregor mentions, a multinomial sample can contain 0. If such samples are undesired, they should be rejected. As a result, we sample from a truncated multinomial distribution.

In How to generate target number of samples from a distribution under a rejection criterion I suggested an "over-sampling" approach to achieve "vectorization" for a truncated sampling. Simply put, Knowing the acceptance probability we can estimate the expected number of trials M to see the first "success" (non-zero). We first sample say 1.25 * M samples, then there will be at least one "success" in these samples. We randomly return one as the output.

The following function implements this idea to generate truncated multinomial samples without 0.

positive_rmultinom <- function (n, V, prob) {
  ## input validation
  G <- length(prob)
  if (G > V) stop("'G > V' causes 0 in a sample for sure!")
  if (any(prob < 0)) stop("'prob' can not contain negative values!")
  ## normalization
  sum_prob <- sum(prob)
  if (sum_prob != 1) prob <- prob / sum_prob
  ## minimal probability
  min_prob <- min(prob)
  ## expected number of trials to get a "success" on the group with min_prob
  M <- round(1.25 * 1 / min_prob)
  ## sampling
  N <- n * M
  x <- rmultinom(N, V, prob)
  keep <- which(colSums(x == 0) == 0)
  x[, sample(keep, n)]
  }

Now let's try

V <- 76
prob <- c(53, 13, 9, 1)

Directly using rmultinom to draw samples can occasionally result in ones with 0:

## number of samples that contain 0 in 1000 trials
sum(colSums(rmultinom(1000, V, prob) == 0) > 0)
#[1] 355   ## or some other value greater than 0

But there is no such issue by using positive_rmultinom:

## number of samples that contain 0 in 1000 trials
sum(colSums(positive_rmultinom(1000, V, prob) == 0) > 0)
#[1] 0
Zheyuan Li
  • 71,365
  • 17
  • 180
  • 248
2

Probably a less expensive way, but this seems to work.

G <- 3
V <- 21
m <- data.frame(matrix(rep(1:V,G),V,G))
tmp <- expand.grid(m) # all possibilities
out <- tmp[which(rowSums(tmp) == V),] # pluck those that sum to 'V'
out[sample(1:nrow(out),1),] # randomly select a column

Not sure how to do with runif

Brian Davis
  • 990
  • 5
  • 11
  • @Gregor Ack! Really good point about `combn`. As far as the increasing order goes, doesn't randomly sampling the column number take care of this? – Brian Davis Sep 28 '18 at 17:28
  • Ahh I see what you were getting at. I've edited my answer – Brian Davis Sep 28 '18 at 18:27
  • 1
    Looks much better! One other recommendation, it's a little more complicated but will improve efficiency quite a bit. Make your `m` have values `rep(1:(V - 1), G - 1)` (with only `G - 1` columns). Then, after you expand grid, you can add the last column to complete the sum `tmp = cbind(tmp, V - rowSums(tmp))`. You'll still need to pluck out the rows where the added column is less than 1. But effectively removing a dimension from the `expand.grid` result will help this scale up a little bit bigger that it otherwise would. – Gregor Thomas Sep 28 '18 at 18:36
0

I figured out what I believe to be a much simpler solution. You first generate random integers from your minimum to maximum range, count them up and then make a vector of the counts (including zeros).

Note that this solution may include zeros even if the minimum value is greater than zero.

Hope this helps future r people with this problem :)

rand.vect.with.total <- function(min, max, total) {
  # generate random numbers
  x <- sample(min:max, total, replace=TRUE)
  # count numbers
  sum.x <- table(x)
  # convert count to index position
  out = vector()
  for (i in 1:length(min:max)) {
    out[i] <- sum.x[as.character(i)]
  }
  out[is.na(out)] <- 0
  return(out)
}

rand.vect.with.total(0, 3, 5)
# [1] 3 1 1 0

rand.vect.with.total(1, 5, 10)
#[1] 4 1 3 0 2

Note, I also posted this here Generate N random integers that sum to M in R, but this answer is relevant to both questions.

Jakey
  • 160
  • 1
  • 2
  • 10