I would like to unit test the time writing software used at my company. In order to do this I would like to create sets of random numbers that add up to a defined value.

I want to be able to control the parameters:

  • Min and max value of the generated number
  • The count (n) of generated numbers
  • The sum of the generated numbers

For example, a person worked 2000 hours in 250 days. The 2000 hours have to be randomly distributed over the 250 days. The maximum time spent per day is 9 hours and the minimum is 0.25 hours.

I worked my way through this SO question and found the method

diff(c(0, sort(runif(249)), 2000))

This results in 1 big number and 249 small numbers. That's why I would like to be able to set a min and max for the generated numbers. But I don't know where to start.

jeroen81
  • `runif` has a min and max argument? `runif( 249 , 0.25 , 9 )` – Simon O'Hanlon Aug 20 '13 at 15:33
  • How could I have missed that one! – jeroen81 Aug 20 '13 at 15:35
  • You're sampling in the intersection of a hyperplane (fixed sum) and a hypercube (fixed min and max). Do you want a uniform density in this domain? – Ferdinand.kraft Aug 20 '13 at 15:39
  • The last item is asked regularly here: if you're going to put a limit on the sum of your randoms, then basically you need to restrict the possible values. E.g., if you want 100 RVs in [0,1] and the sum <=50, then you have to change the maximum allowable value for each draw based on the cumulative sum to that point. – Carl Witthoft Aug 20 '13 at 15:40
  • Yes, I would like to have uniform density in the domain. – jeroen81 Aug 20 '13 at 15:47

1 Answer

You will have no problem meeting any two out of your three constraints, but all three might be a problem. As you note, the standard way to generate N random numbers that add to a sum is to generate N-1 random numbers in the range of 0..sum, sort them, and take the differences. This is basically treating your sum as a number line, choosing N-1 random points, and your numbers are the segments between the points.
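As a minimal sketch of that sort-and-diff approach in R (the helper name `random_parts` is made up for illustration): note that the cut points are drawn on 0..total rather than on `runif`'s default range of 0..1, which is what produces one big number and many tiny ones.

```r
# Hypothetical helper: split `total` into `n` non-negative random parts.
random_parts <- function(n, total) {
  # n - 1 random cut points on the number line from 0 to total
  cuts <- sort(runif(n - 1, min = 0, max = total))
  # the n parts are the segment lengths between consecutive points
  diff(c(0, cuts, total))
}

set.seed(1)
hours <- random_parts(250, 2000)
length(hours)  # 250
sum(hours)     # 2000, up to floating-point rounding
```

This meets the N and sum constraints only; nothing here limits how small or large a single part can be.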

But this might not be compatible with constraints on the numbers themselves. For example, what if you want 10 numbers that add to 1000, but each has to be less than 100? That won't work. Even if you have ranges that are mathematically possible, forcing compliance with all the constraints might mean sacrificing uniformity or other desirable properties.
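A quick way to see whether the three constraints can hold together at all is to compare the target sum with the smallest and largest sums that N bounded numbers can reach. The helper below is a hypothetical sketch that treats both bounds as inclusive:

```r
# Hypothetical helper: can n values, each in [lo, hi], add up to total?
is_feasible <- function(n, total, lo, hi) {
  # the reachable sums run from n * lo to n * hi
  n * lo <= total && total <= n * hi
}

is_feasible(10, 1000, 0, 99.99)   # FALSE: 10 numbers below 100 cannot reach 1000
is_feasible(250, 2000, 0.25, 9)   # TRUE: 62.5 <= 2000 <= 2250
```

Passing this check only says a solution exists; it says nothing about how to sample one uniformly.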

I suspect the only way to do this is to keep the sum and N constraints and do the standard N-1 draw, sort, and diff, but restrict the resolution of the individual randoms to your desired minimum (in other words, instead of 0..100, maybe generate 0..10 times 10).

Or, instead of generating N-1 uniformly random points along the line, generate a random sample of points along the line within a similar low-resolution constraint.
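One possible reading of this low-resolution idea, sketched in R: take the cut points as a random sample, without replacement, from a grid whose spacing equals the desired minimum. The helper name `grid_parts` and its `step` parameter are assumptions made for illustration, and the sketch assumes `total` is a multiple of `step`. Because the cut points are distinct grid points, every part is at least one step long. It does not enforce the maximum, and it is not claimed to be uniform over the constrained domain.

```r
# Hypothetical helper: n parts that sum to `total`, each a positive multiple of `step`.
# Assumes `total` is a multiple of `step` and the grid holds at least n - 1 points.
grid_parts <- function(n, total, step) {
  grid <- seq(step, total - step, by = step)  # candidate cut points on the grid
  cuts <- sort(sample(grid, n - 1))           # distinct points, so no empty segment
  diff(c(0, cuts, total))                     # segment lengths between the points
}

set.seed(1)
hours <- grid_parts(250, 2000, 0.25)
sum(hours)        # 2000
min(hours)        # at least 0.25
any(hours > 9)    # the 9-hour maximum is not enforced by this sketch
```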

Lee Daniel Crocker