0

How can I sample N random values such that the following constraints are satisfied?

  1. the N values add up to 1.0
  2. none of the values is less than 0.01 (or some other threshold T << 1/N)

The following procedure was my first attempt.

def proportions(N):
    proportions = list()
    for value in sorted(numpy.random.random(N - 1) * 0.98 + 0.01):
        prop = value - sum(proportions)
        proportions.append(prop)
    prop = 1.0 - sum(proportions)
    proportions.append(prop)
    return proportions

The * 0.98 + 0.01 bit was intended to enforce the ≥ 1% constraint. This works on the margins, but doesn't work internally—if two random values have a distance of < 0.01 it is not caught/corrected. Example:

>>> numpy.random.seed(2000)                                                                                            
>>> proportions(5)                                                                                                     
[0.3397481983960182, 0.14892479749759702, 0.07456518420712799, 0.005868759570153426, 0.43089306032910335]

Any suggestions to fix this broken approach or to replace it with a better approach?

Daniel Standage
  • 8,136
  • 19
  • 69
  • 116
  • Take a look at [this](https://stackoverflow.com/q/3589214/9225671) and [this](https://stackoverflow.com/q/49666779/9225671) question; maybe they can help you – Ralf Jun 25 '19 at 18:41
  • Also: I suggest using integers or `Decimal` numbers because floats can cause the sum to be off by a very small part and those errors may be hard to detect. – Ralf Jun 25 '19 at 18:42

2 Answers2

1

You could adapt Mark Dickinson's nice solution:

import random

def proportions(n):
    dividers = sorted(random.sample(range(1, 100), n - 1))
    return [(a - b) / 100 for a, b in zip(dividers + [100], [0] + dividers)]

print(proportions(5))
# [0.13, 0.19, 0.3, 0.34, 0.04]
# or
# [0.31, 0.38, 0.12, 0.05, 0.14]
# etc

Note this assumes "none of the values is less than 0.01" is a fixed threshold

UPDATE: We can generalize if we take the reciprocal of the threshold and use that to replace the hard-coded 100 values in the proposed code.

def proportions(N, T=0.01):
    limit = int(1 / T)
    dividers = sorted(random.sample(range(1, limit), N - 1))
    return [(a - b) / limit for a, b in zip(dividers + [limit], [0] + dividers)]
Daniel Standage
  • 8,136
  • 19
  • 69
  • 116
Chris_Rands
  • 38,994
  • 14
  • 83
  • 119
0

What about this?

  • N/2 times, choose a random number x such that 1/N+x & 1/N-x fit your constraints; add 1/N+x & 1/N-x
  • If N is odd, add 1/N
Scott Hunter
  • 48,888
  • 12
  • 60
  • 101