
I'm trying to create a list (call it weights) of N random numbers between 0.005 and 0.045 with a total sum equal to 1. N can be any integer between 22 and 200. So I have the following restrictions:

  • The number of values in weights is N
  • For every n in weights: 0.005 < n < 0.045
  • The sum of all n's in weights is 1

The first restriction is easy, I think. Also, I know how to satisfy the second and the third restriction separately from each other. But I don't know how to combine them into one piece of code.

  • Second restriction: 0.005 < x < 0.045:
import numpy as np

N = 50  # for example

# integers from 5 to 44 inclusive, i.e. weights from 0.005 to 0.044
weights_step1 = np.random.randint(low=5, high=45, size=N)

weights = []
for weight in weights_step1:
    weights.append(weight / 1000)

  • Third restriction: covered by Generating a list of random numbers, summing to 1 (a sketch of that idea is below)
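For example, something along these lines satisfies the sum restriction but ignores the bounds (N = 50 here is just an example size):

import numpy as np

N = 50
raw = np.random.random(N)   # N positive uniform numbers
weights = raw / raw.sum()   # rescale so the total is 1 (up to float rounding)
print(weights.sum())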

Does anyone know how to get both restrictions into one piece of code?

Dirk Koomen
  • What if N is less than 22 or larger than 200? – mkrieger1 Jan 05 '22 at 13:54
  • It looks like there is a flaw in your problem: if N can be arbitrary and the weights can only be from 0.005 to 0.045, then for N = 5, say, the constraints cannot be satisfied at all. – Bruno Jan 05 '22 at 13:54
  • @mkrieger1 Yes, you're right. In my situation it is for a portfolio with N somewhere around 50. – Dirk Koomen Jan 05 '22 at 14:06
  • @Bruno I'll edit the post; my N is somewhere around 50. – Dirk Koomen Jan 05 '22 at 14:07
  • I'm sure you know this, but just a heads up that no (pseudo)random distribution can be constrained to a required sum. – JonSG Jan 05 '22 at 14:35
  • @JonSG Huh?! The Dirichlet distribution is the perfect example of one constrained to a required sum. – Severin Pappadeux Jan 05 '22 at 20:36
  • @SeverinPappadeux There is no way that the final item in the list is random if you have a target sum. – JonSG Jan 05 '22 at 20:39
  • @DirkKoomen If you have, say, N=200 and sum=1, then the mean value of each n is 1/200 = 0.005. But this is the MEAN value, meaning some n would be below 0.005 and some above. You cannot have N=200, sum=1 and no value below 0.005. You could have N=200, sum=1 and values of n grouped close to 0.005 with relatively small variance. – Severin Pappadeux Jan 05 '22 at 20:40
  • @JonSG Sure, there is a way. In a Dirichlet sample all values are random, with the constraint of sum=1. The problem is getting, say, N=200 and sum=1: the mean would be 0.005, but some values in the sample would be above 0.005 and some below; you cannot have all of them >= 0.005. – Severin Pappadeux Jan 05 '22 at 20:42
  • Just a little bit of consideration: this is the kind of problem that might or might not finish, since the generated numbers might never satisfy the condition. You could run the program and by a matter of luck get your result, or it might run indefinitely. Are you aware of this? – Bruno Jan 05 '22 at 23:29
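The feasibility point raised in these comments is easy to check directly; a quick sketch (the feasible helper here is just for illustration):

def feasible(n, low=0.005, high=0.045, total=1.0):
    """N values strictly between low and high can sum to total
    only if n*low < total < n*high."""
    return n * low < total < n * high

print(feasible(5))    # False: even 5 * 0.045 = 0.225 falls short of 1
print(feasible(50))   # True
print(feasible(200))  # False: 200 * 0.005 = 1.0, so every value would have to be exactly 0.005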

2 Answers


You might want to use the Dirichlet Rescale algorithm (DRS), for which a Python implementation is available on PyPI as the drs package.

That gives you a proper statistical guarantee. Quoting the abstract of the math paper:

"the vectors are uniformly distributed over the valid region of the domain of all possible vectors, bounded by the constraints."

Trying for 50 numbers summing to 1:

$ pip install drs
...
Installing collected packages: drs
Successfully installed drs-2.0.0
$ 
$ python3
Python 3.9.9 (main, Nov 19 2021, 00:00:00) 
...
>>>
>>> from drs import drs
>>>
>>> n = 50
>>> s = 1.0
>>>
>>> v1 = drs(n, s, n*[0.045], n*[0.005])
>>> sum(v1)
1.0000000000000004
>>> max(v1)
0.04278387127251347
>>> min(v1)
0.005035400173241331
>>> 
>>> v2 = drs(n, s, n*[0.045], n*[0.005])
>>> sum(v2)
0.9999999999999994
>>> max(v2)
0.04445793844097045
>>> min(v2)
0.005294943276519565
>>> 
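The same call also works in script form; a minimal sketch, assuming the drs package installed as above (bounds are passed in the same order as in the session: upper first, then lower):

from drs import drs

N = 50
TOTAL = 1.0
LOWER, UPPER = 0.005, 0.045

# Same call as in the session above: upper bounds first, then lower bounds.
weights = drs(N, TOTAL, N * [UPPER], N * [LOWER])

assert abs(sum(weights) - TOTAL) < 1e-9           # sums to 1 up to float rounding
assert all(LOWER <= w <= UPPER for w in weights)  # every weight within bounds
print(min(weights), max(weights), sum(weights))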
jpmarinier

This might do the trick. The strategy is to partition the available space uniformly, then iterate over the partitions and, on each iteration, move a random amount from the current partition to a randomly chosen partner.

You might find that you need to use the decimal package to get a little more precision. You will likely also want to introduce guards to ensure that the constraints on the number of partitions and their minimum and maximum sizes are not inconsistent.

import random

partitions = 45
partition_size_min = 0.005
partition_size_max = 0.045

# Start with a perfectly uniform split, then shuffle mass around.
weights = [1.0 / partitions] * partitions
for index in range(len(weights)):
    partner_index = random.randint(0, partitions - 1)

    # How much this partition can give without dropping below the minimum,
    # and how much the partner can take without exceeding the maximum.
    available_to_give = weights[index] - partition_size_min
    available_to_receive = partition_size_max - weights[partner_index]
    delta = random.uniform(0, min(available_to_give, available_to_receive))

    weights[partner_index] += delta
    weights[index] -= delta

print(f"Sum: {sum(weights)} Min:{min(weights)} Max: {max(weights)}")
print(weights)

That should give you a result like:

Sum: 1.0 Min:0.0057759435106850415 Max: 0.04428043727891049

[
    0.03408561328744241,
    0.010644787590344313,
    0.01400427089495221,
    0.01484912512559225,
    ...
    0.019186499047958494,
    0.02443794812733188,
    0.03475172101526412,
    0.020782296753987052
]

I'm sure a proper statistician would find a fault in this method, but it might get you close to what you want.
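Following up on the decimal suggestion above, here is a sketch of the same shuffle done in integer thousandths, so the sum is exact by construction (this unit-based variant is only an illustration; it assumes N divides 1000 evenly, and its bounds are inclusive just like the float version's):

import random
from decimal import Decimal

N = 50                      # number of weights; 1000 must divide evenly by N here
TOTAL_UNITS = 1000          # 1.000 expressed in thousandths
UNIT_MIN, UNIT_MAX = 5, 45  # 0.005 and 0.045 in thousandths

units = [TOTAL_UNITS // N] * N  # start from an exactly even split

for index in range(N):
    partner = random.randrange(N)
    give = units[index] - UNIT_MIN       # room above the minimum
    receive = UNIT_MAX - units[partner]  # room below the maximum
    delta = random.randint(0, max(0, min(give, receive)))
    units[index] -= delta
    units[partner] += delta

weights = [Decimal(u) / 1000 for u in units]  # each value is an exact decimal
print(sum(weights))  # exactly 1, since the integer units always total 1000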

JonSG