I am trying to generate n random vectors of length x (one vector per row of an array) subject to the following criteria (a concrete validity check is sketched after the list):
i) each component of every vector (e.g., x[i]) has a minimum and a maximum value, both expressed as a percentage;
ii) the components of each vector sum to 1;
iii) I'd like to make sure the entire feasible space is sampled evenly.
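To make criteria i) and ii) concrete, this is the per-vector check I have in mind (just a sketch; the name is_valid and the tolerance are mine, not part of my actual code):

import numpy as np

def is_valid(v, lower_bound, upper_bound, tol=1e-9):
    # criterion i): every component within its bounds
    # criterion ii): the components sum to 1
    v = np.asarray(v)
    in_bounds = (np.all(v >= np.asarray(lower_bound) - tol)
                 and np.all(v <= np.asarray(upper_bound) + tol))
    return in_bounds and abs(v.sum() - 1.0) <= tol

Criterion iii) is a property of the whole collection of samples, not of any single vector, so it can't be checked this way.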
I have written the following routine, called gen_port, which takes two vectors containing the per-component lower and upper bounds, plus the number of random vectors to generate (e.g., N):
import numpy as np

def gen_port(lower_bound, upper_bound, number):
    # Given per-component minimums and maximums, return an array of
    # 'number' vectors (one per row), each of which sums to 100%.
    # We generate uniform RVs, scale them by the bounds, then normalize.
    values = np.random.random((len(lower_bound), number))  # big array of RVs, one column per sample
    for n in range(number):
        for i in range(len(lower_bound)):
            values[i, n] = lower_bound[i] + values[i, n] * (upper_bound[i] - lower_bound[i])  # scale into bounds
        values[:, n] /= values[:, n].sum()  # normalize the column so the sample sums to 1
    return values.T  # one sample per row, matching the output shown below
So, for example, if I generate 10 vectors of length 4 described by the following bounds:
lower_bound = [0.0,0.0,0.0,0.0]
upper_bound = [0.50,0.50,0.50,0.50]
gen_port(lower_bound, upper_bound, 10)
[Out]:
array([[ 0.15749895, 0.21279324, 0.35603417, 0.27367365],
[ 0.2970716 , 0.48189552, 0.04709743, 0.17393545],
[ 0.20367186, 0.47925996, 0.21349772, 0.10357047],
[ 0.29129967, 0.15936119, 0.26925573, 0.28008341],
[ 0.11058273, 0.2699138 , 0.39068379, 0.22881968],
[ 0.21286622, 0.39058314, 0.33895212, 0.05759852],
[ 0.18726399, 0.37648587, 0.32808714, 0.108163 ],
[ 0.03839954, 0.24170767, 0.40299362, 0.31689917],
[ 0.35782691, 0.31928643, 0.24712695, 0.0757597 ],
[ 0.25595576, 0.08776559, 0.16836131, 0.48791733]])
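As a quick check, every row sums to 1 (up to floating point):

np.allclose(gen_port(lower_bound, upper_bound, 10).sum(axis=1), 1.0)  # True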
However, I also need this to work when the values of lower_bound and upper_bound are not homogeneous.
E.g., if
[In]:
lower_bound = [0.0,0.25,0.25,0.0]
upper_bound = [0.50,0.50,0.75,1.0]
gen_port(lower_bound, upper_bound, 100000)
Each sample still sums to 1, but the per-component bounds are now violated, because normalizing after scaling pushes components out of their allowed ranges (only 10 of the 100,000 samples shown below):
[Out]:
array([[ 0.16010701, 0.31426425, 0.38776233, 0.1378664 ],
[ 0.00360632, 0.37343983, 0.57538205, 0.0475718 ],
[ 0.28273906, 0.2228893 , 0.1998151 , 0.29455654],
[ 0.06602521, 0.21386937, 0.49896407, 0.22114134],
[ 0.17785613, 0.33885919, 0.25276605, 0.23051864],
[ 0.07223014, 0.19988808, 0.16398971, 0.56389207],
[ 0.14320281, 0.14400242, 0.18276333, 0.53003144],
[ 0.04962725, 0.2578919 , 0.19029586, 0.50218499],
[ 0.01619681, 0.21040566, 0.30615235, 0.46724517],
[ 0.10905285, 0.23641745, 0.40660215, 0.24792755]])
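To quantify the problem, I count the out-of-range samples like this (a sketch; note that the bound vectors broadcast against the rows):

import numpy as np

lb = np.array([0.0, 0.25, 0.25, 0.0])
ub = np.array([0.50, 0.50, 0.75, 1.0])
samples = gen_port(lb, ub, 100000)               # one sample per row
bad = ((samples < lb) | (samples > ub)).any(axis=1)
print('fraction of samples violating a bound:', bad.mean())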
I'd like to generate the 100,000 scenarios so that the space defined by the lower and upper bounds is sampled evenly. But I'm stumped: the current function normalizes the vectors after they have been scaled into the bounds, and that normalization is exactly what breaks the bounds.
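The only unbiased alternative I've come up with is rejection sampling from the uniform distribution on the simplex (a sketch; I'm not sure it scales, since the acceptance rate could be tiny for x = 35 with tight bounds):

import numpy as np

def gen_port_reject(lower_bound, upper_bound, number):
    lb, ub = np.asarray(lower_bound), np.asarray(upper_bound)
    kept, total = [], 0
    while total < number:
        # Dirichlet(1,...,1) is the uniform distribution on the simplex,
        # so every candidate already sums to 1; keep only those in bounds.
        cand = np.random.dirichlet(np.ones(len(lb)), size=number)
        ok = ((cand >= lb) & (cand <= ub)).all(axis=1)
        kept.append(cand[ok])
        total += ok.sum()
    return np.concatenate(kept)[:number]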
So, I have an obvious first question: how do I modify the routine to handle the general (non-homogeneous) case?
In addition:
i) Is this approach correct? E.g., am I introducing any bias with this implementation?
ii) Is there a faster and/or more 'pythonic' way to do this? It currently takes about 15 minutes for n = 1,000,000 and x = 35.
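For question ii), this is the sort of vectorized rewrite I mean by 'more pythonic' (a sketch of the same scale-then-normalize logic; it only addresses the speed, not the bounds problem):

import numpy as np

def gen_port_vec(lower_bound, upper_bound, number):
    lb, ub = np.asarray(lower_bound), np.asarray(upper_bound)
    values = np.random.random((number, len(lb)))  # one sample per row
    values = lb + values * (ub - lb)              # scale all samples at once via broadcasting
    values /= values.sum(axis=1, keepdims=True)   # normalize each row to sum to 1
    return values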