
I am trying to create n column vectors of length x, subject to the following criteria:

i) Each i-th component of every vector (e.g., x[i]) has a minimum and a maximum value. The minimums and maximums are expressed as percentages.

ii) The sum of each column is 1.

iii) I'd like to make sure I sample the entire space evenly.

I have written the following routine, called 'gen_port', which takes two vectors containing the lower and upper bounds for the vector, plus the number of random vectors to generate (e.g., N).

import numpy as np

def gen_port(lower_bound, upper_bound, number):
    # Given vectors describing the minimum and maximum for each component, return an array
    # of 'number' vectors, each of which sums to 100%.
    # We generate RVs, scale them by the upper and lower bounds, then normalize.
    values = np.random.random((len(lower_bound), number))   # big array of uniform RVs, one column per vector
    for n in range(0, number):
        for i in range(0, len(lower_bound)):
            values[i, n] = lower_bound[i] + values[i, n] * (upper_bound[i] - lower_bound[i])  # scale into [lower, upper]
    values = values / values.sum(axis=0)   # normalize each column so it sums to 1
    return values

So, for example, if I generate 10 vectors described by the following bound vectors:

lower_bound = [0.0,0.0,0.0,0.0]
upper_bound = [0.50,0.50,0.50,0.50] 
gen_port(lower_bound, upper_bound, 10)

[Out]
array([[ 0.15749895,  0.21279324,  0.35603417,  0.27367365],
       [ 0.2970716 ,  0.48189552,  0.04709743,  0.17393545],
       [ 0.20367186,  0.47925996,  0.21349772,  0.10357047],
       [ 0.29129967,  0.15936119,  0.26925573,  0.28008341],
       [ 0.11058273,  0.2699138 ,  0.39068379,  0.22881968],
       [ 0.21286622,  0.39058314,  0.33895212,  0.05759852],
       [ 0.18726399,  0.37648587,  0.32808714,  0.108163  ],
       [ 0.03839954,  0.24170767,  0.40299362,  0.31689917],
       [ 0.35782691,  0.31928643,  0.24712695,  0.0757597 ],
       [ 0.25595576,  0.08776559,  0.16836131,  0.48791733]])

However, I want to be able to populate the vectors when the values for lower_bound and upper_bound are not homogeneous.

E.g., if

[In]:
lower_bound = [0.0,0.25,0.25,0.0]
upper_bound = [0.50,0.50,0.75,1.0] 
gen_port(lower_bound, upper_bound, 100000)

With these bounds the scaled values no longer sum to 1, and normalizing them afterwards pushes some components outside their bounds (only 10 samples included below):

[Out]:
array([[ 0.16010701,  0.31426425,  0.38776233,  0.1378664 ],
       [ 0.00360632,  0.37343983,  0.57538205,  0.0475718 ],
       [ 0.28273906,  0.2228893 ,  0.1998151 ,  0.29455654],
       [ 0.06602521,  0.21386937,  0.49896407,  0.22114134],
       [ 0.17785613,  0.33885919,  0.25276605,  0.23051864],
       [ 0.07223014,  0.19988808,  0.16398971,  0.56389207],
       [ 0.14320281,  0.14400242,  0.18276333,  0.53003144],
       [ 0.04962725,  0.2578919 ,  0.19029586,  0.50218499],
       [ 0.01619681,  0.21040566,  0.30615235,  0.46724517],
       [ 0.10905285,  0.23641745,  0.40660215,  0.24792755]])

I'd like to generate the 100,000 scenarios so that the space defined by the lower and upper bounds is evenly sampled. But I'm stumped, as the current function normalizes the vectors after they've been translated by the lower and upper bounds.

So, I have this obvious first question: how do I modify the routine to handle such cases?

In addition:

i) Is this approach correct? E.g., am I introducing any bias with this implementation?

ii) Is there a faster and/or more 'pythonic' way to do this? It takes about 15 minutes for n = 1,000,000 and x = 35.
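
For what it's worth, I suspect the double loop (and the normalization) could be replaced with numpy broadcasting along these lines, though I haven't verified how much of the runtime that recovers:

import numpy as np

def gen_port_vectorized(lower_bound, upper_bound, number):
    # Same behavior as gen_port above, but scaling every column at once via broadcasting.
    lo = np.asarray(lower_bound)[:, None]            # shape (x, 1)
    hi = np.asarray(upper_bound)[:, None]
    values = np.random.random((len(lower_bound), number))
    values = lo + values * (hi - lo)                 # scale into [lower, upper]
    return values / values.sum(axis=0)               # normalize each column to sum to 1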

GPB

2 Answers


If you didn't have the requirement to allow arbitrary lower/upper bounds (or, if the lower bound were always 0 and the upper always 1), then the answer would be the well-known Dirichlet distribution:

https://en.wikipedia.org/wiki/Dirichlet_distribution

There is sampling Python code at the link. There is also a very simple way to sample the Dirichlet in the simplest case where all \alpha = 1; if you need it I'll dig it out. But the bounds introduce additional problems...

UPDATE

I believe you could use rejection: sample from the Dirichlet and reject anything which doesn't fit into the intervals, but I would guess the efficiency would be pretty low.
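
Something along these lines is what I have in mind (a sketch using numpy's built-in Dirichlet sampler; note it can spin for a long time if the bounds make the acceptance rate very low):

import numpy as np

def dirichlet_with_bounds(lower_bound, upper_bound, number, batch=10000):
    # Rejection sampling: draw flat-Dirichlet vectors (each row sums to 1) and keep
    # only those whose components all lie inside [lower_bound, upper_bound].
    lo = np.asarray(lower_bound)
    hi = np.asarray(upper_bound)
    kept = []
    total = 0
    while total < number:
        draws = np.random.dirichlet(np.ones(len(lo)), size=batch)
        ok = draws[np.all((draws >= lo) & (draws <= hi), axis=1)]
        kept.append(ok)
        total += len(ok)
    return np.vstack(kept)[:number]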

UPDATE II

Found a link with Python Dirichlet sampling for the case where all \alpha are equal to 1:

Generating N uniform random numbers that sum to M
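
That method is, I believe, the spacings trick: sort uniforms and take the gaps between them, which are uniformly distributed on the simplex (i.e. Dirichlet with all \alpha = 1). A rough sketch:

import numpy as np

def flat_dirichlet(x, number):
    # Sample 'number' vectors of length x, uniform on the simplex (all alpha = 1):
    # sort x-1 uniforms per sample and take the gaps between 0, the sorted values, and 1.
    u = np.sort(np.random.random((number, x - 1)), axis=1)
    u = np.hstack([np.zeros((number, 1)), u, np.ones((number, 1))])
    return np.diff(u, axis=1)   # each row sums to 1 exactly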

Severin Pappadeux
  • Thanks, I will check this. As you note, I need to allow for different upper and lower bounds. But this is a useful first step. – GPB Nov 18 '15 at 11:27

Unless there's a reason you absolutely need to use Monte Carlo simulation (e.g., this is homework), a more efficient method is to use a numerical optimizer, for instance:

import numpy as np
from scipy.optimize import minimize

def find_allocations(prices):
    """Find optimal allocations for a portfolio, optimizing Sharpe ratio.

    Parameters
    ----------
        prices: DataFrame, daily prices for each stock in portfolio

    Returns
    -------
        allocs: optimal allocations, as fractions that sum to 1.0
    """
    def sharpe_ratio(allocs):
        # 1e7 is an arbitrary starting portfolio value
        port_vals = ((prices / prices.iloc[0]) * allocs * 1e7).sum(axis=1)
        returns = port_vals.pct_change()
        avg_daily_ret = returns.mean()
        std_daily_ret = returns.std()
        return -(252 ** 0.5) * avg_daily_ret / std_daily_ret  # negative annualized Sharpe

    n = prices.shape[1]
    x0 = [1.0 / n] * n                     # start from equal weights
    bounds = [(0.0, 1.0)] * n
    constraints = ({'type': 'eq', 'fun': lambda x: 1.0 - np.sum(np.abs(x))})
    allocs = minimize(sharpe_ratio, x0, method='SLSQP',
                      bounds=bounds, constraints=constraints)
    return allocs.x

Note this is minimizing the negative of the Sharpe ratio, so actually maximizing the Sharpe ratio, as you would want. Depending upon what you want to optimize, some objective functions (such as minimum variance constrained to return the same as equal allocations) have an analytical solution.
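
For example, called on a DataFrame of prices (the data and tickers below are made-up placeholders just to show the call):

import numpy as np
import pandas as pd

# Synthetic price paths purely for illustration; substitute real price data.
dates = pd.date_range('2015-01-01', periods=252, freq='B')
prices = pd.DataFrame(100 * np.cumprod(1 + 0.01 * np.random.randn(252, 3), axis=0),
                      index=dates, columns=['AAA', 'BBB', 'CCC'])

allocs = find_allocations(prices)
print(allocs, allocs.sum())   # allocations should sum to (approximately) 1.0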

Adam Acosta
  • No, this is not 'homework.' And I am not trying to optimize a returns problem. I am trying to populate a surface of potential portfolios in which each asset has a unique correlation to the others. I would prefer someone answer my question as opposed to guessing what I am trying to solve for. – GPB Nov 17 '15 at 22:02