
I'm working on a portfolio optimisation problem with cardinality and maximum-weight constraints, using a genetic algorithm to generate solutions. I want to create arrays for the initial population that meet a cardinality constraint of 5 (exactly 5 non-zero elements), where each of those 5 elements is less than 0.25 and they sum to 1.

import numpy as np

population_size = 100
cardinality_constraint = 5
# 50 is the length of the vector of assets that we can allocate to

initial_pop = np.zeros((population_size, 50))

for i in range(population_size):
    initial_pop[i,:cardinality_constraint] = np.random.rand(cardinality_constraint)
    initial_pop[i] = initial_pop[i] / np.sum(initial_pop[i])
    np.random.shuffle(initial_pop[i])

This creates an array that satisfies the cardinality constraint: it puts values into the first 5 elements, normalises them to sum to 1, and then shuffles them.
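As a quick sanity check (my addition, not part of the original post), we can confirm that every row produced this way has exactly 5 non-zero weights summing to 1:

```python
import numpy as np

population_size = 100
n_assets = 50
cardinality_constraint = 5

initial_pop = np.zeros((population_size, n_assets))
for i in range(population_size):
    initial_pop[i, :cardinality_constraint] = np.random.rand(cardinality_constraint)
    initial_pop[i] = initial_pop[i] / np.sum(initial_pop[i])
    np.random.shuffle(initial_pop[i])

# Each row: exactly 5 non-zero entries, summing to 1.
cardinality_ok = np.all(np.count_nonzero(initial_pop, axis=1) == cardinality_constraint)
sums_ok = np.allclose(initial_pop.sum(axis=1), 1.0)
print(cardinality_ok, sums_ok)
```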

I've been trying to think of a way to incorporate the max weighting constraint. I think all I need is a way to randomly generate 5 numbers that are each less than 0.25 and together sum to 1. I'm struggling with that, though: I can easily create 5 random numbers in that range, but if I then normalise, some of the values might exceed the maximum.
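To see the problem concretely, here is a small experiment (my addition): draw 5 uniforms below 0.25, normalise, and count how often the cap is broken. Whenever the raw draws sum to less than 1, normalisation scales every weight up, so the largest weight frequently ends up above 0.25:

```python
import numpy as np

rng = np.random.default_rng(0)

trials = 10_000
violations = 0
for _ in range(trials):
    w = rng.uniform(0.0, 0.25, size=5)
    w = w / w.sum()        # rescale so the weights sum to 1...
    if w.max() > 0.25:     # ...which can push a weight past the cap
        violations += 1

rate = violations / trials
print(f"{rate:.1%} of normalised draws exceed 0.25")
```

The vast majority of draws violate the cap after rescaling, which is why either rejection sampling or a constrained construction is needed.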

Any ideas would be most appreciated thanks!

FOLLOW UP:

I think I have found a solution that I've written up which uses the numpy dirichlet function.

So for the example I originally described, where I want a cardinality constraint of 5 and a maximum individual asset weight of 0.25, or 25% of the portfolio:


initial_population_list = []

while len(initial_population_list) < 100:
    x = np.random.dirichlet(np.ones(5), size = 1)
    if len(x[x > 0.25]) < 1:
        initial_population_list.append(x)

Seems to me to work well! Thanks for the suggestion to look at that function.
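To tie this back to the cardinality constraint, one possible next step (my sketch, not from the post; the variable names are mine) is to scatter each accepted 5-element Dirichlet sample into 5 randomly chosen slots of the 50-asset vector:

```python
import numpy as np

n_assets = 50
cardinality = 5
max_weight = 0.25
population_size = 100

rng = np.random.default_rng(42)
population = np.zeros((population_size, n_assets))

for i in range(population_size):
    # Rejection-sample until every weight respects the 0.25 cap
    # (for these parameters only a small fraction of draws are accepted).
    while True:
        w = rng.dirichlet(np.ones(cardinality))
        if np.all(w <= max_weight):
            break
    # Place the 5 weights into 5 randomly chosen asset positions.
    idx = rng.choice(n_assets, size=cardinality, replace=False)
    population[i, idx] = w
```

Each row then satisfies all three constraints at once: 5 non-zero weights, each at most 0.25, summing to 1.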

MrAce
  • Are the numbers strictly positive or it is enough for them to be != 0.0? – norok2 Aug 18 '22 at 08:18
  • You can use the Dirichlet distribution; sort of duplicate: https://stackoverflow.com/questions/18659858/generating-a-list-of-random-numbers-summing-to-1 – Tempman383838 Aug 18 '22 at 09:14
  • Sorry yes all of the values should be in the range 0 < x <= 0.25. – MrAce Aug 18 '22 at 13:56
  • Thank you for the recommendation of the Dirichlet function. I've read up on it and am trying to wrap my head around it as best I can. The function np.random.dirichlet() has only the 2 parameters, alpha and size, and I don't understand what sort of input would allow me to bound the generated values at something like 0.25 for a vector of length 5. I'll keep looking at it. – MrAce Aug 18 '22 at 14:19
  • @Tempman383838 The Dirichlet distribution will satisfy the constraint on the sum, BUT it will not be of help to limit each of them to a given maximum. – norok2 Aug 18 '22 at 15:45
  • What I ended up doing is as above, with a conditional loop to take samples with the function and only append them when they satisfy the maximum constraint - which for those particular parameters, 5 numbers, max 0.25 occurs around 0.3% of the time. I'm no expert on the underlying theory and there might be some major non-randomness in the way I'm doing it. But my main goal is to create a set of feasible solutions to get the algorithm started and this seems to be working, though I'm sure there's a much purer way to implement what I'm going for. – MrAce Aug 18 '22 at 16:41
  • `len(x[x > 0.25]) < 1` should really be `np.all(x <= 0.25)` – norok2 Aug 18 '22 at 17:47
  • @norok2 Yup, you are right. My bad! – Tempman383838 Aug 19 '22 at 07:04

2 Answers


I'm not aware of anything in NumPy that would help with this, but you could solve it relatively easily using the standard library:

from random import uniform, shuffle

def constrained_simplex(count, *, max_val):
    remain = 1
    result = []
    for i in range(count - 1, 0, -1):
        val = uniform(
            max(0, remain - i * max_val),
            min(max_val, remain),
        )
        result.append(val)
        remain -= val
    assert 0 <= remain <= max_val
    result.append(remain)
    # shuffle(result)
    return result

vals = constrained_simplex(5, max_val=0.25)

which would give you back something like: 0.022 0.245 0.241 0.244 0.248.

A problem with this is that earlier elements are more likely to be smaller than later elements. Uncommenting the shuffle(result) line would perturb the index locations appropriately. Not sure if this is an issue for your application!
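As a quick check (my addition, not part of the answer), we can verify over many draws that the construction keeps every sample on the simplex and under the cap, with the shuffle enabled:

```python
from random import uniform, shuffle

def constrained_simplex(count, *, max_val):
    # Same construction as above, with the shuffle enabled.
    remain = 1
    result = []
    for i in range(count - 1, 0, -1):
        val = uniform(max(0, remain - i * max_val), min(max_val, remain))
        result.append(val)
        remain -= val
    result.append(remain)
    shuffle(result)
    return result

draws = [constrained_simplex(5, max_val=0.25) for _ in range(1000)]
sums_ok = all(abs(sum(d) - 1) < 1e-9 for d in draws)
caps_ok = all(-1e-9 <= v <= 0.25 + 1e-9 for d in draws for v in d)
print(sums_ok, caps_ok)
```

(The tiny tolerances allow for floating-point rounding in the running subtraction.)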

Sam Mason

I assume that using the numbers once you have generated the correct ones is not a problem, so I will focus on the generation itself.

I also assume you are interested in generating strictly positive numbers.

Let's start with a function to check whether a given array satisfies the constraints:

import numpy as np


def accept(arr, total, max_, min_):
    return np.isclose(np.sum(arr), total) and np.all(arr < max_) and np.all(arr > min_)

Now, we can produce the desired array iteratively with the following approach:

  • make sure that a solution exists
  • generate a tentative solution
  • while the solution cannot be accepted do
    • regenerate values above
    • regenerate values below
    • normalize to the total divided by the sum

The way we generate the tentative solution determines the distribution and the probability of termination of the algorithm (which may be 0 -- infinite looping).

Here I provide both a uniform and a normal parent distribution, but any could work.

def gen_uniform(n, max_, min_, mean_, seed=None):
    if seed is not None:
        np.random.seed(seed)
    return np.random.random(n) * (max_ - min_) + min_


def gen_normal(n, max_, min_, mean_, seed=None):
    if seed is not None:
        np.random.seed(seed)
    scale = min(max_ - mean_, mean_ - min_)
    return np.random.normal(mean_, scale, n)


def rand_cap(
    n,
    total=1.0,
    max_=None,
    min_=None,
    gen_rand=gen_normal,
    seed=None
):
    if max_ is None:
        max_ = total / (n - 1)
    if min_ is None:
        min_ = total / (n + 1)
    mean_ = total / n
    if min_ < mean_ < max_:
        result = gen_rand(n, max_, min_, mean_, seed)
        result *= total / np.sum(result)
        while not accept(result, total, max_, min_):
            # re-draw above (pass seed=None here: re-seeding with the same
            # seed would reproduce identical draws and loop forever)
            idx = np.nonzero(result >= max_)[0]
            result[idx] = gen_rand(len(idx), max_, min_, mean_, None)
            # re-draw below
            idx = np.nonzero(result <= min_)[0]
            result[idx] = gen_rand(len(idx), max_, min_, mean_, None)
            # fix sum
            result *= total / np.sum(result)
        return result
    else:
        raise ValueError(f"Cannot generate n={n} numbers with total={total} in the ({min_}, {max_}) range")

For the numbers provided by the question one would get:

print(rand_cap(5, 1.0, 0.25, 0.0, gen_rand=gen_uniform))
# [0.21305399 0.19891666 0.13560612 0.23090797 0.22151526]
print(rand_cap(5, 1.0, 0.25, 0.0, gen_rand=gen_normal))
# [0.24493421 0.1583963  0.18355129 0.19221243 0.22090577]

As you can see, the normal distribution produces higher variation.

On larger numbers, it is easy to see the distribution shape in the histogram:

import pandas as pd


n = 100000
total = 700.0
mu = total / n
pd.DataFrame(rand_cap(n, total, gen_rand=gen_uniform)).plot.hist(bins=16)
pd.DataFrame(rand_cap(n, total, gen_rand=gen_normal)).plot.hist(bins=16)

(histograms: hist_uniform, hist_normal)

norok2