3

I am trying to get a "grid" of n-dimensional probability vectors: vectors in which every entry is between 0 and 1, and all entries add up to 1. I wish to have every possible vector whose coordinates can take any of v evenly spaced values between 0 and 1.

To illustrate, what follows is a horribly inefficient implementation for n = 3 and v = 3:

from itertools import product
grid_redundant = product([0, .5, 1], repeat=3)
grid = [point for point in grid_redundant if sum(point)==1]

Now grid contains [(0, 0, 1), (0, 0.5, 0.5), (0, 1, 0), (0.5, 0, 0.5), (0.5, 0.5, 0), (1, 0, 0)].

This "implementation" is just terrible for higher-dimensional and more fine-grained grids. Is there a good way to do this, perhaps using numpy?


I could perhaps add a point on motivation: I would be perfectly happy if just sampling from a random distribution gave me sufficiently extreme points, but it does not. See this question. The "grid" I am after is not random, but systematically sweeps the simplex (the space of probability vectors).

Schiphol
  • Possible duplicate of [Generating a list of random numbers, summing to 1](https://stackoverflow.com/questions/18659858/generating-a-list-of-random-numbers-summing-to-1) – yeputons May 29 '18 at 11:40
  • @yeputons, thanks for the pointer. It is not a duplicate; I have edited the question to make this clear. – Schiphol May 29 '18 at 11:45
  • 1
    What more can you say about the predetermined probability values? Are they just [0, 1/(v-1), 2/(v-1), ..., (v-1)/(v-1)]? – Warren Weckesser May 29 '18 at 12:26
  • Yes sorry, I had evenly spaced values in mind. I have edited the question to reflect this. – Schiphol May 29 '18 at 12:31

2 Answers

3

Here is a recursive solution. It does not use NumPy and is not super efficient either, although it should be faster than the posted snippet:

import math
from itertools import permutations

def probability_grid(values, n):
    values = set(values)
    # Check if we can extend the probability distribution with zeros
    with_zero = 0. in values
    values.discard(0.)
    if not values:
        # PEP 479: raising StopIteration inside a generator is an
        # error in Python 3.7+; a bare return ends the generator.
        return
    values = list(values)
    for p in _probability_grid_rec(values, n, [], 0.):
        if with_zero:
            # Add necessary zeros
            p += (0.,) * (n - len(p))
        if len(p) == n:
            yield from set(permutations(p))  # faster: more_itertools.distinct_permutations(p)

def _probability_grid_rec(values, n, current, current_sum, eps=1e-10):
    if not values or n <= 0:
        if abs(current_sum - 1.) <= eps:
            yield tuple(current)
    else:
        value, *values = values
        # Skip this value
        yield from _probability_grid_rec(
            values, n, current, current_sum, eps)
        # Add copies of this value
        precision = round(-math.log10(eps))
        adds = int(round((1. - current_sum) / value, precision))
        for i in range(adds):
            current.append(value)
            current_sum += value
            n -= 1
            yield from _probability_grid_rec(
                values, n, current, current_sum, eps)
        # Remove copies of this value
        if adds > 0:
            del current[-adds:]

print(list(probability_grid([0, 0.5, 1.], 3)))

Output:

[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (0.5, 0.5, 0.0), (0.0, 0.5, 0.5), (0.5, 0.0, 0.5)]

A quick comparison to the posted method:

from itertools import product

def probability_grid_basic(values, n):
    grid_redundant = product(values, repeat=n)
    return [point for point in grid_redundant if sum(point)==1]

values = [0, 0.25, 1./3., .5, 1]
n = 6
%timeit list(probability_grid(values, n))
1.61 ms ± 20.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit probability_grid_basic(values, n)
6.27 ms ± 186 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
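
For reference, here is a non-recursive sketch of the full sweep based on the stars-and-bars bijection: with evenly spaced values, every grid point distributes v - 1 units of mass over n coordinates, i.e. it is a composition of v - 1 into n nonnegative integer parts, scaled by 1/(v - 1). The name simplex_grid is only illustrative and this is a sketch rather than a tuned implementation, but it uses numpy as the question asks and generates exactly C(n + v - 2, n - 1) points (252 for n = v = 6, the count mentioned in the comments):

import numpy as np
from itertools import combinations

def simplex_grid(n, v):
    # Choosing n - 1 "bar" positions among v - 1 "stars" plus n - 1
    # "bars" encodes one way to split v - 1 units over n coordinates.
    total = v - 1
    bars = np.array(list(combinations(range(total + n - 1), n - 1)))
    left = np.full((len(bars), 1), -1)
    right = np.full((len(bars), 1), total + n - 1)
    # The gap sizes between consecutive bars are the integer parts.
    parts = np.diff(np.hstack([left, bars, right]), axis=1) - 1
    return parts / total

print(simplex_grid(3, 3))  # the six points from the question, as rows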
jdehesa
  • @bobrobbob Well, yes, being a recursive algorithm, you don't need astronomically sized inputs to break it... I don't know what actual sizes the OP is expecting, though... – jdehesa May 29 '18 at 12:54
  • Yup, 3000 was insane, sorry. – bobrobbob May 29 '18 at 13:00
  • With more reasonable values your results are wrong: starting from n = v = 6 you return only 12 results (only combinations of 0/.2/1), while the "basic" version returns 252 results (combinations of 0/.2/.4/.6/.8/1). – bobrobbob May 29 '18 at 13:13
  • 1
    @bobrobbob Thanks for pointing that out, fixed it now (it was a floating point precision error). – jdehesa May 29 '18 at 13:19
0

Doing this in full generality, for high-dimensional vectors, is rather unmanageable even with the clever solution in the accepted answer. In my own case, it pays to calculate only a relevant subset of all values. For example, the following function calculates all <dimension>-dimensional probability vectors with exactly n nonzero, equiprobable entries:

import itertools as it
import numpy as np

def equip_n(dimension, n):
    """
    Calculate all possible <dimension>-dimensional probability vectors
    with n nonzero, equiprobable entries.
    """
    combinations = np.array(list(it.combinations(range(dimension), n)))
    vectors = np.zeros((combinations.shape[0], dimension))
    for line, comb in zip(vectors, combinations):
        line[comb] = 1 / n
    return vectors

print(equip_n(6, 3))

This returns

[[ 0.3333  0.3333  0.3333  0.      0.      0.    ]
 [ 0.3333  0.3333  0.      0.3333  0.      0.    ] 
 [ 0.3333  0.3333  0.      0.      0.3333  0.    ]
 [ 0.3333  0.3333  0.      0.      0.      0.3333]
 [ 0.3333  0.      0.3333  0.3333  0.      0.    ]
 [ 0.3333  0.      0.3333  0.      0.3333  0.    ]
 [ 0.3333  0.      0.3333  0.      0.      0.3333]
 [ 0.3333  0.      0.      0.3333  0.3333  0.    ]
 [ 0.3333  0.      0.      0.3333  0.      0.3333]
 [ 0.3333  0.      0.      0.      0.3333  0.3333]
 [ 0.      0.3333  0.3333  0.3333  0.      0.    ]
 [ 0.      0.3333  0.3333  0.      0.3333  0.    ]
 [ 0.      0.3333  0.3333  0.      0.      0.3333]
 [ 0.      0.3333  0.      0.3333  0.3333  0.    ]
 [ 0.      0.3333  0.      0.3333  0.      0.3333]
 [ 0.      0.3333  0.      0.      0.3333  0.3333]
 [ 0.      0.      0.3333  0.3333  0.3333  0.    ]
 [ 0.      0.      0.3333  0.3333  0.      0.3333]
 [ 0.      0.      0.3333  0.      0.3333  0.3333]
 [ 0.      0.      0.      0.3333  0.3333  0.3333]]

This is very fast. %timeit equip_n(6, 3) returns

15.1 µs ± 74.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
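
Since equip_n fixes the number of nonzero entries, the batches for every support size can be stacked to cover all equiprobable vectors at once, including [1, 0, 0, 0, 0, 0]; a minimal usage sketch (note this still covers only equiprobable vectors, not the full grid):

import numpy as np

# One batch per possible number of nonzero entries.
all_equiprobable = np.vstack([equip_n(6, k) for k in range(1, 7)])
print(all_equiprobable.shape)  # (63, 6): the sum of C(6, k) for k = 1..6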
Schiphol
  • This is not a solution. First of all, if v = 3, the values to select the probabilities from are supposed to be 0, 0.5 and 1. Second, it doesn't include every possible vector, but only some of them: for instance, [1, 0, 0, 0, 0, 0] is missing. – gciriani Jan 15 '20 at 02:18