3

I am trying to get a "grid" of n-dimensional probability vectors: vectors in which every entry is between 0 and 1, and all entries add up to 1. I wish to have every possible vector whose coordinates can take any of v evenly spaced values between 0 and 1.

To illustrate, what follows is a horribly inefficient implementation for n = 3 and v = 3:

from itertools import product
grid_redundant = product([0, .5, 1], repeat=3)
grid = [point for point in grid_redundant if sum(point)==1]

Now grid contains [(0, 0, 1), (0, 0.5, 0.5), (0, 1, 0), (0.5, 0, 0.5), (0.5, 0.5, 0), (1, 0, 0)].

This "implementation" is just terrible for higher-dimensional and more fine-grained grids. Is there a good way to do this, perhaps using numpy?


I could perhaps add a point on motivation: I would be perfectly happy if just sampling from a random distribution gave me sufficiently extreme points, but it does not. See this question. The "grid" I am after is not random, but systematically sweeps the simplex (the space of probability vectors).

Schiphol
  • Possible duplicate of [Generating a list of random numbers, summing to 1](https://stackoverflow.com/questions/18659858/generating-a-list-of-random-numbers-summing-to-1) – yeputons May 29 '18 at 11:40
  • @yeputons, thanks for the pointer. It is not a duplicate; I have edited the question to make this clear. – Schiphol May 29 '18 at 11:45
  • 1
    What more can you say about the predetermined probability values? Are they just [0, 1/(v-1), 2/(v-1), ..., (v-1)/(v-1)]? – Warren Weckesser May 29 '18 at 12:26
  • Yes sorry, I had evenly spaced values in mind. I have edited the question to reflect this. – Schiphol May 29 '18 at 12:31

2 Answers

3

Here is a recursive solution. It does not use NumPy and is not super efficient either, although it should be faster than the posted snippet:

import math
from itertools import permutations

def probability_grid(values, n):
    values = set(values)
    # Check if we can extend the probability distribution with zeros
    with_zero = 0. in values
    values.discard(0.)
    if not values:
        # PEP 479: raising StopIteration inside a generator is an
        # error in Python 3.7+; a bare return ends the generator.
        return
    values = list(values)
    for p in _probability_grid_rec(values, n, [], 0.):
        if with_zero:
            # Add necessary zeros
            p += (0.,) * (n - len(p))
        if len(p) == n:
            yield from set(permutations(p))  # faster: more_itertools.distinct_permutations(p)

def _probability_grid_rec(values, n, current, current_sum, eps=1e-10):
    if not values or n <= 0:
        if abs(current_sum - 1.) <= eps:
            yield tuple(current)
    else:
        value, *values = values
        # Skip this value
        yield from _probability_grid_rec(
            values, n, current, current_sum, eps)
        # Add copies of this value
        precision = round(-math.log10(eps))
        adds = int(round((1. - current_sum) / value, precision))
        for i in range(adds):
            current.append(value)
            current_sum += value
            n -= 1
            yield from _probability_grid_rec(
                values, n, current, current_sum, eps)
        # Remove copies of this value
        if adds > 0:
            del current[-adds:]

print(list(probability_grid([0, 0.5, 1.], 3)))

Output:

[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (0.5, 0.5, 0.0), (0.0, 0.5, 0.5), (0.5, 0.0, 0.5)]

A quick comparison to the posted method:

from itertools import product

def probability_grid_basic(values, n):
    grid_redundant = product(values, repeat=n)
    return [point for point in grid_redundant if sum(point)==1]

values = [0, 0.25, 1./3., .5, 1]
n = 6
%timeit list(probability_grid(values, n))
1.61 ms ± 20.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit probability_grid_basic(values, n)
6.27 ms ± 186 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
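
For reference, here is a non-recursive sketch of the full sweep based on the stars-and-bars bijection: with evenly spaced values, every grid point distributes v - 1 units of mass over n coordinates, i.e. it is a composition of v - 1 into n nonnegative integer parts, scaled by 1/(v - 1). The name simplex_grid is only illustrative and this is a sketch rather than a tuned implementation, but it uses numpy as the question asks and generates exactly C(n + v - 2, n - 1) points (252 for n = v = 6, the count mentioned in the comments):

import numpy as np
from itertools import combinations

def simplex_grid(n, v):
    # Choosing n - 1 "bar" positions among v - 1 "stars" plus n - 1
    # "bars" encodes one way to split v - 1 units over n coordinates.
    total = v - 1
    bars = np.array(list(combinations(range(total + n - 1), n - 1)))
    left = np.full((len(bars), 1), -1)
    right = np.full((len(bars), 1), total + n - 1)
    # The gap sizes between consecutive bars are the integer parts.
    parts = np.diff(np.hstack([left, bars, right]), axis=1) - 1
    return parts / total

print(simplex_grid(3, 3))  # the six points from the question, as rows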
jdehesa
  • @bobrobbob Well, yes, being a recursive algorithm, you don't need astronomically sized inputs to break it... I don't know what actual sizes the OP is expecting, though... – jdehesa May 29 '18 at 12:54
  • Yup, 3000 was insane, sorry. – bobrobbob May 29 '18 at 13:00
  • With more reasonable values your results are wrong: starting from n = v = 6 you return only 12 results (only combinations of 0/.2/1), while the "basic" version returns 252 results (combinations of 0/.2/.4/.6/.8/1). – bobrobbob May 29 '18 at 13:13
  • 1
    @bobrobbob Thanks for pointing that out, fixed it now (it was a floating point precision error). – jdehesa May 29 '18 at 13:19
0

Doing this in full generality, for high-dimensional vectors, is rather unmanageable even with the clever solution in the accepted answer. In my own case, it pays to calculate only a relevant subset of all values. For example, the following function calculates all <dimension>-dimensional probability vectors with exactly n nonzero, equiprobable entries:

import itertools as it
import numpy as np

def equip_n(dimension, n):
    """
    Calculate all possible <dimension>-dimensional probability vectors
    with n nonzero, equiprobable entries.
    """
    combinations = np.array(list(it.combinations(range(dimension), n)))
    vectors = np.zeros((combinations.shape[0], dimension))
    for line, comb in zip(vectors, combinations):
        line[comb] = 1 / n
    return vectors

print(equip_n(6, 3))

This returns

[[ 0.3333  0.3333  0.3333  0.      0.      0.    ]
 [ 0.3333  0.3333  0.      0.3333  0.      0.    ] 
 [ 0.3333  0.3333  0.      0.      0.3333  0.    ]
 [ 0.3333  0.3333  0.      0.      0.      0.3333]
 [ 0.3333  0.      0.3333  0.3333  0.      0.    ]
 [ 0.3333  0.      0.3333  0.      0.3333  0.    ]
 [ 0.3333  0.      0.3333  0.      0.      0.3333]
 [ 0.3333  0.      0.      0.3333  0.3333  0.    ]
 [ 0.3333  0.      0.      0.3333  0.      0.3333]
 [ 0.3333  0.      0.      0.      0.3333  0.3333]
 [ 0.      0.3333  0.3333  0.3333  0.      0.    ]
 [ 0.      0.3333  0.3333  0.      0.3333  0.    ]
 [ 0.      0.3333  0.3333  0.      0.      0.3333]
 [ 0.      0.3333  0.      0.3333  0.3333  0.    ]
 [ 0.      0.3333  0.      0.3333  0.      0.3333]
 [ 0.      0.3333  0.      0.      0.3333  0.3333]
 [ 0.      0.      0.3333  0.3333  0.3333  0.    ]
 [ 0.      0.      0.3333  0.3333  0.      0.3333]
 [ 0.      0.      0.3333  0.      0.3333  0.3333]
 [ 0.      0.      0.      0.3333  0.3333  0.3333]]

This is very fast. %timeit equip_n(6, 3) returns

15.1 µs ± 74.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
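
Since equip_n fixes the number of nonzero entries, the batches for every support size can be stacked to cover all equiprobable vectors at once, including [1, 0, 0, 0, 0, 0]; a minimal usage sketch (note this still covers only equiprobable vectors, not the full grid):

import numpy as np

# One batch per possible number of nonzero entries.
all_equiprobable = np.vstack([equip_n(6, k) for k in range(1, 7)])
print(all_equiprobable.shape)  # (63, 6): the sum of C(6, k) for k = 1..6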
Schiphol
  • This is not a solution. First of all, if v = 3, the values to select the probabilities from are supposed to be 0, 0.5 and 1. Second, it doesn't include every possible vector, but only some of them: for instance, [1, 0, 0, 0, 0, 0] is missing. – gciriani Jan 15 '20 at 02:18