How to index a Cartesian product

Question

Suppose that the variables x and theta can take the possible values [0, 1, 2] and [0, 1, 2, 3], respectively.

Let's say that in one realization, x = 1 and theta = 3. The natural way to represent this is by a tuple (1,3). However, I'd like to instead label the state (1,3) by a single index. A 'brute-force' method of doing this is to form the Cartesian product of all the possible ordered pairs (x,theta) and look it up:

import numpy as np
import itertools

N_x = 3
N_theta = 4

np.random.seed(seed = 1)
x = np.random.choice(range(N_x))
theta = np.random.choice(range(N_theta))

def get_box(x, N_x, theta, N_theta):
    states = list(itertools.product(range(N_x),range(N_theta)))
    inds = [i for i in range(len(states)) if states[i]==(x,theta)]
    return inds[0]

print (x, theta)
box = get_box(x, N_x, theta, N_theta)
print box

This gives (x, theta) = (1,3) and box = 7, which makes sense if we look it up in the states list:

[(0, 0), (0, 1), (0, 2), (0, 3), (1, 0), (1, 1), (1, 2), (1, 3), (2, 0), (2, 1), (2, 2), (2, 3)]

However, this 'brute-force' approach seems inefficient, as it should be possible to determine the index beforehand without looking it up. Is there any general way to do this? (The number of states N_x and N_theta may vary in the actual application, and there might be more variables in the Cartesian product).

You could use hash addressing taking both components modulo a large constant and then append to a list behind each hash key in case of collisions. So e.g c2 * (x % c1) + (y % c2) would be the hash key. — Jacques de Hooge, Aug 04 '16 at 07:21

Julian Kniephoff · Accepted Answer · 2016-08-04T08:04:56.350

If you always store your states lexicographically and the possible values for x and theta are always the complete range from 0 to some maximum as your examples suggests, you can use the formula

index = x * N_theta + theta

where (x, theta) is one of your tuples.

This generalizes in the following way to higher dimensional tuples: If N is a list or tuple representing the ranges of the variables (so N[0] is the number of possible values for the first variable, etc.) and p is a tuple, you get the index into a lexicographically sorted list of all possible tuples using the following snippet:

index = 0
skip = 1
for dimension in reversed(range(len(N))):
    index += skip * p[dimension]
    skip *= N[dimension]

This might not be the most Pythonic way to do it but it shows what is going on: You think of your tuples as a hypercube where you can only go along one dimension, but if you reach the edge, your coordinate in the "next" dimension increases and your traveling coordinate resets. The reader is advised to draw some pictures. ;)

I considered this, but how would it scale up to more variables? For example, instead of just `x` and `theta`, there could also be `x_dot` and `theta_dot`. — Kurt Peek, Aug 04 '16 at 07:39

score 1 · Answer 2 · edited May 23 '17 at 12:08

I think it depends on the data you have. If they are sparse, the best solution is a dictionary. And works for any tuple's dimension.

import itertools
import random

n = 100
m = 100
l1 = [i for i in range(n)]
l2 = [i for i in range(m)]

a = {}
prod = [element for element in itertools.product(l1, l2)]
for i in prod:
    a[i] = random.randint(1, 100)

A very good source about the performance is in this discution.

score 1 · Answer 3 · answered Aug 04 '16 at 11:16

For the sake of completeness I'll include my implementation of Julian Kniephoff's solution, get_box3, with a slightly adapted version of the original implementation, get_box2:

# 'Brute-force' method
def get_box2(p, N):
    states = list(itertools.product(*[range(n) for n in N]))
    return states.index(p)

# 'Analytic' method
def get_box3(p, N):
    index = 0
    skip = 1
    for dimension in reversed(range(len(N))):
        index += skip * p[dimension]
        skip *= N[dimension]
    return index

p = (1,3,2)         # Tuple characterizing the total state of the system
N = [3,4,3]         # List of the number of possible values for each state variable

print "Brute-force method yields %s" % get_box2(p, N)
print "Analytical method yields %s" % get_box3(p, N)

Both the 'brute-force' and 'analytic' method yield the same result:

Brute-force method yields 23
Analytical method yields 23

but I expect the 'analytic' method to be faster. I've changed the representation to p and N as suggested by Julian.

How to index a Cartesian product

3 Answers3