Problem Description:
I'm working on making a function which gives me a definition for a particular combination of several descriptors based on a single index. My inputs are a set of raw features X = [feat0,feat1,feat2,feat3,feat4]
, a list of powers to be used pow = [1,2,3]
, and a list of group sizes sizes = [1,3,5]
. A valid output might look like the following:
feat0^2 * feat4^3 * feat1^1
This output is valid because feat0
, feat4
, and feat1
exist within X
, their powers exist within pow
, and the number of features being combined is in sizes
.
Invalid edge cases include:
- values which don't exist in
X
, powers not inpow
, and combination sizes not insizes
- combinations that are identical to another are invalid:
feat0^2 * feat1^3
andfeat1^3 * feat0^2
are the same - combinations that include multiples of the same feature are invalid:
feat0^1 * feat0^3 * feat2^2
is invalid
under the hood I'm encoding these groupings as lists of tuples. So feat0^2 * feat4^3 * feat1^1
would be represented as [(0,2), (4,3), (1,1)]
, where the first element in the tuple is the feature index, and the second is the power.
Question:
my question is, how can I create a 1 to 1 mapping of a particular combination to an index i
? I would like to get the number of possible combinations, and be able to plug in an integer i
to a function, and have that function generate a particular combination. Something like this:
X = [0.123, 0.111, 11, -5]
pow = [1,2,3]
sizes = [1,3]
#getting total number of combinations
numCombos = get_num_combos(X,pow,sizes)
#getting a random index corresponding to a grouping
i = random.randint(0, numCombos)
#getting grouping
grouping = generate_grouping(i, X, pow, sizes)
print(grouping)
Resulting in something like
[(0,1), (1,2), (3,1)]
So far, figuring out the generation when not accounting for the various edge cases wasn't too hard, but I'm at a loss for how to account for edge cases 2 and 3; making it guaranteed that no value of i
is algebraically equivalent to any other value of i
, and that the same feature does not appear multiple times in a grouping.
Current Progress
#computes the n choose k of a list and a size
def get_num_groupings(n, k):
return int(math.factorial(n)/(math.factorial(k)*math.factorial(n-k)))
import numpy as np
import bisect
i = 150
n = 5
m = 3
sizes = [1, 3, 5]
#computing the number of elements in each group length
numElements = [m**k * get_num_groupings(n, k) for k in sizes]
#index bins for each group size
bins = list(np.cumsum(numElements))[:-1]
#getting the current group size
binIdx = bisect.bisect_left(bins,i)
curSize = sizes[binIdx]
#adding idx 0 to bins
bins = [0]+bins
#getting the location of i in the bin
z = i - bins[binIdx]
#getting the product index and combination rank
pi = z // m**k
ci = z % m**k
#getting the indexes of the powers
pidx = [(pi // m**(curSize - (num+1)))%m for num in range(curSize)]
#getting the indexes of the features
#TODO cidx = unrank(i, range(n))
This is based on the Mad Physicist
's answer. Though I haven't figured out how to get cidx
yet. Some of the variable names are rewritten for my own understanding. To my knowledge this implimentation works by logically separating the combinations of variables and which powers they each have. So far, I can get the powers from an index i
, and once unrank
is ironed out I should be able to get the indexes for which features are used.