Defining a List of Strings, be able to find the Ranking and Unranking of Combinations with replacement

Question

Goal

The goal is to produce an encoding of lexicographical combinations with replacements from a list of Strings. Without having to produce the entire Set of subsets. To work for a large k (combination sub size) size.

In this way, for any given combination we can produce an index of combination size. With that known index and size, we can also then produce the combination back out.

Preferred k = 10; But a k = 20 would be awesome, but I definitely could settle for max k = 6;

Desired Outcome

///items : Are the List of Strings  
///combSize : The desired/known size of the combination  
///rank : The combinations index in the lexicographical ordering  
`List<String> unRankCombinations(List<String> items, int combSize, int rank)`

///items : Are the List of Strings  
///combination : The desired combination to find the index for  
`int rankOfCombination(List<String> items, List<String> combination)`

///items : An example of a possible list of Strings to encode/decode  
`List<String> someItems = ['today', 'tomorrow', 'some-days', 'never_days', 'always', 'somewhere', 'some place', 'nowhere'];`

Examples:

`unRankCombinations(someItems, 3, z)` returns `['today', 'tomorrow', 'some-days']`  
then  
`rankOfCombination(someItems, ['today', 'tomorrow', 'some-days'])` returns `z`  

`unRankCombinations(someItems, 7, x)` returns `['today', 'today', 'today', 'never_days', 'always', 'somewhere', 'some place']`  
then  
`rankOfCombination(someItems, ['today', 'today', 'today', 'never_days', 'always', 'somewhere', 'some place'])` returns `x`

Where z and x are the index of the combination in the lex
Of course the lists need to be lexicographically sorted for it to work, this is just an example

Comments

I have desperately been trying to solve this problem for three weeks and can't get anywhere with it. I am just not getting the idea in my head and I can't seem to put the pieces together to create the functions.

Based on my readings from the below references:

Combinations are easy to rank due to there being a bijection. Due to the lexicographical ordering, the combinations can represent a Macaulay representation of n. And that every N can be written in exactly one way as a sum of k binomial coefficients of the given form.

Partial solution

Python Library : More Itertools

Has an unrank function nth_combination(iterable, r, index) which seems works but is not useful without the second function to find the inverse. Must also be converted to support List in Dart.

This is a Python function takes three arguments: an iterable object (such as a list or tuple), an integer r representing the length of the subsequences to be selected from the iterable, and an integer index representing the lexicographic index of the desired subsequence.

The function returns a tuple containing the subsequence of length r at the index position in the set of all possible subsequences of length r of the input iterable.

The function first converts the iterable to a tuple and determines its length n. If r is less than zero or greater than n, a ValueError is raised.

Next, the function calculates the total number of possible combinations of r elements from the iterable using the formula for combinations: n choose r. This is done using a loop that calculates the product of a sequence of decreasing fractions, and is equivalent to the binomial coefficient formula:

c = (n choose k) = n! / (k! * (n - k)!)

The function then adjusts the given index to handle negative indices and to ensure that it is within the valid range of indices. If the index is invalid, an IndexError is raised.

Finally, the function uses a loop to construct the desired subsequence of length r by repeatedly selecting the next element in the subsequence according to its position in the lexicographic ordering. This is done using a modified version of the algorithm for computing combinations by index, which efficiently skips over the unneeded combinations that precede the desired one.

The loop iteratively calculates the number of combinations that start with each remaining element of the iterable, and selects the one that contains the desired index. At each step, the remaining length of the subsequence and the remaining length of the iterable decrease, and the index is updated to account for the skipped combinations.

The function returns the resulting subsequence as a tuple.

Other non-full explanations

Compute rank of a combination?

Ranking and unranking combinations with replacement :: StackExchange

No answer, except a comment that says:

"apply the stars and bars bijection which told you there are (n+r−1n) outcomes to begin with, and use it to transform the problem into one of ranking n-element subsets of [n+r−1]"

The stars and bars bijection is a way to count the number of ways to distribute r indistinguishable balls into n distinguishable bins, with at least one ball in each bin. The number of ways to do this is equal to the number of ways to arrange n-1 bars and r balls in a line, which is (n+r-1 choose n-1). Each possible arrangement of bars and balls corresponds to a unique way of distributing the balls into the bins.

To use this bijection to rank combinations with replacement, we can imagine that the items in the set [n] are represented by n bins, and we are distributing k balls into these bins, with the possibility of multiple balls in each bin. We can transform this into a problem of distributing k+r-1 indistinguishable balls into n distinguishable bins, with at least one ball in each bin. The number of ways to do this is (n+k-1 choose n-1), which is the total number of combinations with replacement.

We can then use this total count to assign a rank to each combination with replacement by imagining that we are ranking n-element subsets of [n+k-1]. The lexicographic ordering described earlier can be used to rank these subsets, and the formula for the rank of a subset can be derived using the stars and bars bijection.

Ranking and Unranking of Combinations and Permutations

Explains Rank using the formula:

{\mathrm{rank}}{\text{lex}}(S) = \sum{i=0}^{k-1} \sum_{v=S[i-1]+1}^{S[i]-1}\binom{n-(v+1)}{k-(i+1)}

The lexicographic rank of a k-subset S is the position of S in the list of all k-subsets of the n-element set, when the k-subsets are ordered lexicographically. The lexicographic order of k-subsets is the same as the dictionary order of the corresponding k-tuples of their elements.

The formula calculates the lexicographic rank of S by computing the sum of binomial coefficients over certain ranges of values of the set S.

The outer sum is over i from 0 to k-1. For each i, the inner sum is over all values v that lie strictly between S[i-1]+1 and S[i]-1, where S[-1]=0 and S[k]=n+1. In other words, v is an element of the i-th "gap" between consecutive elements of S.

The expression inside the inner sum is a binomial coefficient that counts the number of ways to choose k-(i+1) elements from the set {v+1, v+2, ..., n}. This counts the number of ways to extend the first i elements of S to a full k-subset of {1, 2, ..., n} that comes lexocographically before S.

By summing over all possible choices of these k-(i+1) elements for each gap, we count the total number of k-subsets that come lexicographically before S. We then add 1 to this count to obtain the lexicographic rank of S.

Therefore, the formula provides an efficient way to compute the lexicographic rank of a k-subset S without having to enumerate all the k-subsets that come before it in the lexicographic order.

Algorithm for combination index in array

This stackoverflow answer references Wikipedia and the National Lottery Example.

The answer gives the following:

```java
int f(int n, int k, int[] vars) {
    int res = binom(n, k) - 1;

    for(int i = 0; i < k; i++) {
        res -= binom(n - vars[i], k - i);
    }

    return res;
}
```

assuming a method int binom(int n, int k) and binom(n, k) = 0 for n < k.

The purpose of the function is to compute a combinatorial expression involving binomial coefficients. Specifically, the function calculates the value of:

C(n,k) - C(n-vars[0], k-1) - C(n-vars[1], k-2) - ... - C(n-vars[k-1], 1)

where C(n,k) denotes the binomial coefficient "n choose k", which is the number of ways to choose k elements from a set of n distinct elements.

The function first initializes a variable res to the value of C(n,k) - 1, where C(n,k) is the binomial coefficient "n choose k". This value is then updated in a loop that iterates over the array vars. For each element vars[i] of the array, the function subtracts the value of the corresponding binomial coefficient C(n-vars[i], k-i) from res. The loop iterates k times, so at the end of the loop, res contains the value of the combinatorial expression described above.

Articles of Interest

Combinatorial Number System: Wikipedia

In mathematics, and in particular in combinatorics, the combinatorial number system of degree k (for some positive integer k), also referred to as combinadics, or the Macaulay representation of an integer, is a correspondence between natural numbers (taken to include 0) N and k-combinations. The combinations are represented as strictly decreasing sequences ck > ... > c2 > c1 ≥ 0 where each ci corresponds to the index of a chosen element in a given k-combination. Distinct numbers correspond to distinct k-combinations, and produce them in lexicographic order. The numbers less than ( n k ) {\tbinom {n}{k}} correspond to all k-combinations of {0, 1, ..., n − 1}. The correspondence does not depend on the size n of the set that the k-combinations are taken from, so it can be interpreted as a map from N to the k-combinations taken from N; in this view the correspondence is a bijection.

Place of a combination in the ordering:

The number associated in the combinatorial number system of degree k to a k-combination C is the number of k-combinations strictly less than C in the given ordering. This number can be computed from C = {ck, ..., c2, c1} with ck > ... > c2 > c1 as follows.

From the definition of the ordering it follows that for each k-combination S strictly less than C, there is a unique index i such that ci is absent from S, while ck, ..., ci+1 are present in S, and no other value larger than ci is. One can therefore group those k-combinations S according to the possible values 1, 2, ..., k of i, and count each group separately. For a given value of i one must include ck, ..., ci+1 in S, and the remaining i elements of S must be chosen from the ci non-negative integers strictly less than ci; moreover any such choice will result in a k-combinations S strictly less than C. The number of possible choices is ( c i i ) {\tbinom {c_{i}}{i}}, which is therefore the number of combinations in group i; the total number of k-combinations strictly less than C then is

( c 1 1 ) + ( c 2 2 ) + ⋯ + ( c k k ) , {\binom {c_{1}}{1}}+{\binom {c_{2}}{2}}+\cdots +{\binom {c_{k}}{k}},

and this is the index (starting from 0) of C in the ordered list of k-combinations.

Obviously there is for every N ∈ N exactly one k-combination at index N in the list (supposing k ≥ 1, since the list is then infinite), so the above argument proves that every N can be written in exactly one way as a sum of k binomial coefficients of the given form.

Find rank of lottery combinations