0

Would you help me, please, to calculate all the variations of 10 factors (each has 15 values) groupped by 3.

We have 10 factors. Each factor has 15 values. E.g. 1,2,3,4,5,6...15

All the possible combinations of the first tripple of the factors (e.g. factor1, factor2, factor3) are:

15 (factor1 combination values) x 15 (factor2 combination values) x 15 (factor3 combination values) = 3 375

This should be calculated for all the possible tripplets among 10 factors :

    3 * 3 * 3 * 3 * 3 * 3 * 3 * 3 * 3 * 3 = 59 049 possible combinations of 3 factors
 (except duplicates like factor1, factor1, factor2)

As a result we have 59 049 combinations of 3 factors x 3 375 combinations of its values = 199 mln records

Desirable output:

1st place   2nd place   3rd place   1st place value 2nd place value 3rd place value
factor1     factor2     factor3     1               1               1
factor1     factor2     factor3     1               1               2
factor1     factor2     factor3     1               1               3
…           …           …           …               …               …   
factor8     factor9     factor10    15              15              15

Thank you for every prompt how to meet the goal.

Alex Ivanov
  • 657
  • 1
  • 8
  • 17
  • 1
    If your "combination" mean [this](https://en.wikipedia.org/wiki/Combination), please revise your statement using the correct mathematical expression. – Bill Huang Oct 19 '20 at 08:48
  • 1
    In addition, what are your columns and index(es)? I cannot infer from your statement. Please provide examples. – Bill Huang Oct 19 '20 at 08:51

1 Answers1

1

Key to your question: Number of combinations "except duplicates" is simply a binomial coefficient, and the instances can be generated by itertools.product() or pandas.MultiIndex.from_product() (this anwer also).

Therefore, the exact number of (factor1, factor2, factor3) is binom(10, 3) =120 instead of 3**10=59,049. The total number of rows is thus 120*3375=405,000.

Solution:

I parameterized all the numbers just to make the mathematical logic clear. In addition, this solution can be applied to varying number of values by recalculating comb_facs accordingly.

import pandas as pd
import numpy as np
import itertools
from scipy.special import comb

# data and parameters
n_cols = 10
k_cols = 3  # binomial coeff. (n k)
n_vals = 15  # 15 vals
dic = {}
for i in range(1, n_cols+1):
    dic[f"f{i}"] = np.array([j for j in range(1, 1+n_vals)], dtype=object)
df = pd.DataFrame(dic)

# preallocate the output arrays: factors and values
comb_cols = comb(n_cols, k_cols)  # binom(10,3) = 120
comb_facs = int(n_vals ** k_cols)  # NOTE: must recalculate if number of values are not constant
total_len = int(comb_cols * comb_facs)
factors = np.zeros((total_len, k_cols), dtype=object)
values = np.zeros((total_len, k_cols), dtype=int)

# the actual iteration
for i, tup in enumerate(itertools.combinations(df.columns, k_cols)):

    # 1. Cartesian product of (facA, facB, facC).
    #    can also use list(itertools.product())
    vals = pd.MultiIndex.from_product(
        [df[tup[i]].values for i in range(k_cols)]  # df.f1, df.f2, df.f3
    )
    arr_vals = pd.DataFrame(index=vals).reset_index().values
    # 2. Populate factor names and values into output arrays
    factors[i * comb_facs:(i + 1) * comb_facs, :] = tup  # broadcasting
    values[i * comb_facs:(i + 1) * comb_facs, :] = arr_vals

# result
pd.concat([pd.DataFrame(factors, columns=["1p fac", "2p fac", "3p fac"]),
           pd.DataFrame(values, columns=["1p val", "2p val", "3p val"])], axis=1)

Out[41]: 
       1p fac 2p fac 3p fac  1p val  2p val  3p val
0          f1     f2     f3       1       1       1
1          f1     f2     f3       1       1       2
2          f1     f2     f3       1       1       3
3          f1     f2     f3       1       1       4
4          f1     f2     f3       1       1       5
       ...    ...    ...     ...     ...     ...
404995     f8     f9    f10      15      15      11
404996     f8     f9    f10      15      15      12
404997     f8     f9    f10      15      15      13
404998     f8     f9    f10      15      15      14
404999     f8     f9    f10      15      15      15
[405000 rows x 6 columns]
Bill Huang
  • 4,491
  • 2
  • 13
  • 31
  • Bill, thank you so much. First, I didn't believe. I opened excel and started to write all the combinations by hands. After 80 rows I caught the logic. I was surprised, that there are special mathematical methods to calculate the number of combinations and much surprised that there is a special method in pandas to do it in a couple of rows. – Alex Ivanov Oct 19 '20 at 12:30