Is there a feasible way to implement expand_grid() in pandas for 200+ categories?

Question

The following example from the web for implementing the function expand_grid() contains three variables: height (2 categories), weight (3 categories), and sex (2 categories), for a total of 2 * 3 * 2 = 12 combinations.

df = {'height': [60, 70],
      'weight': [100, 140, 180],
      'sex': ['Male', 'Female']}

Running expand_grid on the above object

expand_grid(df)

produces the following outcome:

       sex  weight  height
0     Male     100      60
1     Male     100      70
2     Male     140      60
3     Male     140      70
4     Male     180      60
5     Male     180      70
6   Female     100      60
7   Female     100      70
8   Female     140      60
9   Female     140      70
10  Female     180      60
11  Female     180      70

I would like to do the same for a dataset with the following columns (categories):

Race (9), Marital_Status (3), Sex (2), Age (2), Hispanic (2).

That's 9 * 3 * 2 * 2 * 2 = 216 combinations.

I would like something like the following:

      Race Marital_Status     Sex       Age      Hispanic
0    White        Married    Male  Under_18      Hispanic
1    White        Married    Male  Under_18  Non-Hispanic
2    White        Married    Male   Over_18      Hispanic
3    White        Married    Male   Over_18  Non-Hispanic
4    White        Married  Female  Under_18      Hispanic
5    White        Married  Female  Under_18  Non-Hispanic
.
.
.
215  Asian         Single  Female   Over_18  Non-Hispanic

When I try to run expand_grid(), the system runs out of memory.

I was told that if Python knows the data type (e.g. list, vector, etc.) beforehand, the operation would be faster and computationally less expensive. Is there a feasible way to implement this?

Thanks much!

jlandercy · Answer 1 · 2019-01-29T07:33:54.240

The itertools module from the Python Standard Library (PSL) can do the job.

import itertools
import pandas as pd

cat = {
    'C1': ['A', 'B', 'C'],
    'C2': ['A', 'B'],
    'C3': ['A', 'B', 'C', 'D']
}

# Fix the column order, then build one row per combination
order = list(cat)
pd.DataFrame(list(itertools.product(*[cat[k] for k in order])), columns=order)

It creates a DataFrame with all possible combinations (the Cartesian product) of the category levels:

   C1 C2 C3
0   A  A  A
1   A  A  B
2   A  A  C
[...]
22  C  B  C
23  C  B  D
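
The same approach should scale to the five variables in the question. The sketch below assumes the level counts given there (9, 3, 2, 2, 2); most level names are hypothetical placeholders, since the question only shows a few of the labels. Converting the columns to the `category` dtype is optional and is included only as one way to reduce memory on larger grids.

```python
import itertools
import math

import pandas as pd

# Level names are mostly placeholders: the question gives the counts
# (9, 3, 2, 2, 2) and only a few example labels, so the rest are hypothetical.
levels = {
    "Race": ["White"] + [f"Race_{i}" for i in range(2, 9)] + ["Asian"],
    "Marital_Status": ["Married", "Status_2", "Single"],
    "Sex": ["Male", "Female"],
    "Age": ["Under_18", "Over_18"],
    "Hispanic": ["Hispanic", "Non-Hispanic"],
}

# Expected number of rows: the product of the level counts.
n_rows = math.prod(len(v) for v in levels.values())

# Materialize the Cartesian product, one tuple per combination.
grid = pd.DataFrame(
    list(itertools.product(*levels.values())),
    columns=list(levels),
)

# Optional: store each column as a categorical to cut memory on larger grids.
grid = grid.astype("category")

print(n_rows, grid.shape)
print(grid.memory_usage(deep=True).sum(), "bytes")
```

This yields 216 rows and 5 columns, which is a very small frame. If memory still runs out, one possible cause (an assumption, since the question does not show the actual input) is that whole data columns were passed in rather than their unique values; in that case, reducing each column with something like `df[col].unique()` first would keep the grid at 216 rows.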

edited Jan 29 '19 at 07:33

answered Jan 29 '19 at 07:28


Thanks. Can it handle over 200 combinations? – Nelson Chung Jan 29 '19 at 08:09
