The following example, found on the web, demonstrates an expand_grid() function on three variables: height (2 levels), weight (3 levels), and sex (2 levels), for a total of 2 * 3 * 2 = 12 combinations.
df = {'height': [60, 70],
      'weight': [100, 140, 180],
      'sex': ['Male', 'Female']}
Running expand_grid on the above object
expand_grid(df)
produces the following outcome:
    sex     weight  height
0   Male    100     60
1   Male    100     70
2   Male    140     60
3   Male    140     70
4   Male    180     60
5   Male    180     70
6   Female  100     60
7   Female  100     70
8   Female  140     60
9   Female  140     70
10  Female  180     60
11  Female  180     70
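For reference, the behaviour above can be approximated with the standard library alone. This is only a minimal sketch, not the expand_grid() implementation from the web: it assumes the input is a dict mapping column names to lists of levels, and it returns plain tuples rather than a DataFrame. Because dicts keep insertion order in Python 3.7+, the column order here follows the dict and may differ from the output shown above.

```python
from itertools import product

def expand_grid(levels):
    """Return (column_names, rows) for every combination of the given levels.

    `levels` maps a column name to the list of values that column can take.
    This is an illustrative stand-in, not the original expand_grid().
    """
    names = list(levels)
    # product() varies the last column fastest and the first column slowest
    rows = list(product(*(levels[name] for name in names)))
    return names, rows

levels = {'height': [60, 70],
          'weight': [100, 140, 180],
          'sex': ['Male', 'Female']}

names, rows = expand_grid(levels)
print(names)
print(len(rows))  # 12
for row in rows[:3]:
    print(row)
```

The resulting rows and names could then be handed to a table constructor if a DataFrame is needed.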
I would like to do the same for a dataset with the following columns (number of categories in parentheses):
Race (9), Marital_Status (3), Sex (2), Age (2), Hispanic (2).
That's 9*3*2*2*2 = 216 combinations.
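As a quick sanity check on that count, the expected number of rows is just the product of the per-column level counts. The dict name `level_counts` is only illustrative; the counts themselves are the ones listed above.

```python
from math import prod

# Number of distinct levels per column, as listed above
level_counts = {'Race': 9, 'Marital_Status': 3, 'Sex': 2, 'Age': 2, 'Hispanic': 2}

n_rows = prod(level_counts.values())
print(n_rows)  # 216
```

A grid of this size is small, so the row count alone should not be what exhausts memory.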
I would like something like the following:
     Race   Marital_Status  Sex     Age       Hispanic
0    White  Married         Male    Under_18  Hispanic
1    White  Married         Male    Under_18  Non-Hispanic
2    White  Married         Male    Over_18   Hispanic
3    White  Married         Male    Over_18   Non-Hispanic
4    White  Married         Female  Under_18  Hispanic
5    White  Married         Female  Under_18  Non-Hispanic
.
.
.
215  Asian  Single          Female  Over_18   Non-Hispanic
When I try to run expand_grid() on this dataset, the system runs out of memory.
I was told that if Python knows the data type (e.g. list, vector, etc.) beforehand, the computation would be faster and less expensive. Is there a feasible way to implement this?
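To make the question concrete, here is a rough sketch of one memory-friendly approach. It rests on an assumption that nothing above confirms: that the grid is being built from full data columns rather than from each column's distinct levels. The helpers `unique_levels` and `iter_grid` and the toy `records` are hypothetical names used only for illustration. The sketch first reduces each column to its unique values, then yields combinations lazily instead of holding the whole grid in memory.

```python
from itertools import product

def unique_levels(records, columns):
    """Collect the distinct values of each column, in order of first appearance."""
    levels = {col: [] for col in columns}
    seen = {col: set() for col in columns}
    for rec in records:
        for col in columns:
            value = rec[col]
            if value not in seen[col]:
                seen[col].add(value)
                levels[col].append(value)
    return levels

def iter_grid(levels):
    """Lazily yield one combination at a time as a dict."""
    names = list(levels)
    for combo in product(*(levels[name] for name in names)):
        yield dict(zip(names, combo))

# Hypothetical toy records standing in for the real dataset
records = [
    {'Sex': 'Male', 'Age': 'Under_18', 'Hispanic': 'Hispanic'},
    {'Sex': 'Female', 'Age': 'Over_18', 'Hispanic': 'Non-Hispanic'},
    {'Sex': 'Male', 'Age': 'Over_18', 'Hispanic': 'Hispanic'},
]

levels = unique_levels(records, ['Sex', 'Age', 'Hispanic'])
grid = list(iter_grid(levels))
print(len(grid))  # 8
```

Calling list() on the generator is done here only to show the count; for a large grid the combinations could be consumed one at a time instead.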
Thanks much!