Hello, I have a dataset, shown below, on which I am running a groupby across several variables. The output is exactly what I need, except that some combinations of the grouping variables are missing because there are no rows for them.
Example data:
   age income   education  usage
0  <50  <100K  highschool     10
1  <50  >100K     college     15
2  <50  <100K  highschool     20
3  >50  >100K     college     14
4  >50  >100K  highschool     30
Example code:
grouped_obj = df.groupby(['age', 'income', 'education'])['usage'].mean()
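For anyone who wants to reproduce this, here is a minimal self-contained version of the example. The DataFrame construction is reconstructed from the sample rows above; the column names and values are taken from that table, nothing else is assumed.

```python
import pandas as pd

# Sample data rebuilt from the example table above.
df = pd.DataFrame({
    "age": ["<50", "<50", "<50", ">50", ">50"],
    "income": ["<100K", ">100K", "<100K", ">100K", ">100K"],
    "education": ["highschool", "college", "highschool", "college", "highschool"],
    "usage": [10, 15, 20, 14, 30],
})

# Mean usage per observed (age, income, education) combination.
# Combinations with no rows simply do not appear in the result.
grouped_obj = df.groupby(["age", "income", "education"])["usage"].mean()
print(grouped_obj)
```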
Example output:
age  income  education
<50  <100K   highschool    15
     >100K   college       15
>50  >100K   college       14
             highschool    30
Desired output:
age  income  education
<50  <100K   highschool    15
             college       missing
     >100K   college       15
             highschool    missing
>50  >100K   college       14
             highschool    30
     <100K   highschool    missing
             college       missing
My actual problem involves ~2.4 million rows and 31 variables, and the missing combinations of variables are random and challenging to identify.
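To make the desired output concrete, here is a minimal sketch of one possible way to get it on the toy data: reindex the grouped result against the full Cartesian product of the observed values of each grouping column. This is only an illustration, not necessarily the best approach. The names `keys`, `full_index`, and `completed` are made up for the example, and missing combinations show up as NaN rather than the literal text "missing".

```python
import pandas as pd

# Sample data rebuilt from the example table in the question.
df = pd.DataFrame({
    "age": ["<50", "<50", "<50", ">50", ">50"],
    "income": ["<100K", ">100K", "<100K", ">100K", ">100K"],
    "education": ["highschool", "college", "highschool", "college", "highschool"],
    "usage": [10, 15, 20, 14, 30],
})

keys = ["age", "income", "education"]  # hypothetical helper name
grouped_obj = df.groupby(keys)["usage"].mean()

# Every combination of the values observed in each grouping column.
full_index = pd.MultiIndex.from_product(
    [df[k].unique() for k in keys], names=keys
)

# Combinations absent from the data become NaN after reindexing.
completed = grouped_obj.reindex(full_index)
print(completed)
```

Note that the full product grows multiplicatively with the number of grouping columns, so with 31 variables this sketch may not be practical as written.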