
Hello, I have a dataset, shown below. I am running a groupby across several variables. The output is exactly what I need, except that some combinations are missing because no rows exist for them.

Example data:

    age   income    education   usage
0   <50   <100K     highschool  10
1   <50   >100K     college     15
2   <50   <100K     highschool  20
3   >50   >100K     college     14
4   >50   >100K     highschool  30

Example code:

grouped_obj = df.groupby(['age', 'income', 'education'])['usage'].mean()

Example output:

age  income  education 
<50  <100K   highschool    15
     >100K   college       15
>50  >100K   college       14
             highschool    30

Desired output:

age  income  education 
<50  <100K   highschool    15
             college       missing
     >100K   college       15
             highschool    missing
>50  >100K   college       14
             highschool    30
     <100K   highschool    missing
             college       missing
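For reference, the desired output above could be sketched by reindexing the grouped result against every combination of the observed key values. This is only a minimal sketch on the example data: the names `keys`, `full_index`, and `filled` are illustrative, and missing combinations appear as NaN rather than the word "missing".

```python
import pandas as pd

# Example data from the question.
df = pd.DataFrame({
    "age": ["<50", "<50", "<50", ">50", ">50"],
    "income": ["<100K", ">100K", "<100K", ">100K", ">100K"],
    "education": ["highschool", "college", "highschool", "college", "highschool"],
    "usage": [10, 15, 20, 14, 30],
})

keys = ["age", "income", "education"]  # illustrative name for the grouping columns
grouped = df.groupby(keys)["usage"].mean()

# Cartesian product of the unique values observed in each grouping column.
full_index = pd.MultiIndex.from_product(
    [df[k].unique() for k in keys], names=keys
)

# Reindexing adds the combinations that have no rows, with NaN as their value.
filled = grouped.reindex(full_index).sort_index()
print(filled)
```

With many grouping variables the full product can become very large, so this may only be practical on a subset of the columns.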

My actual problem involves ~2.4 million rows and 31 variables, and the missing combinations of variables are random and challenging to identify.
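To make the "challenging to identify" part concrete, here is a hedged sketch of one way to list the combinations with no rows: convert the grouping columns to categoricals and group with `observed=False`, so that pandas reports unobserved category combinations with a size of zero. The names `cat_df`, `counts`, and `missing` are illustrative, and whether this scales to 31 variables is an open assumption, since the number of combinations grows multiplicatively.

```python
import pandas as pd

# Example data from the question.
df = pd.DataFrame({
    "age": ["<50", "<50", "<50", ">50", ">50"],
    "income": ["<100K", ">100K", "<100K", ">100K", ">100K"],
    "education": ["highschool", "college", "highschool", "college", "highschool"],
    "usage": [10, 15, 20, 14, 30],
})

keys = ["age", "income", "education"]

# Categorical grouping columns let groupby enumerate unobserved combinations.
cat_df = df.astype({k: "category" for k in keys})

# Group sizes over all category combinations; unobserved ones have size 0.
counts = cat_df.groupby(keys, observed=False).size()
missing = counts[counts == 0].index
print(list(missing))

# The same grouping yields NaN means for the unobserved combinations.
means = cat_df.groupby(keys, observed=False)["usage"].mean()
print(means)
```

Counting group sizes rather than checking for NaN means avoids confusing an empty group with a group whose `usage` values are all missing.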

magladde
    I think this recent question is very similar: [Adding rows based on condition in Pandas](https://stackoverflow.com/questions/73213434/adding-rows-based-on-condition-in-pandas) – fsimonjetz Aug 12 '22 at 19:17
