1

I have a data frame that contains 4 columns: id, color, flag_1 and flag_2:

df = pd.DataFrame({'id': range(0,5),
                   'color': ['red', 'red', 'blue', 'blue', 'blue'],
                   'flag_1':[1, 0, 0, 0, 0],
                  'flag_2':[1, 1, 1, 1, 0]})

Different from this question: Pandas percentage of total with groupby, i want to group by the column color and get the percentage of total of both, flag_1 and flag_2.

The result should look like this data frame:

color    flag_1  flag_2
red       0.5     1
blue       0     0.67

I can't seem to figure out how to adapt the code from the cited question that aggregates just one column, to my needs.

Lucas Reis
  • 65
  • 1
  • 2
  • 7

1 Answers1

1

Try crosstab:

m = df.drop("id", axis=1).melt("color")
pd.crosstab(m.color, m.variable, m.value, aggfunc="mean").rename_axis(None)


variable    flag_1  flag_2
blue         0.0    0.666667
red          0.5    1.000000

Sticking to groupby :

df.groupby("color", sort=False).agg(flag1=("flag_1", "mean"), flag2=("flag_2", "mean"))


       flag1    flag2
color       
red     0.5     1.000000
blue    0.0     0.666667
sammywemmy
  • 27,093
  • 4
  • 17
  • 31