5

I have a Pandas DataFrame where one of the columns contains boolean values. Think something like this:

   id  is_male  count
0   1     True     10
1   1    False      8
2   2     True      5
3   2    False     10
4   3     True      3
5   3    False      4

where id for examples identifies a school class, and the other columns indicate how many male and female students are in the class. Now, this same information can be presented in a much cleaner way, like:

  id  male  female
0  1    10       8
1  2     5      10
2  3     3       4

My question: is there a clean way, using Pandas, of turning the first DataFrame into the second one? I have an implementation, but is neither clean nor short. I can provide it if needed, but I didn't want to clutter my question statement.

acrendic
  • 53
  • 4

1 Answers1

6

Use pivot_table with rename:

piv = df.pivot_table(index='id', 
                     columns='is_male', 
                     values='count').rename(columns={False:'female', True:'male'})

is_male  female  male
id                   
1             8    10
2            10     5
3             4     3

To remove the column name attribute, use rename_axis:

piv = piv.rename_axis(None, axis=1)

    female  male
id              
1        8    10
2       10     5
3        4     3

If you want id back as column, use reset_index:

piv = piv.reset_index()

   id  female  male
0   1       8    10
1   2      10     5
2   3       4     3

All as a one liner:

piv = df.pivot_table(index='id', columns='is_male', values='count')\
        .rename(columns={False:'female', True:'male'})\
        .rename_axis(None, axis=1)\
        .reset_index()
Erfan
  • 40,971
  • 8
  • 66
  • 78