How to use groupby for multiple columns in pandas for the below shown image?

Question

This is input table in pandas:

this is an output table as shown below: dtype: int64

Dear Friends,
I am new to pandas, how to get the result is shown in the second image using pandas. I am getting output as shown below using this approach

"df.groupby(['Months', 'Status']).size()"

Months Status
Apr-20 IW 2
OW 1
Jun-20 IW 4
OW 4
May-20 IW 3
OW 2

dtype: int64
But how to convert this output as shown in the second image? It will be more helpful if someone is able to help me. Thanks in advance.

Use `df.groupby(['Status', 'Months']).size().unstack()` – jezrael Jan 03 '20 at 10:25 — jezrael, Jan 03 '20 at 10:25
@ignoring_gravity - yop, comment was changed – jezrael Jan 03 '20 at 10:29 — jezrael, Jan 03 '20 at 10:29

jezrael · Answer 1 · 2020-01-03T11:29:03.733

2

Use crosstab with margins=True parameter, then if necessary remove last Total column, change order of columns by DataFrame.reindex with ordering of original column and last convert index to column by DataFrame.reset_index and remove columns names by DataFrame.rename_axis:

df = (pd.crosstab(df['Status'], df['Months'],  margins_name='Total', margins=True)
       .iloc[:, :-1]
       .reindex(df['Months'].unique(), axis=1)
       .reset_index()
       .rename_axis(None, axis=1))
print (df)
  Status  Apr_20  May_20  Jun_20
0     IW       4       2       4
1     OW       1       2       4
2  Total       5       4       8

edited Jan 03 '20 at 11:29

answered Jan 03 '20 at 10:35

jezrael

822,522
95
1,334
1,252

How to remove the below row is having 'Total', from your solutions, I got the Total at Top too, whereas I need to remove the below Total and also I would like to add a column to the left side as shown in the second the image – Balrajswamy Jan 03 '20 at 10:45
@Balrajswamy - Add `df = df.reset_index().rename_axis(None, axis=1)` – jezrael Jan 03 '20 at 11:05
Yes, I could remove the Total at Bottom, Thanks a lot for your great help. – Balrajswamy Jan 03 '20 at 11:27
In addition to the above solutions, i want to add a column at zero index. when I tried to add a column using df.insert(0,'Column_name', 'XXX'), but it went to second column. i did not know this. pls suggest me that what is wrong in the syntax? – Balrajswamy Jan 03 '20 at 11:30
@Balrajswamy - First column in my solution `0,1,2` is called index, so if use `df.insert(0,'Column_name', 'XXX')` it add column before `Status`. If want set `index` to same values use `df.index = np.repeat('XXX', len(df))` – jezrael Jan 03 '20 at 11:32
Dear Friend, I would like to add a list to the df at '0 ' index : – Balrajswamy Jan 03 '20 at 11:45
example:
df['comp_name']=['XXX','KA','IN']
df.index = np.repeat('XXX', df['comp_name']) – Balrajswamy Jan 03 '20 at 11:45
@Balrajswamy - then use `df.index = ['XXX','KA','IN']` – jezrael Jan 03 '20 at 11:46

ignoring_gravity · Answer 2 · 2020-01-03T10:34:24.113

1

Unstack, and then transpose:

df = df.groupby(['Months', 'Status']).size().unstack().T

To get a total row:

df.sum().rename('Total').to_frame().T.append(df)

edited Jan 03 '20 at 10:34

answered Jan 03 '20 at 10:27

ignoring_gravity

6,677
4
32
65

`df.groupby(['Status', 'Months']).size().unstack()` ? – jezrael Jan 03 '20 at 10:28
Don't they want Status to be the index (btw, thanks for all your great answers!)? – ignoring_gravity Jan 03 '20 at 10:28
hmmm, why answering for 1000 times answered dupes? – jezrael Jan 03 '20 at 10:29
Sure sorry, will do so in the future – ignoring_gravity Jan 03 '20 at 10:31
Thanks a lot for your great solutions , whereas how to add a row for Total shown in the second image? – Balrajswamy Jan 03 '20 at 10:31
ooops, not dupe, reopened – jezrael Jan 03 '20 at 10:32

How to use groupby for multiple columns in pandas for the below shown image?

2 Answers2