Group by in for loop

Question

I have a dataframe with the following structure:

df= pd.DataFrame({"Origin": ['A','A','A','A','B','B','B'],"Name":["John","Bob","John","John","Bob","Elvis","John"],"Year":[2015,2015,2015,2015,2015,2016,2016]})

Origin |Name | Year
    A   John    2015
    A   Bob     2015
    A   John    2015
    A   John    2015
    B   Bob     2015
    B   Elvis   2016
    B   John    2016

I am trying to regroup the data as follow : for each year, I want to count the number of line per name per origin. I present it with the origin and year joint in a column name and the name in index. The expected output here should be as :

index  | A_2015 | A_2016 | B_2015 | B 2016 |
--------------------------------------------
John   | 3      | 0      | 0      |  1
Bob    | 1      | 0      | 1      |  0
Elvis  | 0      | 0      | 0      |  1

The origin can be separated in different dataframe it doesn't realy matter to me (my current code is dividing it).
My code is running but if find it non elegant as I combine a for loop and a group_by. Here is my shot, I was wondering if there was a better way to perform this transformation.

dfTotalA = pd.DataFrame()
dfTotalB = pd.DataFrame()
for Year in TotalData.Year.unique():
    df = TotalData.query("(Origin == 'A') & (YearHit == @Year)")[["Origin","Name"]].groupby('Name').Origin.size().to_frame(name='A_'+str(Year))
    dfTotalA= pd.concat([dfTotal,df1],sort=False,axis=1)
    df2 = TotalData.query("(Origin == 'B') & (YearHit == @Year)")[["Origin","Name"]].groupby('Name').Origin.size().to_frame(name='B_'+str(Year))
    dfTotalB= pd.concat([dftotalB,df2],sort=False,axis=1)
Completedf = pd.concat(dfTotalA,dftotalB],sort=False,axis=1)
Completedf.fillna(0,inplace=True)

WenNYoBen, could you provide a solution to this question with one of the answers in your linked dupe? — Erfan, Jul 31 '19 at 17:07
@Erfan yes flatten I will edit the how to pivot tag show that process — BENY, Jul 31 '19 at 17:50

score 1 · Accepted Answer · answered Jul 31 '19 at 16:49

1

One way to do so:

new_df = (df.groupby(['Origin', 'Year'])
   .Name
   .value_counts()
   .unstack(['Origin'], fill_value=0)
   .unstack('Year', fill_value=0)
)

which gives:

Origin    A         B     
Year   2015 2016 2015 2016
Name                      
Bob       1    0    1    0
Elvis     0    0    0    1
John      3    0    0    1

And then you can rename the columns with:

new_df.columns = [f'{a}_{b}' for a,b in new_df.columns]

answered Jul 31 '19 at 16:49

Quang Hoang

146,074
10
56
74

This answer is 3 time faster than @Erfan's answer so I accepted this one – Mayeul sgc Jul 31 '19 at 16:59

score 1 · Answer 2 · answered Jul 31 '19 at 16:52

We can create a count column with GroupBy.transform. Then pivot_table. And finally flatten our MultiIndex columns:

df['cnt'] = df.groupby(['Origin', 'Name'])['Year'].transform('count')

piv = df.pivot_table(index='Name', columns=['Origin', 'Year'], fill_value=0)
piv.columns = [f'{c[1]}_{c[2]}'for c in piv.columns]

Output

       A_2015  B_2015  B_2016
Name                         
Bob         1       1       0
Elvis       0       0       1
John        3       0       1

Group by in for loop

2 Answers2