I have a dataframe with the following structure:
df= pd.DataFrame({"Origin": ['A','A','A','A','B','B','B'],"Name":["John","Bob","John","John","Bob","Elvis","John"],"Year":[2015,2015,2015,2015,2015,2016,2016]})
Origin |Name | Year
A John 2015
A Bob 2015
A John 2015
A John 2015
B Bob 2015
B Elvis 2016
B John 2016
I am trying to regroup the data as follow : for each year, I want to count the number of line per name per origin. I present it with the origin and year joint in a column name and the name in index. The expected output here should be as :
index | A_2015 | A_2016 | B_2015 | B 2016 |
--------------------------------------------
John | 3 | 0 | 0 | 1
Bob | 1 | 0 | 1 | 0
Elvis | 0 | 0 | 0 | 1
The origin can be separated in different dataframe it doesn't realy matter to me (my current code is dividing it).
My code is running but if find it non elegant as I combine a for loop and a group_by. Here is my shot, I was wondering if there was a better way to perform this transformation.
dfTotalA = pd.DataFrame()
dfTotalB = pd.DataFrame()
for Year in TotalData.Year.unique():
df = TotalData.query("(Origin == 'A') & (YearHit == @Year)")[["Origin","Name"]].groupby('Name').Origin.size().to_frame(name='A_'+str(Year))
dfTotalA= pd.concat([dfTotal,df1],sort=False,axis=1)
df2 = TotalData.query("(Origin == 'B') & (YearHit == @Year)")[["Origin","Name"]].groupby('Name').Origin.size().to_frame(name='B_'+str(Year))
dfTotalB= pd.concat([dftotalB,df2],sort=False,axis=1)
Completedf = pd.concat(dfTotalA,dftotalB],sort=False,axis=1)
Completedf.fillna(0,inplace=True)