I have a Dataframe like this,
d = {'state': [ "Alabama", "Alabama","Alabama", "Alabama","Alabama","Alabama","Alabama","Alabama",],
'county_name': ["Autauga","Autauga","Autauga","Baldwin","Baldwin","Baldwin","Baldwin","Barbour"],
'col3':["A","B","C","A","C","D","B","B"],
'count':[2,2,1,9,3,2,50,1],
'sum':[1,0,0,3,1,0,13,0]}
df = pd.DataFrame(data=d)
print(df)
state county_name col3 count sum
0 Alabama Autauga A 2 1
1 Alabama Autauga B 2 0
2 Alabama Autauga C 1 0
3 Alabama Baldwin A 9 3
4 Alabama Baldwin C 3 1
5 Alabama Baldwin D 2 0
6 Alabama Baldwin B 50 13
7 Alabama Barbour B 1 0
I tried to reshape the Dataframe by using unstack in python. The final output I want to have is similar to,
state county_name col3_A_count col3_B_count col3_C_count col3_D_count col3_A_sum col3_B_sum col3_C_sum col3_D_sum
Alabama Autauga 2 2 1 NA 1 0 0 NA
Alabama Baldwin 9 50 3 2 3 13 1 2
Alabama Barbour NA 1 NA NA NA 0 NA NA
I have tried to use set_index
and unstack
to solve this problem but it shows error.
location = ['state','county_name']
df = df.set_index(['col3']+location).unstack('col3')
ValueError: Index contains duplicate entries, cannot reshape
Well, I don't know why it works well here. Maybe because of small data size. But it shows that error when I applied to the original dataset. It seems you cannot use duplicated records as index. Can someone tell me how to solve this problem?