2

I have a Dataframe like this,

d = {'state': [ "Alabama", "Alabama","Alabama", "Alabama","Alabama","Alabama","Alabama","Alabama",],
      'county_name': ["Autauga","Autauga","Autauga","Baldwin","Baldwin","Baldwin","Baldwin","Barbour"],
      'col3':["A","B","C","A","C","D","B","B"],
      'count':[2,2,1,9,3,2,50,1],
      'sum':[1,0,0,3,1,0,13,0]}
df = pd.DataFrame(data=d)
print(df)
     state county_name col3  count  sum
0  Alabama     Autauga    A      2    1
1  Alabama     Autauga    B      2    0
2  Alabama     Autauga    C      1    0
3  Alabama     Baldwin    A      9    3
4  Alabama     Baldwin    C      3    1
5  Alabama     Baldwin    D      2    0
6  Alabama     Baldwin    B     50   13
7  Alabama     Barbour    B      1    0

I tried to reshape the Dataframe by using unstack in python. The final output I want to have is similar to,

state  county_name  col3_A_count  col3_B_count  col3_C_count col3_D_count  col3_A_sum  col3_B_sum  col3_C_sum  col3_D_sum
Alabama  Autauga         2             2             1             NA           1           0           0            NA  
Alabama  Baldwin         9            50             3             2            3           13          1            2
Alabama  Barbour         NA            1            NA             NA           NA          0           NA           NA  

I have tried to use set_index and unstack to solve this problem but it shows error.

location = ['state','county_name']
df = df.set_index(['col3']+location).unstack('col3')

ValueError: Index contains duplicate entries, cannot reshape

Well, I don't know why it works well here. Maybe because of small data size. But it shows that error when I applied to the original dataset. It seems you cannot use duplicated records as index. Can someone tell me how to solve this problem?

Jiayu Zhang
  • 719
  • 7
  • 22

0 Answers0