0

today I have been working on merging and editing data frames and I have been stuck with a very specific part. I have a column with names of certain fruit and then names of a person like this:

Fruit Person
Banana Jake
Banana Paul
Carrot Nancy
Carrot Sydney
Carrot Jane

Note that the "Person" column will always be unique. My goal is to get something like this with a third or fourth column:

Fruit Person1 Person2 Person3
Banana Jake Paul
Carrot Nancy Sydney Jane

I've tried this:

first = df.drop_duplicates(subset=['Zone','District','Area'],keep='First',inplace = True)
second = df.drop_duplicates(subset=['Zone','District','Area'],keep='Last',inplace = True)

and merging them together later but this will not get those that are in the middle such as "Syndey" from my example. I found this: HERE

But I don't really understand the groupby part. I hope this helps and thank you for your time and patience.

Mayank Porwal
  • 33,470
  • 8
  • 37
  • 58

2 Answers2

2

Use df.groupby, Groupby.agg and pd.concat:

In [890]: df1 = df.groupby('Fruit').agg(list).reset_index()
In [905]: res = pd.concat([df1.Fruit, pd.DataFrame(df1.Person.tolist(), index=df1.index)], 1)

In [906]: res
Out[906]: 
    Fruit      0       1     2
0  Banana   Jake    Paul  None
1  Carrot  Nancy  Sydney  Jane
Mayank Porwal
  • 33,470
  • 8
  • 37
  • 58
1
df = df.assign(_=lambda x: 'Person'+(x.groupby('Fruit').transform(
    'cumcount')+1).astype(str)).pivot(columns='_', index='Fruit')
print(df)

Result:

        Person                
_      Person1 Person2 Person3
Fruit                         
Banana    Jake    Paul     NaN
Carrot   Nancy  Sydney    Jane
Алексей Р
  • 7,507
  • 2
  • 7
  • 18