Python Pandas Dataframe adding extra column based on duplicates

Question

today I have been working on merging and editing data frames and I have been stuck with a very specific part. I have a column with names of certain fruit and then names of a person like this:

Fruit	Person
Banana	Jake
Banana	Paul
Carrot	Nancy
Carrot	Sydney
Carrot	Jane

Note that the "Person" column will always be unique. My goal is to get something like this with a third or fourth column:

Fruit	Person1	Person2	Person3
Banana	Jake	Paul
Carrot	Nancy	Sydney	Jane

I've tried this:

first = df.drop_duplicates(subset=['Zone','District','Area'],keep='First',inplace = True)
second = df.drop_duplicates(subset=['Zone','District','Area'],keep='Last',inplace = True)

and merging them together later but this will not get those that are in the middle such as "Syndey" from my example. I found this: HERE

But I don't really understand the groupby part. I hope this helps and thank you for your time and patience.

Thanks for the edit @MayankPorwal I am super new to this still! — justryingouthere, Nov 22 '21 at 18:42

score 2 · Accepted Answer · answered Nov 22 '21 at 18:50

2

Use df.groupby, Groupby.agg and pd.concat:

In [890]: df1 = df.groupby('Fruit').agg(list).reset_index()
In [905]: res = pd.concat([df1.Fruit, pd.DataFrame(df1.Person.tolist(), index=df1.index)], 1)

In [906]: res
Out[906]: 
    Fruit      0       1     2
0  Banana   Jake    Paul  None
1  Carrot  Nancy  Sydney  Jane

answered Nov 22 '21 at 18:50

Mayank Porwal

33,470
8
37
58

1

Great answer worked perfectly! – justryingouthere Nov 22 '21 at 18:54

score 1 · Answer 2 · answered Nov 23 '21 at 07:21

df = df.assign(_=lambda x: 'Person'+(x.groupby('Fruit').transform(
    'cumcount')+1).astype(str)).pivot(columns='_', index='Fruit')
print(df)

Result:

        Person                
_      Person1 Person2 Person3
Fruit                         
Banana    Jake    Paul     NaN
Carrot   Nancy  Sydney    Jane

Python Pandas Dataframe adding extra column based on duplicates

2 Answers2