Remove rows with duplicate string values in one column, and append strings from another columns

Question

I am looking to remove duplicate rows based on the values in a column ("Name"), but append the corresponding string values in another column("Occupation").

Duplicate entry is "Jack"

I have a dataframe:

Name	center	Occupation
Jack	Miami	Clerk
Alice	Tx	Manager
Jack	San Jose	PO
Cathy	Houston	Security

And i am expecting this

Name	center	Occupation
Jack	Miami	Clerk PO
Alice	Tx	Manager
Cathy	Houston	Security

Appreciate any answer on this

so for "Occupation" you would want to join the values, what about for "center"? Do you just want to keep the first match per name? Or if not, what type of aggregation you'd want there? — sophocles, Nov 18 '22 at 14:51

score 1 · Accepted Answer · answered Nov 18 '22 at 14:57

1

Seems like you're looking for something along the lines of:

df.groupby('Name',as_index=False).agg(
    {'center':'first','Occupation':lambda x: ' '.join(x)}
    )

which get's you:

    Name   center Occupation
0  Alice       Tx    Manager
1  Cathy  Houston   Security
2   Jack    Miami   Clerk PO

However, I can't be sure what type of aggregation you'd want for 'center' - using first now which get's your desired outcome.

answered Nov 18 '22 at 14:57

sophocles

13,593
3
14
33

Thanks for your answer, but i would like to keep the same order in the column 'Name' --Jack, Alice, cathy – Nov 18 '22 at 15:14
Please then include ```df.groupby('Name',sort=False)``` in the ```groupby``` clause – sophocles Nov 18 '22 at 15:21

Rabinzel · Answer 2 · 2022-11-18T15:17:24.267

0

You can do a groupby, and then aggregate the columns as you like. For occupation you already wrote to join, for center I chose first.

out = df.groupby('Name', as_index=False, sort=False).agg({'center': 'first', 'Occupation': ' '.join})
print(out)

Output:

    Name   center Occupation
0   Jack    Miami   Clerk PO
1  Alice       Tx    Manager
2  Cathy  Houston   Security

edited Nov 18 '22 at 15:17

answered Nov 18 '22 at 14:58

Rabinzel

7,757
3
10
30

Thanks for your answer, but i would like to keep the same order in the column 'Name' --Jack, Alice, cathy – Nov 18 '22 at 15:15

Remove rows with duplicate string values in one column, and append strings from another columns

2 Answers2