Joining strings from different rows and several columns using ids in a Pandas DataFrame

Question

I have a table similar to this one:

Id	Name	Languages	Class
1	James	eng	B
1	James	ger	A
2	Joe	fre	A
3	Maria	spa	A
3	Maria	spa	B
4	Susan	eng	C
4	Susan	eng	C

So, as you can see:

Some rows are completely repeated (like the ones for Susan)
Some rows are unique (Joe)
Several rows repeat the ids of other rows, but show differences in some columns (James in two columns, Maria in one).

What I would like have is a table where each row represents a different id, and the different values that appear now in different rows should be concatenated in a single string with a separator character, for example semicolon. In other words, I want this:

Id	Name	Languages	Class
1	James	eng;ger	B;A
2	Joe	fre	A
3	Maria	spa	A;B
4	Susan	eng	C

The repeated rows I can drop with df.drop_duplicates(). For the rest, I know how to do it in a very inefficient way, creating a new table with the ids as index, iterating over rows and placing the information in the new table. But there has to be a better way to do this, I guess with .groupby("id") and then something else.

There is a similar question already answered here: Concatenate strings from several rows using Pandas groupby However, in this question, only on column should be joined, while in mine this is the case for several columns. In my real example, there are dozens of columns. Part of the question was how to deal with the plurality of columns to be joined. Any help, please? Thanks!

`df.groupby(['Id', 'Name'], as_index=False).agg(lambda x: ';'.join(dict.fromkeys(x)))` — mozway, Nov 08 '22 at 07:35

Joining strings from different rows and several columns using ids in a Pandas DataFrame

0 Answers0