0

I have a table similar to this one:

Id Name Languages Class
1 James eng B
1 James ger A
2 Joe fre A
3 Maria spa A
3 Maria spa B
4 Susan eng C
4 Susan eng C

So, as you can see:

  1. Some rows are completely repeated (like the ones for Susan)
  2. Some rows are unique (Joe)
  3. Several rows repeat the ids of other rows, but show differences in some columns (James in two columns, Maria in one).

What I would like have is a table where each row represents a different id, and the different values that appear now in different rows should be concatenated in a single string with a separator character, for example semicolon. In other words, I want this:

Id Name Languages Class
1 James eng;ger B;A
2 Joe fre A
3 Maria spa A;B
4 Susan eng C

The repeated rows I can drop with df.drop_duplicates(). For the rest, I know how to do it in a very inefficient way, creating a new table with the ids as index, iterating over rows and placing the information in the new table. But there has to be a better way to do this, I guess with .groupby("id") and then something else.

There is a similar question already answered here: Concatenate strings from several rows using Pandas groupby However, in this question, only on column should be joined, while in mine this is the case for several columns. In my real example, there are dozens of columns. Part of the question was how to deal with the plurality of columns to be joined. Any help, please? Thanks!

José
  • 533
  • 1
  • 4
  • 14

0 Answers0