Data Manipulation in Python

Asked Jun 04 '18 at 15:01

Active Jun 04 '18 at 15:23

I am working with a dataset that has the following columns:

ID   Person_Name  Person_Country
110  Marc         CA
110  Sean         CN
111  Matt         IN
111  Rob          AU
112  Mike         US
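To reproduce the problem, the whitespace-separated table above can be loaded into a pandas DataFrame. This is a minimal sketch that assumes pandas is the library in use (the question mentions `.pivot_table()` and `.unstack()`) and that no value contains a space, so `sep=r"\s+"` splits the columns correctly. The variable names `raw` and `df` are hypothetical.

```python
import io

import pandas as pd

# The sample table from the question as plain text (hypothetical variable name).
raw = """\
ID  Person_Name Person_Country
110 Marc    CA
110 Sean    CN
111 Matt    IN
111 Rob     AU
112 Mike    US
"""

# sep=r"\s+" treats any run of whitespace as a single delimiter; this only
# works because none of the values themselves contain spaces.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(df)
print(df.dtypes)
```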

I want to group the rows by ID, joining the names and countries with "; ", like this:

ID   Person_Name  Person_Country
110  Marc; Sean   CA; CN
111  Matt; Rob    IN; AU
112  Mike         US

I tried pandas methods such as .pivot_table() and .unstack(), but they didn't help because the data is non-numeric.
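For what it's worth, `.pivot_table()` itself is not limited to numeric data; its default `aggfunc` is the mean, which does not work on strings. The sketch below, assuming a reasonably recent pandas and the column names shown above, passes a string-joining function as `aggfunc` instead. `pivot_table` sorts the result columns by default, so the last step restores the question's column order.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [110, 110, 111, 111, 112],
    "Person_Name": ["Marc", "Sean", "Matt", "Rob", "Mike"],
    "Person_Country": ["CA", "CN", "IN", "AU", "US"],
})

# aggfunc is applied to one column of one group at a time (a Series);
# "; ".join concatenates that group's values in their original row order.
table = df.pivot_table(
    index="ID",
    values=["Person_Name", "Person_Country"],
    aggfunc="; ".join,
)

# pivot_table sorts the columns by default; restore the question's order.
table = table[["Person_Name", "Person_Country"]]
print(table)
```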

edited Jun 04 '18 at 15:19

Mazahir Bhagat

Small note: it is usually a bad idea to give your columns names that contain spaces, because it makes the header hard to read: is Name the third column? No, it is part of the second column's name. Use dots or underscores as separators instead. – Bram Vanroy Jun 04 '18 at 15:09
Either `df.groupby('ID').agg('; '.join)` or, to name the columns explicitly, `df.groupby('ID')[['Person_Name', 'Person_Country']].agg('; '.join)`. – ayhan Jun 04 '18 at 15:11
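A self-contained version of the `groupby`/`agg` suggestion, assuming pandas and the underscore column names from the edited question; the sample data is rebuilt inline and the variable names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [110, 110, 111, 111, 112],
    "Person_Name": ["Marc", "Sean", "Matt", "Rob", "Mike"],
    "Person_Country": ["CA", "CN", "IN", "AU", "US"],
})

# Every non-grouping column is passed, one group at a time, to "; ".join,
# which concatenates that group's strings.
grouped = df.groupby("ID").agg("; ".join)

# Selecting the columns explicitly avoids errors if df has other columns
# (e.g. numeric ones) that "; ".join cannot handle.
grouped_explicit = df.groupby("ID")[["Person_Name", "Person_Country"]].agg("; ".join)

# Optional: turn ID back into an ordinary column.
result = grouped_explicit.reset_index()
print(result)
```

Because `agg` hands each column to `"; ".join` separately as a Series, the join sees the cell values rather than anything numeric, which is why this works on string data where the default numeric aggregations do not.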
This example cannot use `apply`; it needs `agg` to produce the desired result. – zipa Jun 04 '18 at 15:12
@BramVanroy - Thanks, implemented your advice! – Mazahir Bhagat Jun 04 '18 at 17:20
@user2285236 - I tried this approach after reading similar questions, but it returns the column names instead of the concatenated names. – Mazahir Bhagat Jun 04 '18 at 17:32
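The thread does not show the exact code that returned the column names. One plausible cause (an assumption, not confirmed here) is that the joining function received a whole sub-DataFrame, as `groupby(...).apply` provides, instead of one column at a time. This sketch, using hypothetical variable names, shows the difference on a single group.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [110, 110, 111, 111, 112],
    "Person_Name": ["Marc", "Sean", "Matt", "Rob", "Mike"],
    "Person_Country": ["CA", "CN", "IN", "AU", "US"],
})
cols = ["Person_Name", "Person_Country"]

# One group (ID 110) as a sub-DataFrame, similar to what groupby(...).apply
# hands to its function.
group = df.loc[df["ID"] == 110, cols]

# Iterating over a DataFrame yields its column labels, so joining the whole
# sub-DataFrame concatenates the column names, not the cell values.
wrong = "; ".join(group)

# agg passes each column (a Series) to join separately, and iterating over
# a Series yields its values.
right = group.agg("; ".join)

print(wrong)
print(right)
```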
Are you using `agg`? It works fine when I try it: https://imgur.com/a/1hExpHF – ayhan Jun 04 '18 at 17:37
@user2285236 - Yes, I am using the .agg. – Mazahir Bhagat Jun 06 '18 at 20:23

0 Answers