
(Beginner in Python/pandas) Hello, I have a dataframe with one column holding the name of the user (user_id) and a second column holding the organization attached to that user (org_name).

In the simple case each user has a single organization, but sometimes a user has several organizations.

For users who have more than one organization, I would like to merge their organization names like this: "org1-org2-org3", in every row belonging to these users.

Currently I can find these users with simple code like base['user_id'].value_counts()>=2, which tells me which users are repeated in my dataframe because they have several organizations.
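Note that value_counts()>=2 returns a boolean Series indexed by user_id, not the user IDs themselves. A minimal sketch of turning it into a list of repeated users, plus an alternative that flags every row of those users with duplicated(keep=False). The sample data and the names repeated_users and is_repeated are hypothetical, chosen only for illustration:

```python
import pandas as pd

# Hypothetical sample data using the column names from the question
base = pd.DataFrame({
    'user_id': ['u1', 'u2', 'u1', 'u3', 'u1'],
    'org_name': ['OrgA', 'OrgB', 'OrgC', 'OrgD', 'OrgE'],
})

# value_counts() gives one count per user; filtering on the boolean
# condition keeps only the users that appear at least twice
counts = base['user_id'].value_counts()
repeated_users = counts[counts >= 2].index.tolist()
print(repeated_users)  # ['u1']

# Alternatively, mark every row whose user appears more than once
# (keep=False flags all occurrences, not just the later ones)
is_repeated = base['user_id'].duplicated(keep=False)
print(base[is_repeated])
```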

I would like to write a loop or a function that reasons like this: take row "i" -> does the user repeat? -> if not, go to the next row.

-> If yes, find all the organizations associated with that user, merge their names, and replace each organization name of this user with the merged name -> go to the next row "i".
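The row-by-row logic described above can be sketched directly as a loop. This is only an illustration of that reasoning on hypothetical sample data (original_orgs is an illustrative name), not an efficient solution. One pitfall it guards against: the organization names are read from an untouched copy, because once a row is overwritten, later rows of the same user would otherwise join the already-merged name a second time:

```python
import pandas as pd

# Hypothetical sample data using the column names from the question
base = pd.DataFrame({
    'user_id': ['u1', 'u2', 'u1', 'u3', 'u1'],
    'org_name': ['OrgA', 'OrgB', 'OrgC', 'OrgD', 'OrgE'],
})

counts = base['user_id'].value_counts()
# Untouched copy of the names, so merged values are never re-merged
original_orgs = base['org_name'].copy()

for i in base.index:
    user = base.at[i, 'user_id']
    if counts[user] < 2:
        continue  # single organization: go to the next row
    # Collect all organizations of this user (in row order) and merge them
    merged = '-'.join(original_orgs[base['user_id'] == user])
    base.at[i, 'org_name'] = merged

print(base)
```

Because the mask base['user_id'] == user is recomputed over the whole column for every row, this loop likely scales roughly quadratically with the number of rows, which is why a groupby-based approach is usually much faster.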

At the moment I do it manually: I copy a repeated user from the output of the code above and paste it into code that replaces the names of its organizations.

The problem is that this is very time-consuming.

First of all, I would like to merge the organization names in my 'org_name' column.

If you have any tips or techniques for solving this, I'm interested.

rudolfovic
jer_card
To improve this and future questions, please include a small subset of your data as a copyable piece of code that can be used for testing, as well as your expected output. For more info, see [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391). – AlexK May 22 '21 at 23:28

1 Answer


Using a toy dataframe to demonstrate the example:

import pandas as pd

df = pd.DataFrame({'User': [1, 2, 1, 3, 2, 4, 5, 1, 3], 'Org': list('ABCDEFGHI')})

All you have to do is this:

result = df.groupby('User')['Org'].agg('-'.join).reset_index()

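You can then optionally merge the result back into the original dataframe, so that every row gets its user's merged organization names (in a column named Orgs because of the suffixes):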

result.merge(df, on='User', suffixes=('s', ''))
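If the goal is to replace the organization name in every row, as the question asks, rather than build a per-user summary table, a sketch using groupby(...).transform may be simpler, assuming the same toy dataframe. Unlike agg, transform returns one value per original row, so the merged name is broadcast back to each row of that user; the lambda just applies '-'.join to each user's organizations:

```python
import pandas as pd

df = pd.DataFrame({'User': [1, 2, 1, 3, 2, 4, 5, 1, 3], 'Org': list('ABCDEFGHI')})

# transform() keeps the original row alignment: each row receives the
# joined organization names of its own user
df['Org'] = df.groupby('User')['Org'].transform(lambda orgs: '-'.join(orgs))
print(df)
```

Users with a single organization keep their original name, since joining a one-element group returns that element unchanged, so no separate "does the user repeat?" check is needed.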
rudolfovic