2

I have a dataframe that has the following structure:

First_Name  Last_Name  Group_Membership
Joe         Francis    A
Jane        Davis      B
Mary        Smith      A,B,C
Ian         Brown      A

I need transform the values in the cell Group_Membership into columns and get a dataframe that looks like this:

First_Name  Last_Name  A    B    C
Joe         Francis    Yes  No   No
Jane        Davis      No   Yes  No
Mary        Smith      Yes  Yes  Yes
Ian         Brown      Yes  No   No

I managed to convert the values in the column Group_Membership into list and then 'exploding' it, but then I should somehow transpose it

df.['Group_Membership'] = df.['Group_Membership'].str.split(',')
df.explode('Group_Membership')

Somehow now I should be pivoting it. Also, I'm not sure this is the best way to do it...

Your help would be greatly appreciated!

Andu Stan
  • 23
  • 4

2 Answers2

3

First create an auxiliary DataFrame (A, B and C columns):

df2 = df.Group_Membership.str.split(',').explode().to_frame().assign(val='Yes')\
    .pivot(columns='Group_Membership').droplevel(0, axis=1).fillna('No')

(print it to see the content).

Then create the result:

result = df[['First_Name', 'Last_Name']].join(df2)

The result is:

  First_Name Last_Name    A    B    C
0        Joe   Francis  Yes   No   No
1       Jane     Davis   No  Yes   No
2       Mary     Smith  Yes  Yes  Yes
3        Ian     Brown  Yes   No   No
Valdi_Bo
  • 30,023
  • 4
  • 23
  • 41
2

An alternate way to do this will be to use get_dummies() as well. Then do a replace for 0 as No and 1 as Yes

Here's how to go about doing it.

df1 = df.Group_Membership.str.get_dummies(',').replace({1:'Yes',0:'No'})
df2 = df.join(df1)
print (df2)

Output of this will be:

  First_Name Last_Name Group_Membership    A    B    C
0        Joe   Francis                A  Yes   No   No
1       Jane     Davis                B   No  Yes   No
2       Mary     Smith            A,B,C  Yes  Yes  Yes
3        Ian     Brown                A  Yes   No   No

You can decide to drop Group_Membership column later OR do this:

df1 = df.Group_Membership.str.get_dummies(',').replace({1:'Yes',0:'No'})
df2 = df[['First_Name','Last_Name']].join(df1)
print (df2)

Or combine all of this into one statement like this:

df = df[['First_Name','Last_Name']].join(df.Group_Membership.str.get_dummies(',').replace({1:'Yes',0:'No'}))
print (df)

The output of this will be:

  First_Name Last_Name    A    B    C
0        Joe   Francis  Yes   No   No
1       Jane     Davis   No  Yes   No
2       Mary     Smith  Yes  Yes  Yes
3        Ian     Brown  Yes   No   No
Joe Ferndz
  • 8,417
  • 2
  • 13
  • 33