How can I match the value and assign to them a new column based on other column string

Question

Dataframe:

MovieID	movieCater	rating
1	Action, Comedy, Adventure	4
2	Action, Crime	3
3	Crime	2

What I want:

MovieID	movieCater	Action	Comedy	Adventure	Crime
1	Action, Comedy, Adventure	4	4	4	0
2	Action, Crime	3	0	0	3
3	Crime	0	0	0	2

First you need to split your string to get the values from the movieCater field, then you normalize and finally pivot - you can read the Pivot documentation here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pivot.html — Fernando J. Rivera, May 22 '21 at 04:01

score 4 · Accepted Answer · answered May 22 '21 at 04:11

You can also get_dummies for the column movieCater and multiply the rating:

out = df.join(df['movieCater'].str.get_dummies(",").mul(df['rating'],axis=0))

print(out)

   MovieID               movieCater  rating  Action  Adventure  Comedy  Crime
0        1  Action,Comedy,Adventure       4       4          4       4      0
1        2             Action,Crime       3       3          0       0      3
2        3                    Crime       2       0          0       0      2

To match the expected output, drop the rating column before joining:

out = (df.drop("rating",1).join(
      df['movieCater'].str.get_dummies(",").mul(df['rating'],axis=0))

If the series is very very big, you might also consider splitting the strings with the sep="," then use this solution: https://stackoverflow.com/a/51420716/9840637 to get the dummies. Finally multiply the rating column.

hi, i updated the question and add images for question, however, I don't know how to display the description, maybe you check my question by editing — Leo, May 22 '21 at 05:36
@Leo - need `out = df.join(df['movieCater'].str.join(',').str.get_dummies(",").mul(df['rating'],axis=0))` ? — jezrael, May 22 '21 at 05:57
@anky - For second solution `df.join(df['movieCater'].str.get_dummies(",").mul(df.pop('rating'),axis=0))` should working — jezrael, May 22 '21 at 05:57

score 2 · Answer 2 · answered May 22 '21 at 04:05

Here's an approach using .pivot_table() method.

First of all, you have to convert movieCater column to a list using .apply() method.

>>> df
   MovieID                 movieCater  rating
0        1  Action, Comedy, Adventure       4
1        2              Action, Crime       3
2        3                      Crime       2
>>> df.assign(movieCater_list = df['movieCater'].apply(lambda x: x.split(', ')))
   MovieID                 movieCater  rating              movieCater_list
0        1  Action, Comedy, Adventure       4  [Action, Comedy, Adventure]
1        2              Action, Crime       3              [Action, Crime]
2        3                      Crime       2                      [Crime]

Next, use .explode() method on the movieCater_list column.

>>> df.assign(movieCater_list = df['movieCater'].apply(lambda x: x.split(', '))).explode('movieCater_list')
   MovieID                 movieCater  rating movieCater_list
0        1  Action, Comedy, Adventure       4          Action
0        1  Action, Comedy, Adventure       4          Comedy
0        1  Action, Comedy, Adventure       4       Adventure
1        2              Action, Crime       3          Action
1        2              Action, Crime       3           Crime
2        3                      Crime       2           Crime

Finally use .pivot_table()

>>> df.assign(movieCater_list = df['movieCater'].apply(lambda x: x.split(', '))).explode('movieCater_list').pivot_table(values='rating', index=['MovieID', 'movieCater'], columns='movieCater_list', fill_value=0)
movieCater_list                    Action  Adventure  Comedy  Crime
MovieID movieCater
1       Action, Comedy, Adventure       4          4       4      0
2       Action, Crime                   3          0       0      3
3       Crime                           0          0       0      2

hi, i updated the question and add images for question, however, I don't know how to display the description, maybe you check my question by editing. — Leo, May 22 '21 at 05:28
Just checked your edit. Using `pd.get_dummies()` from anky's solution is a better solution for your problem. `.pivot_table()` relies on aggregation (`mean` by default) and causes problems when there are possible duplicates in `index` parameter. — Gusti Adli, May 22 '21 at 07:07

score 1 · Answer 3 · answered May 22 '21 at 04:05

Here is a solution you can try out, first split based on delimiter then explode finally pivot_table

print(
    df.assign(movieCater=df['movieCater'].str.split(","))
        .explode(column='movieCater')
        .pivot_table(index='MovieID', columns='movieCater', values='rating', fill_value=0)
)

movieCater  Action  Adventure  Comedy  Crime
MovieID                                     
1                4          4       4      0
2                3          0       0      3
3                0          0       0      2

MiH · Answer 4 · 2021-05-22T07:30:13.700

Suppose the input dataframe is

df = pd.DataFrame({
    'MovieID': ['001','002','003'], 
    'movieCat': ['Action, Comedy, Adventure', 'Action, Crime', 'Crime'], 
    'rating': [4,3,2]
    })

#output
    MovieID movieCat                    rating
0   001     Action, Comedy, Adventure   4
1   002     Action, Crime               3
2   003     Crime                       2

I've re-use similar codes of @sushanth to get the final table

df['temp'] = df['movieCat'].str.split(", ")
df = df.explode(column='temp').pivot_table(index=['MovieID', 'movieCat'], columns='temp', values='rating', fill_value=0)
df.columns.name=None
df.reset_index(inplace=True)

#output
    MovieID movieCat                    Adventure   Comedy  Crime   Action
0   001     Action, Comedy, Adventure   4           4       0       4
1   002     Action, Crime               0           0       3       3
2   003     Crime                       0           0       2       0

After this you could export the table to Excel

df.to_excel('my_file.xlsx', index=False)

How can I match the value and assign to them a new column based on other column string

4 Answers4

Linked