2

Dataframe:

MovieID movieCater rating
1 Action, Comedy, Adventure 4
2 Action, Crime 3
3 Crime 2

What I want:

MovieID movieCater Action Comedy Adventure Crime
1 Action, Comedy, Adventure 4 4 4 0
2 Action, Crime 3 0 0 3
3 Crime 0 0 0 2
Leo
  • 71
  • 7
  • First you need to split your string to get the values from the movieCater field, then you normalize and finally pivot - you can read the Pivot documentation here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pivot.html – Fernando J. Rivera May 22 '21 at 04:01

4 Answers4

4

You can also get_dummies for the column movieCater and multiply the rating:

out = df.join(df['movieCater'].str.get_dummies(",").mul(df['rating'],axis=0))

print(out)

   MovieID               movieCater  rating  Action  Adventure  Comedy  Crime
0        1  Action,Comedy,Adventure       4       4          4       4      0
1        2             Action,Crime       3       3          0       0      3
2        3                    Crime       2       0          0       0      2

To match the expected output, drop the rating column before joining:

out = (df.drop("rating",1).join(
      df['movieCater'].str.get_dummies(",").mul(df['rating'],axis=0))

If the series is very very big, you might also consider splitting the strings with the sep="," then use this solution: https://stackoverflow.com/a/51420716/9840637 to get the dummies. Finally multiply the rating column.

anky
  • 74,114
  • 11
  • 41
  • 70
  • hi, i updated the question and add images for question, however, I don't know how to display the description, maybe you check my question by editing – Leo May 22 '21 at 05:36
  • 1
    @Leo - need `out = df.join(df['movieCater'].str.join(',').str.get_dummies(",").mul(df['rating'],axis=0))` ? – jezrael May 22 '21 at 05:57
  • 1
    @anky - For second solution `df.join(df['movieCater'].str.get_dummies(",").mul(df.pop('rating'),axis=0))` should working – jezrael May 22 '21 at 05:57
2

Here's an approach using .pivot_table() method.

First of all, you have to convert movieCater column to a list using .apply() method.

>>> df
   MovieID                 movieCater  rating
0        1  Action, Comedy, Adventure       4
1        2              Action, Crime       3
2        3                      Crime       2
>>> df.assign(movieCater_list = df['movieCater'].apply(lambda x: x.split(', ')))
   MovieID                 movieCater  rating              movieCater_list
0        1  Action, Comedy, Adventure       4  [Action, Comedy, Adventure]
1        2              Action, Crime       3              [Action, Crime]
2        3                      Crime       2                      [Crime]

Next, use .explode() method on the movieCater_list column.

>>> df.assign(movieCater_list = df['movieCater'].apply(lambda x: x.split(', '))).explode('movieCater_list')
   MovieID                 movieCater  rating movieCater_list
0        1  Action, Comedy, Adventure       4          Action
0        1  Action, Comedy, Adventure       4          Comedy
0        1  Action, Comedy, Adventure       4       Adventure
1        2              Action, Crime       3          Action
1        2              Action, Crime       3           Crime
2        3                      Crime       2           Crime

Finally use .pivot_table()

>>> df.assign(movieCater_list = df['movieCater'].apply(lambda x: x.split(', '))).explode('movieCater_list').pivot_table(values='rating', index=['MovieID', 'movieCater'], columns='movieCater_list', fill_value=0)
movieCater_list                    Action  Adventure  Comedy  Crime
MovieID movieCater
1       Action, Comedy, Adventure       4          4       4      0
2       Action, Crime                   3          0       0      3
3       Crime                           0          0       0      2
Gusti Adli
  • 1,225
  • 4
  • 13
  • hi, i updated the question and add images for question, however, I don't know how to display the description, maybe you check my question by editing. – Leo May 22 '21 at 05:28
  • Just checked your edit. Using `pd.get_dummies()` from anky's solution is a better solution for your problem. `.pivot_table()` relies on aggregation (`mean` by default) and causes problems when there are possible duplicates in `index` parameter. – Gusti Adli May 22 '21 at 07:07
1

Here is a solution you can try out, first split based on delimiter then explode finally pivot_table

print(
    df.assign(movieCater=df['movieCater'].str.split(","))
        .explode(column='movieCater')
        .pivot_table(index='MovieID', columns='movieCater', values='rating', fill_value=0)
)

movieCater  Action  Adventure  Comedy  Crime
MovieID                                     
1                4          4       4      0
2                3          0       0      3
3                0          0       0      2
sushanth
  • 8,275
  • 3
  • 17
  • 28
0

Suppose the input dataframe is

df = pd.DataFrame({
    'MovieID': ['001','002','003'], 
    'movieCat': ['Action, Comedy, Adventure', 'Action, Crime', 'Crime'], 
    'rating': [4,3,2]
    })

#output
    MovieID movieCat                    rating
0   001     Action, Comedy, Adventure   4
1   002     Action, Crime               3
2   003     Crime                       2

I've re-use similar codes of @sushanth to get the final table

df['temp'] = df['movieCat'].str.split(", ")
df = df.explode(column='temp').pivot_table(index=['MovieID', 'movieCat'], columns='temp', values='rating', fill_value=0)
df.columns.name=None
df.reset_index(inplace=True)

#output
    MovieID movieCat                    Adventure   Comedy  Crime   Action
0   001     Action, Comedy, Adventure   4           4       0       4
1   002     Action, Crime               0           0       3       3
2   003     Crime                       0           0       2       0

After this you could export the table to Excel

df.to_excel('my_file.xlsx', index=False)
MiH
  • 354
  • 4
  • 11