
I have a movies dataset in a DataFrame with the following columns:

movieid

title

audience_score

rating

runtime

genre

language

The problem is that there are multiple entries with the same movieid, and for the same movieid the rating, runtime, genre, language, etc. may differ.

What I want to do is group the entries that share a movieid and replace each group with a single entry for that movieid. The feature values of this single entry should be derived from the feature values of the group it replaces.

For example, the single entry's rating should be the median of all the ratings in its group, and its genre should be the most frequent value in the group's genre column.

Is there a way to do this? I explored the groupby function but could not find anything that does what I want.

Abhijit Singh
Check this thread: [link](https://stackoverflow.com/questions/14529838/apply-multiple-functions-to-multiple-groupby-columns). You can use the agg method, which lets you apply a different function to each column. – PawelWL Jul 28 '23 at 06:31
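A minimal sketch of what the comment suggests, assuming the DataFrame is a pandas DataFrame and that rating, runtime and audience_score are numeric. The sample data and the `most_frequent` helper are made up for illustration; only the column names come from the question. The choice of "first" for title and median for runtime and audience_score is also an assumption, since the question only specifies rating and genre.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "movieid": [1, 1, 1, 2, 2],
    "title": ["A", "A", "A", "B", "B"],
    "audience_score": [80, 82, 84, 55, 65],
    "rating": [7.0, 8.0, 9.0, 5.0, 6.0],
    "runtime": [100, 102, 101, 90, 90],
    "genre": ["Drama", "Drama", "Comedy", "Action", "Action"],
    "language": ["en", "en", "fr", "hi", "hi"],
})


def most_frequent(s):
    """Most common non-null value in a Series (hypothetical helper).

    Series.mode() returns all tied values in sorted order, so ties
    resolve to the smallest one. Returns None if the group is all null.
    """
    modes = s.mode()
    return modes.iloc[0] if not modes.empty else None


# One output row per movieid; each column gets its own aggregation.
result = df.groupby("movieid", as_index=False).agg(
    title=("title", "first"),                     # assumed: any title will do
    audience_score=("audience_score", "median"),  # assumed choice
    rating=("rating", "median"),                  # as asked in the question
    runtime=("runtime", "median"),                # assumed choice
    genre=("genre", most_frequent),               # as asked in the question
    language=("language", most_frequent),         # assumed choice
)

print(result)
```

The `name=(column, function)` form is pandas' named aggregation; passing a dict such as `{"rating": "median", "genre": most_frequent}` to `agg` should work as well if the column names do not need to change.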

0 Answers