I have a data frame of IMDb ratings for a selection of TV shows, with the following columns:

date, ep_no, episode, show_title, season, rating

I need to select the lowest rated episode of each show, but I am having trouble displaying all of the columns I want.

I can successfully select the correct data using:

df.groupby('show_title')['rating'].min()

But this only displays the show title and the rating of the lowest rated episode for that show.
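To make the problem concrete, here is a minimal sketch on a small made-up frame (the shows, episodes, and ratings are invented for illustration, not my real data). The result is a Series indexed by show_title, so the other columns are gone:

```python
import pandas as pd

# Hypothetical sample data; titles, dates, and ratings are invented for illustration.
df = pd.DataFrame({
    "date": ["2020-01-01", "2020-01-08", "2020-02-01", "2020-02-08"],
    "ep_no": [1, 2, 1, 2],
    "episode": ["Pilot", "Second", "Opening", "Follow-up"],
    "show_title": ["Show A", "Show A", "Show B", "Show B"],
    "season": [1, 1, 1, 1],
    "rating": [8.1, 7.4, 6.9, 9.0],
})

# Selecting ['rating'] before aggregating leaves only that one column,
# with show_title as the index of the resulting Series.
lowest = df.groupby("show_title")["rating"].min()
print(lowest)
```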

I need it to display: show_title, ep_no, episode, rating

I have tried various tweaks to the code, from the simple to the complex, but I guess I'm just not experienced enough to crack this particular puzzle right now.

Any ideas?

3 Answers

If I understand what you want, this is similar to an existing question, and the following code should do the trick:

df[df.groupby('show_title')['rating'].transform('min') == df['rating']]
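As a rough sketch of how this behaves, assuming a small made-up frame (the data below is invented): transform broadcasts each group's minimum back onto every row, so the comparison yields a boolean mask over the original rows. If two episodes tie for the lowest rating, both are kept. Selecting the columns with .loc then gives the layout asked for in the question.

```python
import pandas as pd

# Hypothetical sample data; "Show A" has two episodes tied for the lowest rating.
df = pd.DataFrame({
    "show_title": ["Show A", "Show A", "Show A", "Show B", "Show B"],
    "ep_no": [1, 2, 3, 1, 2],
    "episode": ["Pilot", "Second", "Third", "Opening", "Follow-up"],
    "rating": [8.1, 7.4, 7.4, 6.9, 9.0],
})

# Per-row copy of each show's minimum rating, aligned with df's index.
group_min = df.groupby("show_title")["rating"].transform("min")

# Keep rows whose rating equals their show's minimum, and pick the wanted columns.
is_lowest = df["rating"] == group_min
result = df.loc[is_lowest, ["show_title", "ep_no", "episode", "rating"]]
print(result)
```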
Ryan

One approach is to sort the DataFrame by rating, then drop duplicate show titles, keeping only the first occurrence of each show:

df.sort_values(by='rating').drop_duplicates(['show_title'], keep='first')
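A minimal sketch of this on made-up data (the frame below is invented for illustration). Unlike a mask-based filter, this returns exactly one row per show even when ratings tie. The kind="mergesort" argument is my own addition here: it makes the sort stable, so the tied row that appeared first in the original frame is the one kept.

```python
import pandas as pd

# Hypothetical sample data; "Show A" has two episodes tied for the lowest rating.
df = pd.DataFrame({
    "show_title": ["Show A", "Show A", "Show A", "Show B", "Show B"],
    "ep_no": [1, 2, 3, 1, 2],
    "episode": ["Pilot", "Second", "Third", "Opening", "Follow-up"],
    "rating": [8.1, 7.4, 7.4, 6.9, 9.0],
})

result = (
    df.sort_values(by="rating", kind="mergesort")    # stable sort, lowest ratings first
      .drop_duplicates(["show_title"], keep="first")  # first row per show = its lowest
      [["show_title", "ep_no", "episode", "rating"]]  # columns requested in the question
)
print(result)
```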
Peter Leimbigler
# It's easy: just sort by show_title and rating before using groupby.
df_sorted = df.sort_values(by=['show_title', 'rating'])

# Now group and take the first row of each group.
# After sorting, that row holds the minimum rating for the show.
# head(1) keeps whole rows; first() would take the first non-null value of each column instead.
df1 = df_sorted.groupby('show_title').head(1)
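One detail worth checking when taking the first row per group: GroupBy.first() returns the first non-null value of each column, not the first row as a unit, and it moves show_title into the index. If the frame has missing values, the result can mix values from different episodes. A minimal sketch on made-up data (invented for illustration) compares it with head(1), which keeps each row intact:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the lowest-rated "Show A" episode has a missing title.
df = pd.DataFrame({
    "show_title": ["Show A", "Show A", "Show B"],
    "ep_no": [2, 1, 1],
    "episode": [np.nan, "Pilot", "Opening"],
    "rating": [7.4, 8.1, 6.9],
})

df_sorted = df.sort_values(by=["show_title", "rating"])

# first(): first non-null value per column, so "episode" comes from a different row.
via_first = df_sorted.groupby("show_title").first()

# head(1): the first row of each group, unchanged, with show_title still a column.
via_head = df_sorted.groupby("show_title").head(1)

print(via_first)
print(via_head)
```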
Abhishek Sharma