Select the minimum value of a data frame column, by category

Question

I have a data frame representing IMDb ratings of a selection of TV shows with the following columns:

date, ep_no, episode, show_title, season, rating

I need to select the lowest rated episode of each show, but I am having trouble displaying all of the columns I want.

I can successfully select the correct data using:

df.groupby('show_title')['rating'].min()

But this only displays the show title and the rating of the lowest rated episode for that show.
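To make the behaviour concrete, here is a minimal sketch on a small, made-up DataFrame. The show titles, episode names and ratings are hypothetical, and only the column names come from the question. The aggregation collapses each group to a single number, so the result is a Series indexed by show_title and every other column is gone.

```python
import pandas as pd

# Hypothetical sample data using the question's column names
df = pd.DataFrame({
    'date': pd.to_datetime(['2017-01-01', '2017-01-08', '2017-01-15',
                            '2017-02-01', '2017-02-08']),
    'ep_no': [1, 2, 3, 1, 2],
    'episode': ['Pilot', 'Second', 'Third', 'Opening', 'Follow-up'],
    'show_title': ['Show A', 'Show A', 'Show A', 'Show B', 'Show B'],
    'season': [1, 1, 1, 1, 1],
    'rating': [8.1, 7.4, 8.6, 6.9, 7.2],
})

# Aggregating keeps only the group key (as the index) and the minimum rating;
# ep_no, episode, date and season are not part of the result
lowest = df.groupby('show_title')['rating'].min()
print(lowest)
```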

I need it to display: show_title, ep_no, episode, rating

I have tried various tweaks to the code, from the simple to the complex, but I guess I'm just not experienced enough to crack this particular puzzle right now.

Any ideas?

score 1 · Accepted Answer · answered Dec 17 '17 at 04:45


If I understand what you want, this question is similar to this question, and the following code should do the trick:

df[df.groupby('show_title')['rating'].transform('min') == df['rating']]

Ryan

Thanks, all of the solutions worked, but this one seems to provide the most accurate result. – Brian McNamara Dec 17 '17 at 15:09
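Below is a minimal sketch of the accepted answer on hypothetical data, also selecting the four columns the question asks for. Because transform('min') broadcasts each show's minimum back onto every row of that show, the boolean mask keeps every episode that ties for the lowest rating; the sort-based answers keep only one row per show.

```python
import pandas as pd

# Hypothetical data: Show A has two episodes tied for its lowest rating
df = pd.DataFrame({
    'ep_no': [1, 2, 3, 1, 2],
    'episode': ['Pilot', 'Second', 'Third', 'Opening', 'Follow-up'],
    'show_title': ['Show A', 'Show A', 'Show A', 'Show B', 'Show B'],
    'rating': [8.1, 7.4, 7.4, 6.9, 7.2],
})

# transform('min') returns a Series aligned with df: each row gets its show's minimum
group_min = df.groupby('show_title')['rating'].transform('min')

# Keep every row whose rating equals its show's minimum (tied rows are all kept)
lowest = df.loc[df['rating'] == group_min, ['show_title', 'ep_no', 'episode', 'rating']]
print(lowest)
```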

score 0 · Answer 2 · answered Dec 17 '17 at 04:49


One approach is to sort the DataFrame by rating, then drop duplicate show titles, keeping only the first (lowest-rated) occurrence of each show:

df.sort_values(by='rating').drop_duplicates(['show_title'], keep='first')

Peter Leimbigler
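A minimal sketch of the sort-then-deduplicate approach on hypothetical data. One assumption beyond the answer: kind='mergesort' requests a stable sort, so when two episodes of a show tie for the lowest rating, the one that appears earlier in the original frame is the one kept. Unlike the transform-based answer, exactly one row per show survives.

```python
import pandas as pd

# Hypothetical data with a tie for Show A's lowest rating
df = pd.DataFrame({
    'ep_no': [1, 2, 3, 1, 2],
    'episode': ['Pilot', 'Second', 'Third', 'Opening', 'Follow-up'],
    'show_title': ['Show A', 'Show A', 'Show A', 'Show B', 'Show B'],
    'rating': [8.1, 7.4, 7.4, 6.9, 7.2],
})

# Sort ascending by rating; a stable sort keeps tied rows in their original order
by_rating = df.sort_values(by='rating', kind='mergesort')

# The first row seen for each show is its lowest-rated episode; later rows are dropped
lowest = by_rating.drop_duplicates(subset='show_title', keep='first')
print(lowest[['show_title', 'ep_no', 'episode', 'rating']])
```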

score 0 · Answer 3 · answered Dec 17 '17 at 05:30

# It's easy: just sort by show_title and rating before using groupby

df.sort_values(by=['show_title', 'rating'], inplace=True)

# Now use groupby and return the first row of each group;
# after the sort, that row holds the show's minimum rating.
# Note: first() takes the first non-null value of each column separately,
# so if the lowest-rated row has missing values, .head(1) keeps whole rows.
df1 = df.groupby('show_title').first()
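The caveat about first() is easiest to see with a gap in the data. In this sketch (hypothetical data, where Show A's lowest-rated episode has no title recorded), first() fills the missing title from a different, higher-rated row, while groupby(...).head(1) returns the lowest-rated row exactly as stored. Note also that first() moves show_title into the index, whereas head(1) keeps it as an ordinary column.

```python
import pandas as pd

# Hypothetical data; Show A's lowest-rated episode has no title recorded
df = pd.DataFrame({
    'ep_no': [1, 2, 3, 1, 2],
    'episode': ['Pilot', None, 'Third', 'Opening', 'Follow-up'],
    'show_title': ['Show A', 'Show A', 'Show A', 'Show B', 'Show B'],
    'rating': [8.1, 7.4, 8.6, 6.9, 7.2],
})

# Sort so each show's lowest rating comes first within its group
df_sorted = df.sort_values(by=['show_title', 'rating'])

# first() takes the first non-null value of each column separately,
# so Show A's missing episode title is taken from another row ('Pilot')
by_first = df_sorted.groupby('show_title').first()

# head(1) keeps the whole first row of each group unchanged
by_head = df_sorted.groupby('show_title').head(1)
print(by_first)
print(by_head)
```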
