
I have the following dataframe df:

    topic   num
0   a01     1
1   a01     1
2   a01     2
3   a02     1
4   a02     3
5   a02     2
6   a02     3
7   a03     2
8   a03     1

I need to create a new dataframe newdf, where each row holds a topic and the maximum num for that topic, like the following:

    topic   num
0   a01     2
1   a02     3
2   a03     2

I've tried the max() function from pandas, but to no avail. What I don't understand is how to go through the rows and find the highest value corresponding to each topic. How do I separate a01 from a02 so that I can get the maximum value for each? I've also tried transposing, but I run into the same problem.

Dahlia
  • Does this answer your question? [Get the row(s) which have the max value in groups using groupby](https://stackoverflow.com/questions/15705630/get-the-rows-which-have-the-max-value-in-groups-using-groupby) – Charles Yang Nov 29 '22 at 22:19

2 Answers


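See [Get the row(s) which have the max value in groups using groupby](https://stackoverflow.com/questions/15705630/get-the-rows-which-have-the-max-value-in-groups-using-groupby).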

Example:

new_df = df.groupby(['topic'], sort=False)['num'].max().reset_index()
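
The linked question is about keeping the whole row that holds each group's maximum, not just the maximum value. A minimal sketch of that variant, assuming the dataframe from the question (rebuilt here so the snippet runs on its own; the names `idx` and `rows` are only illustrative):

```python
import pandas as pd

# Rebuild the question's dataframe so the example is self-contained.
df = pd.DataFrame({
    "topic": ["a01", "a01", "a01", "a02", "a02", "a02", "a02", "a03", "a03"],
    "num": [1, 1, 2, 1, 3, 2, 3, 2, 1],
})

# idxmax gives, per topic, the index label of the first row with the largest num.
idx = df.groupby("topic")["num"].idxmax()

# Select those rows and renumber them from 0.
rows = df.loc[idx].reset_index(drop=True)
print(rows)
```

With only two columns this gives the same table as the plain maximum, but it would also carry along any other columns of the selected rows. When several rows tie for the maximum, only the first one is kept.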
Charles Yang

You can use GroupBy.max with numeric_only=True:

newdf = df.groupby("topic", as_index=False).max(numeric_only=True)

Output:

print(newdf)

  topic  num
0   a01    2
1   a02    3
2   a03    2
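
For anyone who wants to reproduce this, here is a self-contained sketch that rebuilds the dataframe from the question and applies the same call. The construction of `df` is an assumption based on the table shown in the question.

```python
import pandas as pd

# Dataframe reconstructed from the table in the question.
df = pd.DataFrame({
    "topic": ["a01", "a01", "a01", "a02", "a02", "a02", "a02", "a03", "a03"],
    "num": [1, 1, 2, 1, 3, 2, 3, 2, 1],
})

# as_index=False keeps "topic" as a regular column instead of the index;
# numeric_only=True restricts the aggregation to numeric columns.
newdf = df.groupby("topic", as_index=False).max(numeric_only=True)
print(newdf)
```

There is no need to iterate over rows: groupby splits the rows by topic and max is applied to each group separately.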
scotscotmcc
Timeless