I have a pandas dataframe. The final column in the dataframe is the max value of the RelAb
column for each unique group (in this case, a species assignment) in the dataframe as obtained by:
df_melted['Max'] = df_melted.groupby('Species')['RelAb'].transform('max')
As you can see, the max value is repeated in every row of its group. Each group contains a large number of rows. I have the df
sorted by max value, and there are about 100 rows per max value. My goal is to obtain the top 20 groups based on the max value (i.e. a df
with about 100 x 20 = 2000 rows). I do not want to drop individual rows from groups in the dataframe, but rather entire groups.
I am pasting a subset of the dataframe where the max for a group changes from one "Max" value to the next:
My feeling is that I need to reduce the max to a single value per group and then sort and select based on that, perhaps as such?
For context, I am doing this because I plan to make a stacked bar chart of the most abundant species in the table for each sample. Right now there are far too many species, which makes the stacked bar chart uninformative.
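To make the goal concrete, here is a minimal sketch of the kind of group-level selection described above: compute one max per species, take the species with the largest maxima, and keep every row belonging to them. The column names 'Species', 'RelAb' and 'Max' come from the line of code above; the 'Sample' column, the toy values, and the helper name keep_top_groups are assumptions made up for illustration, and this is only one possible approach.

```python
import pandas as pd

# Toy data. 'Species' and 'RelAb' are the real column names;
# 'Sample' and all values here are hypothetical.
df_melted = pd.DataFrame({
    "Sample":  ["s1", "s2", "s1", "s2", "s1", "s2", "s1", "s2"],
    "Species": ["A", "A", "B", "B", "C", "C", "D", "D"],
    "RelAb":   [0.50, 0.10, 0.05, 0.30, 0.02, 0.01, 0.20, 0.40],
})
df_melted["Max"] = df_melted.groupby("Species")["RelAb"].transform("max")


def keep_top_groups(df, n, group_col="Species", value_col="RelAb"):
    """Return every row of the n groups with the largest per-group maximum."""
    # One value per group instead of one value per row.
    group_max = df.groupby(group_col)[value_col].max()
    # Labels of the n groups with the highest maxima.
    top_groups = group_max.nlargest(n).index
    # Keep whole groups: a row survives only if its group is in the top n.
    return df[df[group_col].isin(top_groups)]


# n=2 for the toy data; the real case would use n=20.
top_df = keep_top_groups(df_melted, n=2)
print(top_df)
```

With the real dataframe this would be called with n=20. If several species tie at the cutoff, nlargest keeps the ones that appear first by default, so the tie-breaking may need adjusting.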