Sorting a grouped dataframe

Question

I have a dataframe with columns ['name', 'sex', 'births', 'year']. I then group the dataframe on the basis of name to create 2 new columns "max" and "total".

trendy_names['max'] = trendy_names.groupby(['name'], as_index = False)['births'].transform('max')
trendy_names['total'] = trendy_names.groupby(['name'], as_index = False)['births'].transform('sum')
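A minimal sketch of what these two lines produce, assuming a hypothetical toy frame rather than the real dataset: `transform` returns one value per original row, aligned with the original index, so the result is still an ordinary DataFrame and not a grouped object.

```python
import pandas as pd

# Hypothetical toy data (not the asker's dataset)
df = pd.DataFrame({
    "name": ["Mary", "Anna", "Mary", "Anna"],
    "births": [10, 4, 30, 6],
})

# transform broadcasts the per-name aggregate back onto every row of that name
df["max"] = df.groupby("name")["births"].transform("max")
df["total"] = df.groupby("name")["births"].transform("sum")

print(type(df).__name__)      # the frame itself is unchanged in type
print(df["max"].tolist())     # per-name maximum repeated on each row
print(df["total"].tolist())   # per-name sum repeated on each row
```

Because the aggregates are written back row by row, any later filter or sort operates on a plain DataFrame.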

Using these 2 columns, I create a calculated column "trendiness".

trendy_names['trendiness'] = trendy_names['max']/trendy_names['total']

Then, I keep only the names that have a total number of births of at least 1000.

trendy_names = trendy_names[trendy_names.total >= 1000]

Now, I want to sort the dataframe on the basis of "trendiness" column. Any thoughts?

   name       sex  births  year    max    total  trendiness
0  Mary       F      7065  1880  73983  4135851    0.017888
1  Anna       F      2604  1880  15666   886614    0.017669
2  Emma       F      2003  1880  22702   635686    0.035713
3  Elizabeth  F      1939  1880  20742  1625783    0.012758
4  Minnie     F      1746  1880   3274   159494    0.020527
— Tanmoy, Mar 12 '18 at 09:55
That's *too much* information (yes, that's a thing). This whole question really boils down to "how to sort a (grouped) dataframe based on values in a specific column" — DeepSpace, Mar 12 '18 at 09:55
@jezrael That's exactly what I want. But I can't do it as the dataframe is already grouped. — Tanmoy, Mar 12 '18 at 09:56
@Tanmoy - Then you need `trendy_names = trendy_names.sort_values(['name','trendiness'])` — jezrael, Mar 12 '18 at 09:57
Also check - [How to sort a dataFrame in python pandas by two or more columns?](https://stackoverflow.com/q/17141558) — jezrael, Mar 12 '18 at 10:27
@jezrael - This doesn't work. The code snippet would sort on the basis of name and trendiness. What I want, instead, is to group on the basis of name and sort on the basis of trendiness. Thoughts? — Tanmoy, Mar 12 '18 at 11:07
@Tanmoy - Isn't that the same? I think it is. Can you add some snippet of the data? — jezrael, Mar 12 '18 at 11:12
@jezrael https://drive.google.com/open?id=1RYjjBJ33w3FvYb-LWNrRIieAtl4O8Bcq Here's the snippet. Until I put in the condition, the df is fine. But as soon as I do, it gets messy. Thoughts? — Tanmoy, Mar 13 '18 at 04:18
@Tanmoy - I checked it and I am confused. What is the expected output? I think it is best to work with a small data sample of 10-20 rows, test the solution there (a small sample is easy to verify), and if it works, apply it to the whole dataset. Thanks. — jezrael, Mar 13 '18 at 06:50
@jezrael As you can see in dataframe.head(), the rows appear grouped by "name". But the moment I use the filter condition `df[df.column >= 1000].sort_values(by=['col2'], ascending=False)`, the data gets ungrouped. — Tanmoy, Mar 13 '18 at 08:14
Did you try `df[df.column >= 1000].sort_values(by=['name', 'col2'], ascending=[True, False])`? — jezrael, Mar 13 '18 at 08:45

score 0 · Answer 1 · edited May 25 '20 at 23:09

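To sort the dataframe by "trendiness", note that trendy_names is still a regular pandas.core.frame.DataFrame, not a pandas.core.groupby.DataFrameGroupBy: transform() returns values aligned with the original rows, so nothing has to be converted back.

 1. trendy_names = trendy_names.reset_index(drop=True)

reset_index() - replaces the index, which has gaps after the filter, with a regular 0..n-1 index. This step is optional.

 2. trendy_names = trendy_names.sort_values(by = 'trendiness')

Both methods return a new dataframe rather than modifying trendy_names in place, so the result has to be assigned back.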
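The steps from the question and the sort can be sketched end to end as follows. This is only an illustration under stated assumptions: the data is a hypothetical toy frame, the descending order and the use of `name` as a tie-breaker are choices made here (the question does not say which direction is wanted), and `result` is a name introduced for the example.

```python
import pandas as pd

# Hypothetical toy data; only the column names and the 1000 threshold
# come from the question
trendy_names = pd.DataFrame({
    "name":   ["Mary", "Anna", "Mary", "Anna", "Zed"],
    "sex":    ["F", "F", "F", "F", "M"],
    "births": [900, 1600, 300, 400, 10],
    "year":   [1880, 1880, 1881, 1881, 1880],
})

# Per-name aggregates, broadcast back to every row
trendy_names["max"] = trendy_names.groupby("name")["births"].transform("max")
trendy_names["total"] = trendy_names.groupby("name")["births"].transform("sum")
trendy_names["trendiness"] = trendy_names["max"] / trendy_names["total"]

# Keep names with at least 1000 total births (drops "Zed" here)
trendy_names = trendy_names[trendy_names["total"] >= 1000]

# Sort by trendiness (descending is an assumption); "name" breaks ties so
# rows of the same name stay together. sort_values returns a new frame.
result = (
    trendy_names
    .sort_values(by=["trendiness", "name"], ascending=[False, True])
    .reset_index(drop=True)  # tidy 0..n-1 index after filtering and sorting
)

print(result[["name", "births", "total", "trendiness"]])
```

Since trendiness is constant within a name, sorting on it alone already keeps each name's rows adjacent unless two names share the same value, which is why `name` is added as a second key.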

answered May 25 '20 at 16:05 by Neville Abraham · edited May 25 '20 at 23:09 by Jaimil Patel
