
I have a dataframe with columns ['name', 'sex', 'births', 'year']. I then group it by name to create two new columns, "max" and "total".

trendy_names['max'] = trendy_names.groupby('name')['births'].transform('max')
trendy_names['total'] = trendy_names.groupby('name')['births'].transform('sum')
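A minimal sketch of the two transform() lines on made-up toy data (the numbers are illustrative, not from the real dataset). It shows that transform() broadcasts each name's group maximum and sum back onto every row of that name, so trendy_names stays an ordinary DataFrame rather than becoming a grouped object.

```python
import pandas as pd

# Made-up toy data using the question's column layout.
trendy_names = pd.DataFrame({
    "name": ["Mary", "Mary", "Anna", "Anna", "Anna"],
    "sex": ["F", "F", "F", "F", "F"],
    "births": [30, 10, 5, 25, 15],
    "year": [1880, 1881, 1880, 1881, 1882],
})

# transform() returns a Series aligned with the original rows, so every row
# receives its own name's group-level value.
grouped_births = trendy_names.groupby("name")["births"]
trendy_names["max"] = grouped_births.transform("max")
trendy_names["total"] = grouped_births.transform("sum")

print(trendy_names)
print(type(trendy_names))  # <class 'pandas.core.frame.DataFrame'>
```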

Using these 2 columns, I create a calculated column "trendiness".

trendy_names['trendiness'] = trendy_names['max']/trendy_names['total']
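A small sketch, again on made-up numbers, of the trendiness calculation. Because "max" and "total" are per-name values, the ratio is the same on every row of a given name, which the last line checks.

```python
import pandas as pd

# Made-up toy data with the columns needed for the calculation.
trendy_names = pd.DataFrame({
    "name": ["Mary", "Mary", "Anna", "Anna", "Anna"],
    "births": [30, 10, 5, 25, 15],
})
grouped_births = trendy_names.groupby("name")["births"]
trendy_names["max"] = grouped_births.transform("max")
trendy_names["total"] = grouped_births.transform("sum")

# Element-wise division of two columns: the share of a name's total births
# that fell in its single best year.
trendy_names["trendiness"] = trendy_names["max"] / trendy_names["total"]

# "max" and "total" are constant within each name, so "trendiness" is too:
# every name should report exactly one distinct value.
print(trendy_names.groupby("name")["trendiness"].nunique())
```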

Then I keep only the names whose total number of births is at least 1000.

trendy_names = trendy_names[trendy_names.total >= 1000]
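A sketch of the filtering step on toy data (the names and totals are hypothetical). Note that >= keeps a total of exactly 1000, and that boolean indexing keeps the original row labels, leaving gaps where rows were dropped.

```python
import pandas as pd

# Hypothetical rows where "total" has already been computed per name.
trendy_names = pd.DataFrame({
    "name": ["Mary", "Zoe", "Anna", "Anna"],
    "total": [1500, 999, 1000, 1000],
})

# Boolean indexing keeps rows whose mask is True; ">=" keeps a total of
# exactly 1000, while ">" would drop it.
filtered = trendy_names[trendy_names.total >= 1000]

print(filtered["name"].tolist())  # ['Mary', 'Anna', 'Anna']
print(filtered.index.tolist())    # [0, 2, 3] (original labels, with a gap)
```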

Now I want to sort the dataframe by the "trendiness" column. Any thoughts?
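One way to sort by trendiness while keeping each name's rows together, sketched on toy data: use trendiness as the primary key and name (then year) as tie-breakers. This assumes, as in the question, that trendiness is constant within a name; the extra year key is an illustrative choice, not something the question asks for.

```python
import pandas as pd

# Toy data: "trendiness" is already computed and constant per name.
trendy_names = pd.DataFrame({
    "name": ["Mary", "Anna", "Mary", "Emma", "Anna", "Emma"],
    "year": [1880, 1880, 1881, 1880, 1881, 1881],
    "trendiness": [0.2, 0.5, 0.2, 0.5, 0.5, 0.5],
})

# Highest trendiness first. "name" breaks ties between names that share a
# trendiness value, so each name's rows stay contiguous; "year" then orders
# rows within a name.
trendy_sorted = trendy_names.sort_values(
    by=["trendiness", "name", "year"], ascending=[False, True, True]
)
print(trendy_sorted)
```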

Tanmoy
  • Sample data:
           name       sex  births  year    max    total  trendiness
        0  Mary       F      7065  1880  73983  4135851    0.017888
        1  Anna       F      2604  1880  15666   886614    0.017669
        2  Emma       F      2003  1880  22702   635686    0.035713
        3  Elizabeth  F      1939  1880  20742  1625783    0.012758
        4  Minnie     F      1746  1880   3274   159494    0.020527
    – Tanmoy Mar 12 '18 at 09:55
  • That's *too much* information (yes, that's a thing). This whole question really boils down to "how to sort a (grouped) dataframe based on values in a specific column" – DeepSpace Mar 12 '18 at 09:55
  • @jezrael That's exactly what I want. But I can't do it, as the dataframe is already grouped. – Tanmoy Mar 12 '18 at 09:56
  • @Tanmoy - Then you need `trendy_names = trendy_names.sort_values(['name','trendiness'])` – jezrael Mar 12 '18 at 09:57
  • Also check - [How to sort a dataFrame in python pandas by two or more columns?](https://stackoverflow.com/q/17141558) – jezrael Mar 12 '18 at 10:27
  • @jezrael - This doesn't work. That snippet sorts by name and then by trendiness. What I want instead is to group by name and sort by trendiness. Thoughts? – Tanmoy Mar 12 '18 at 11:07
  • @Tanmoy - Isn't it the same? I think it is. Can you add a snippet of the data? – jezrael Mar 12 '18 at 11:12
  • @jezrael [link](https://drive.google.com/open?id=1RYjjBJ33w3FvYb-LWNrRIieAtl4O8Bcq) Here's the snippet. Until I apply the condition, the df is fine, but as soon as I do, it gets messy. Thoughts? – Tanmoy Mar 13 '18 at 04:18
  • @Tanmoy - I checked it and I am confused. What is the expected output? I think it is best to work with a small data sample here (10-20 rows): test the solution, and if it works (which is easy to verify on a small sample), apply it to the whole dataset. Thanks. – jezrael Mar 13 '18 at 06:50
  • @jezrael As you can see in dataframe.head(), the rows are grouped by "name". But the moment I apply the filter condition `df[df.column >= 1000].sort_values(by=['col2'], ascending=False)`, the data gets ungrouped. – Tanmoy Mar 13 '18 at 08:14
  • Did you try `df[df.column >= 1000].sort_values(by=['name', 'col2'], ascending=[True, False])`? – jezrael Mar 13 '18 at 08:45
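To make the difference between the two suggested sort orders concrete, here is a sketch that rebuilds the five rows Tanmoy posted and sorts them both ways. Putting name first orders the rows alphabetically; putting trendiness first orders them by trendiness, with name only breaking ties.

```python
import pandas as pd

# The five sample rows posted in the comments above.
df = pd.DataFrame({
    "name": ["Mary", "Anna", "Emma", "Elizabeth", "Minnie"],
    "sex": ["F"] * 5,
    "births": [7065, 2604, 2003, 1939, 1746],
    "year": [1880] * 5,
    "max": [73983, 15666, 22702, 20742, 3274],
    "total": [4135851, 886614, 635686, 1625783, 159494],
    "trendiness": [0.017888, 0.017669, 0.035713, 0.012758, 0.020527],
})

filtered = df[df.total >= 1000]

# Name first: alphabetical; trendiness only orders rows within one name.
by_name = filtered.sort_values(by=["name", "trendiness"], ascending=[True, False])

# Trendiness first: most "trendy" names first; name only breaks ties.
by_trend = filtered.sort_values(by=["trendiness", "name"], ascending=[False, True])

print(by_name["name"].tolist())
print(by_trend["name"].tolist())
```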

1 Answer


To sort the dataframe by "trendiness":

 1. trendy_names = trendy_names.reset_index()

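reset_index() - converts the index back to a regular default integer index (0, 1, 2, ...), which tidies up the gaps that filtering leaves in the row labels. Like sort_values(), it returns a new DataFrame rather than modifying trendy_names in place, so the result must be assigned.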

 2. trendy_names = trendy_names.sort_values(by='trendiness')
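A minimal sketch of the two steps on made-up data, assuming trendy_names was built with transform() as in the question (in which case it is already a plain DataFrame, which the assert checks). Both methods return new objects, so the results are assigned back. The drop=True argument and the descending order are additions not in the steps above: drop=True stops the old row labels from being kept as a new "index" column, and ascending=False puts the trendiest names first.

```python
import pandas as pd

# Made-up data shaped like trendy_names after the transform() steps.
trendy_names = pd.DataFrame({
    "name": ["Mary", "Zoe", "Anna", "Anna"],
    "total": [1500, 999, 1000, 1000],
    "trendiness": [0.2, 0.9, 0.5, 0.5],
})
# Columns added with transform() leave a plain DataFrame, not a DataFrameGroupBy.
assert isinstance(trendy_names, pd.DataFrame)

trendy_names = trendy_names[trendy_names.total >= 1000]

# Neither call modifies the frame in place by default, so assign the results.
# drop=True discards the old labels instead of adding an "index" column.
trendy_names = trendy_names.reset_index(drop=True)
trendy_names = trendy_names.sort_values(by="trendiness", ascending=False)
print(trendy_names)
```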
Jaimil Patel