21

Based on the question "python, sort descending dataframe with pandas":

Given:

from pandas import DataFrame
import pandas as pd

d = {'x':[2,3,1,4,5],
     'y':[5,4,3,2,1],
     'letter':['a','a','b','b','c']}

df = DataFrame(d)

df then looks like this:

df:
      letter    x    y
    0      a    2    5
    1      a    3    4
    2      b    1    3
    3      b    4    2
    4      c    5    1

I would like to have something like:

f = lambda x,y: x**2 + y**2
test = df.sort(f('x', 'y'))

This should order the complete dataframe with respect to the sum of the squared values of column 'x' and 'y' and give me:

test:
      letter    x    y
    2      b    1    3
    3      b    4    2
    1      a    3    4
    4      c    5    1
    0      a    2    5

Ascending or descending order does not matter. Is there a nice and simple way to do that? I have not found a solution yet.

cglacet
Ohumeronen

5 Answers

33

You can create a temporary column to use in sort and then drop it:

df.assign(f = df['x']**2 + df['y']**2).sort_values('f').drop('f', axis=1)
Out: 
  letter  x  y
2      b  1  3
3      b  4  2
1      a  3  4
4      c  5  1
0      a  2  5
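
The same idea can be wrapped in a small helper so the temporary column never leaks out. This is only a sketch: `sort_by` and the `_sort_key` column name are hypothetical names chosen for illustration, not part of pandas, and the example assumes the `x`/`y` columns from the question.

```python
import pandas as pd

def sort_by(df, key, tmp="_sort_key"):
    """Return df sorted by key(df). `sort_by` and `tmp` are hypothetical names."""
    if tmp in df.columns:
        raise ValueError(f"temporary column {tmp!r} already exists")
    # Add the computed key as a temporary column, sort on it, then drop it.
    return df.assign(**{tmp: key(df)}).sort_values(tmp).drop(columns=tmp)

df = pd.DataFrame({'x': [2, 3, 1, 4, 5],
                   'y': [5, 4, 3, 2, 1],
                   'letter': ['a', 'a', 'b', 'b', 'c']})

# key receives the whole DataFrame and returns one value per row
result = sort_by(df, lambda d: d['x']**2 + d['y']**2)
print(result)
```

Because `assign` returns a new frame, the original `df` is left untouched.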
ayhan
  • 11
    this seems to be the best way to go, but it sorta sucks... it would be way more elegant to pass a lambda function into `sort_values`, the same way you'd do that for python's native `sorted()` call – Alex Spangher Jun 29 '18 at 16:42
  • 2
    @AlexSpangher, looks like we still don't have this feature supported yet for now, 2020 Feb :-( – avocado Feb 07 '20 at 18:58
  • The advantage of python is that when it doesn't exist you can just [add the method](https://stackoverflow.com/a/62624996/1720199). – cglacet Jun 28 '20 at 16:18
15
df.loc[(df.x ** 2 + df.y ** 2).sort_values().index]

Adapted from "How to sort pandas dataframe by custom order on string index".
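
To see why the label-based `.loc` matters here, consider a frame whose index is not `0..n-1`. The sketch below assumes a made-up string index (`r0` to `r4`): `sort_values().index` yields labels, so it pairs with `.loc`, whereas `argsort()` yields positions and pairs with `.iloc`.

```python
import pandas as pd

df = pd.DataFrame({'x': [2, 3, 1, 4, 5],
                   'y': [5, 4, 3, 2, 1],
                   'letter': ['a', 'a', 'b', 'b', 'c']},
                  index=['r0', 'r1', 'r2', 'r3', 'r4'])  # hypothetical labels

key = df['x']**2 + df['y']**2

# Labels of the sorted key, looked up by label
by_label = df.loc[key.sort_values().index]

# Positions that would sort the key, looked up by position
by_position = df.iloc[key.argsort().to_numpy()]

print(by_label)
```

Both forms give the same row order here. If the index contains duplicate labels, the `.loc` form repeats rows, so the positional form is the safer choice in that case.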

andrewkittredge
  • 1
    Thank you, this is a really nice solution! The index of the sorted data is used in combination with iloc. This is neat. No further column is needed. – Ohumeronen Apr 20 '20 at 13:48
  • 3
    That indeed looks like the correct approach. On the other hand, you should use `.loc` instead of `.iloc`, because `.iloc` wouldn't work with most indexes (it only works with indexes like `list(range(n))`). I'll add an alternative just in case. – cglacet Jun 28 '20 at 15:50
  • [Here](https://stackoverflow.com/a/62624996/1720199) is an answer using `iloc` with `argsort`, which is very similar to this strategy. – cglacet Jun 28 '20 at 16:04
3

Have you tried creating a new column and then sorting on that? I cannot comment on the original post, so I am just posting my solution.

df['c'] = df.x**2 + df.y**2
df = df.sort_values('c')
Sandeep
  • 1
    The "problem" with this solution is that it actually creates another column which is not the exact goal here (input and output column should be the same). – cglacet Jun 28 '20 at 16:05
1
import pandas as pd

d = {'one':[2,3,1,4,5],
     'two':[5,4,3,2,1],
     'letter':['a','a','b','b','c']}

df = pd.DataFrame(d)

#f = lambda x,y: x**2 + y**2
# Compute the sum of squares row by row.
# df.ix has been removed from pandas; .loc with explicit column names replaces it.
array = []
for i in range(len(df)):
    array.append(df.loc[i, 'one']**2 + df.loc[i, 'two']**2)
array = pd.DataFrame(array, columns = ['Sum of Squares'])
test = pd.concat([df,array],axis = 1, join = 'inner')
# sort_index(by=...) has been removed as well; sort_values is its replacement.
test = test.sort_values('Sum of Squares', ascending = True).drop('Sum of Squares',axis =1)

Just realized that you wanted this:

    letter  one  two
2      b    1    3
3      b    4    2
1      a    3    4
4      c    5    1
0      a    2    5
Adam Warner
0

Another approach, similar to this one, is to use argsort, which returns the index permutation directly:

f = lambda r: r.x**2 + r.y**2
df.iloc[df.apply(f, axis=1).argsort()]

I think argsort expresses the intent better than a regular sort: we don't care about the computed values, only about the resulting order of the indexes.

It could also be interesting to patch the DataFrame to add this functionality:

def apply_sort(self, *, key):
    return self.iloc[self.apply(key, axis=1).argsort()]

pd.DataFrame.apply_sort = apply_sort

We can then simply write:

>>> df.apply_sort(key=f)

   x  y letter
2  1  3      b
3  4  2      b
1  3  4      a
4  5  1      c
0  2  5      a
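
If the key can be written as column arithmetic, the row-wise `apply` can be skipped. A minimal sketch, assuming a hypothetical `sort_by_vector` helper (not part of pandas) whose `key` receives the whole DataFrame rather than one row at a time:

```python
import numpy as np
import pandas as pd

def sort_by_vector(df, key):
    """Hypothetical helper: sort df by key(df), evaluated once on the whole frame."""
    # key should return one value per row (a Series or array-like)
    values = np.asarray(key(df))
    # A stable sort keeps the original order of rows with equal keys
    return df.iloc[np.argsort(values, kind="stable")]

df = pd.DataFrame({'x': [2, 3, 1, 4, 5],
                   'y': [5, 4, 3, 2, 1],
                   'letter': ['a', 'a', 'b', 'b', 'c']})

result = sort_by_vector(df, lambda d: d.x**2 + d.y**2)
print(result)
```

This evaluates the key once in vectorized form instead of once per row, which is usually much faster on large frames, while still relying on `argsort` and `iloc` as above.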
cglacet
  • Since you do a row-wise apply here, wouldn't this trade away a fair bit of performance compared to the vectorized operation in andrewkittredge's method? Does sort vs. argsort offset these concerns? – Skyler Oct 08 '20 at 15:53