How to sort pandas DataFrame with a key?

Question

I'm looking for a way to sort pandas DataFrame. pd.DataFrame.sort_values doesn't accept a key function. I can convert it to list and apply a key to sorted function, but that will be slow. The other way seems something related to categorical index. I don't have a fixed number of rows so I don't know if categorical index will be applicable.

I have given an example case of what kind of data I want to sort:

Input DataFrame:

     clouds  fluff
0    {[}      1
1    >>>      2
2     {1      3
3    123      4
4  AAsda      5
5    aad      6

Output DataFrame:

     clouds  fluff
0    >>>      2
1    {[}      1
2     {1      3
3    123      4
4    aad      6
5  AAsda      5

The rule for sorting (priority):

First special characters (sort by ascii among themselves)

Next is by numbers

next is by lower case alphabets (lexicographical)

next is Capital case alphabets (lexicographical)

In plain python I'd do it like

from functools import cmp_to_key

def ks(a, b):
    # "Not exactly this but similar"
    if a.isupper():
        return -1
    else:
        return 1

Case

sorted(['aa', 'AA', 'dd', 'DD'], key=cmp_to_key(ks))

Answer:

['DD', 'AA', 'aa', 'dd']

How would you do it with Pandas?

https://stackoverflow.com/questions/38662826/sort-pandas-dataframe-with-function-over-column-values have you checked this? — hellpanderr, Sep 24 '18 at 08:55
@hellpanderr, Yes but I don't know how to translate this problem to that way. — Vasantha Ganesh, Sep 25 '18 at 07:36
There is an [issue](https://github.com/pandas-dev/pandas/issues/3942) filed on GitHub related to this. — Vasantha Ganesh, Sep 25 '18 at 07:51

score 6 · Accepted Answer · answered Aug 06 '20 at 09:03

6

As of pandas 1.1.0, pandas.DataFrame.sort_values accepts an argument key with type callable.

So in this case we would use:

df.sort_values(by='clouds', key=kf)

where kf is the key function that operates on type Series. Accepts and returns Series.

answered Aug 06 '20 at 09:03

Vasantha Ganesh

4,570
3
25
33

score 4 · Answer 2 · answered Sep 08 '21 at 06:51

As of pandas 1.2.0, I did this

import numpy as np
import pandas as pd

df = pd.DataFrame(['aa', 'dd', 'DD', 'AA'], columns=["data"])

# This is the sorting rule
rule = {
    "DD": 1,
    "AA": 10,
    "aa": 20,
    "dd": 30,
    }


def particular_sort(series):
    """
    Must return one Series
    """
    return series.apply(lambda x: rule.get(x, 1000))


new_df = df.sort_values(by=["data"], key=particular_sort)
print(new_df)  # DD, AA, aa, dd

Of course, you can do this too, but it may be difficult to understand,smile

new_df = df.sort_values(by=["data"], key=lambda x: x.apply(lambda y: rule.get(y, 1000)))
print(new_df)  # DD, AA, aa, dd

Khalil Al Hooti · Answer 3 · 2018-09-24T11:40:33.620

This might be useful, yet still not sure about special characters! can they actally be sorted!!

import pandas as pd

a = [2, 'B', 'c', 1, 'a', 'b',3, 'C', 'A']

df = pd.DataFrame({"a": a})
df['upper'] = df['a'].str.isupper()
df['lower'] = df['a'].str.islower()
df['int'] = df['a'].apply(isinstance,args = [int])

df2 = pd.concat([df[df['int'] == True].sort_values(by=['a']), 
           df[df['lower'] == True].sort_values(by=['a']),
           df[df['upper'] == True].sort_values(by=['a'])])

print(df2)

   a    upper   lower   int
3   1   NaN     NaN     True
0   2   NaN     NaN     True
6   3   NaN     NaN     True
4   a   False   True    False
5   b   False   True    False
2   c   False   True    False
8   A   True    False   False
1   B   True    False   False
7   C   True    False   False

you can also do it in one step with creating new True False columns!

a = [2, 'B', 'c', 1, 'a', 'b',3, 'C', 'A']
df = pd.DataFrame({"a": a})
df2 = pd.concat([df[df['a'].apply(isinstance,args = [int])].sort_values(by=['a']), 
           df[df['a'].str.islower() == True].sort_values(by=['a']),
           df[df['a'].str.isupper() == True].sort_values(by=['a'])])

    a
3   1
0   2
6   3
4   a
5   b
2   c
8   A
1   B
7   C

score 2 · Answer 4 · answered Aug 04 '20 at 16:53

This seems to work:

def sort_dataframe_by_key(dataframe: DataFrame, column: str, key: Callable) -> DataFrame:
    """ Sort a dataframe from a column using the key """
    sort_ixs = sorted(np.arange(len(dataframe)), key=lambda i: key(dataframe.iloc[i][column]))
    return DataFrame(columns=list(dataframe), data=dataframe.iloc[sort_ixs].values)

It passes tests:

def test_sort_dataframe_by_key():
    dataframe = DataFrame([{'a': 1, 'b': 2, 'c': 3}, {'a': 2, 'b': 1, 'c': 1}, {'a': 3, 'b': 4, 'c': 0}])
    assert sort_dataframe_by_key(dataframe, column='a', key=lambda x: x).equals(
           DataFrame([{'a': 1, 'b': 2, 'c': 3}, {'a': 2, 'b': 1, 'c': 1}, {'a': 3, 'b': 4, 'c': 0}]))
    assert sort_dataframe_by_key(dataframe, column='a', key=lambda x: -x).equals(
           DataFrame([{'a': 3, 'b': 4, 'c': 0}, {'a': 2, 'b': 1, 'c': 1}, {'a': 1, 'b': 2, 'c': 3}]))
    assert sort_dataframe_by_key(dataframe, column='b', key=lambda x: -x).equals(
           DataFrame([{'a': 3, 'b': 4, 'c': 0}, {'a': 1, 'b': 2, 'c': 3}, {'a': 2, 'b': 1, 'c': 1}]))
    assert sort_dataframe_by_key(dataframe, column='c', key=lambda x: x).equals(
           DataFrame([{'a': 3, 'b': 4, 'c': 0}, {'a': 2, 'b': 1, 'c': 1}, {'a': 1, 'b': 2, 'c': 3}]))

How to sort pandas DataFrame with a key?

4 Answers4

Linked