Sort column in Pandas DataFrame by specific order

Question

Given this DataFrame:

df = pd.DataFrame([['August', 2], ['July', 3], ['Sept', 6]], columns=['A', 'B'])

I would like to sort column A in this order: July, August, Sept. Is there some way to use a sort function like "sort_values" but pre-define the sort order by values?

score 16 · Answer 1 · answered Oct 12 '18 at 17:44

16

Using Categorical

df.A=pd.Categorical(df.A,categories=['July', 'August', 'Sept'])
df=df.sort_values('A')
df
Out[310]: 
        A  B
1    July  3
0  August  2
2    Sept  6

answered Oct 12 '18 at 17:44

BENY

317,841
20
164
234

Thanks, can you define the column categorically after it exists? – sparrow Oct 12 '18 at 17:50
@sparrow sorry what you mean after it exists – BENY Oct 12 '18 at 17:52
I mean, can you create the dataframe first and then define the categories for a column in the df? – sparrow Oct 12 '18 at 17:53
@sparrow yes , you can – BENY Oct 12 '18 at 17:53
`df.A=df.A.astype('category')` @sparrow ` pd.DataFrame(data=df.A, dtype='category')` – BENY Oct 12 '18 at 17:55
Thanks, how do you define what the categories are after changing the dtype? – sparrow Oct 12 '18 at 17:57
@sparrow unfortunately, you need to define it one by one in a list and passed to `categories` – BENY Oct 12 '18 at 18:00
Though it's important to note that if you want to customize sort order, you will have to build up a custom iterable either way. This is probably the most convenient method. Not to mention the fastest. – PMende Oct 12 '18 at 18:14
1

It worked when I passed them in this way: df['A'] = df['A'].astype(pd.api.types.CategoricalDtype(categories=['July','August','Sept'])), https://stackoverflow.com/questions/47537823/futurewarning-specifying-categories-or-ordered-in-astype-is-deprecated – sparrow Oct 12 '18 at 18:51

score 8 · Answer 2 · answered Oct 12 '18 at 17:47

8

Define the order in a dictionary and sort according to it

sort_dict = {'July':0,'August':1,'Sept':2}
df.iloc[df['A'].map(sort_dict).sort_values().index]

Output

       A    B
1   July    3
0   August  2
2   Sept    6

answered Oct 12 '18 at 17:47

Yuca

6,010
3
22
42

score 4 · Accepted Answer · answered Oct 12 '18 at 18:39

Are you opposed to using either complete month names or consistent abbreviations?

df = pd.DataFrame([['August', 2], ['July', 3], ['Sept', 6]], columns=['A', 'B'])

df

import calendar

df = df.replace({'Sept':'September'})

calendar.month_name[1:]

Output:

['January',
 'February',
 'March',
 'April',
 'May',
 'June',
 'July',
 'August',
 'September',
 'October',
 'November',
 'December']

df['A'] = pd.Categorical(df.A, categories=calendar.month_name[1:], ordered=True)

df.sort_values('A')

Output:

           A  B
1       July  3
0     August  2
2  September  6

Or use calendar.month_abbr

calendar.month_abbr[1:]

Output:

['Jan',
 'Feb',
 'Mar',
 'Apr',
 'May',
 'Jun',
 'Jul',
 'Aug',
 'Sep',
 'Oct',
 'Nov',
 'Dec']

score 3 · Answer 4 · answered Jul 20 '22 at 11:09

3

since pandas version 1.1.0, sort_values support sort by key.

df = df.sort_values('A', key=lambda s: s.apply(['July', 'August', 'Sept'].index), ignore_index=True)

answered Jul 20 '22 at 11:09

Charles Liu

31
2

This is the canonical and most generalizable way to sort in a specific order. The requirement that the key function be vectorized is a bit ridiculous, and forces you to use this non-trivial `s.apply` construction, and there's no example provided in the docs either! – Praveen Jul 25 '23 at 19:56

score 1 · Answer 5 · answered Oct 12 '18 at 17:44

You can assign your own values for sorting the column by, sort by those, then drop them:

df = pd.DataFrame([['August', 2], ['July', 3], ['Sept', 6]], columns=['A', 'B'])
value_map = {'August': 1, 'July': 0, 'Sept': 2}
def sort_by_key(df, col, value_map):
    df = df.assign(sort = lambda df: df[col].map(value_map))
    return df.sort_values('sort') \
             .drop('sort', axis='columns')

sort_by_key(df, 'A', value_map)

Results in:

        A  B
1    July  3
0  August  2
2    Sept  6

score 1 · Answer 6 · answered Oct 12 '18 at 17:46

Temporarily convert the str month to datetime and sort

df = pd.DataFrame([['August', 2], ['July', 3], ['Sept', 6]], columns=['A', 'B'])
df['tmp'] = pd.to_datetime(df['A'].str[:3], format='%b').dt.month
df.sort_values(by = ['tmp']).drop('tmp', 1)


    A       B
1   July    3
0   August  2
2   Sept    6

Declan · Answer 7 · 2018-10-12T22:02:44.160

I changed your 'Sept' to 'September' to keep it consistent with the other months' naming convention.

Then I made an ordered list of month names with pd.date_range.

Subdivided the list by the values you had (keeps the correct month-order).

Made a categorical using that sublist, and then sorted on those values

import pandas as pd


df = pd.DataFrame([['August', 2], ['July', 3], ['September', 6]], columns=['A', 'B'])

full_month_list = pd.date_range('2018-01-01','2019-01-01', freq='MS').strftime("%B").tolist()
partial_month_list = [x for x in month_list if x in df['A'].values]
df['A'] = pd.Categorical(df['A'], partial_month_list)

df.sort_values('A')

Results in:

    A           B
1   July        3
0   August      2
2   September   6

Sort column in Pandas DataFrame by specific order

7 Answers7

Linked