Sort DataFrame index that has a string and number

Question

My df DataFrame index looks like this:

Com_Lag_01
Com_Lag_02
Com_Lag_03
Com_Lag_04
Com_Lag_05
Com_Lag_06
Com_Lag_07
Com_Lag_08
Com_Lag_09
Com_Lag_10
Com_Lag_101
Com_Lag_102
Com_Lag_103
...
Com_Lag_11
Com_Lag_111
Com_Lag_112
Com_Lag_113
Com_Lag_114
...
Com_Lag_12
Com_Lag_120
...
Com_Lag_13
Com_Lag_14
Com_Lag_15

I want to sort this index so that the numbers go from Com_Lag_1 to Com_Lag_120. If I use df.sort_index() I will get the same thing as above. Any suggestion on how to sort this index properly?

You'd have to do a reverse find of the last '_', then cast to an int and order by this number — EdChum, May 06 '14 at 11:52
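
A rough sketch of what this comment describes, in plain Python. The helper name `lag_number` is made up here for illustration and is not part of pandas. It also shows why the default string ordering puts Com_Lag_101 before Com_Lag_11:

```python
labels = ['Com_Lag_01', 'Com_Lag_10', 'Com_Lag_101', 'Com_Lag_11', 'Com_Lag_02']

def lag_number(label):
    """Return the integer after the last '_' in label (hypothetical helper)."""
    return int(label[label.rfind('_') + 1:])

# The default sort compares the strings character by character.
lexicographic = sorted(labels)
# Sorting with the numeric key orders by the trailing number instead.
numeric = sorted(labels, key=lag_number)

print(lexicographic)
print(numeric)
```

For a DataFrame whose index labels are unique, something like `df.reindex(sorted(df.index, key=lag_number))` should apply the same ordering.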

Guillaume Jacquenot · Accepted Answer · 2017-11-27T16:09:57.010

One could try something like this, sorting on a numeric version of the index:

import pandas as pd
# Create a DataFrame example
df = pd.DataFrame(
    {'Year': [1991, 2004, 2001, 2009, 1997],
     'Age': [27, 25, 22, 34, 31]},
    index=['Com_Lag_1', 'Com_Lag_12', 'Com_Lag_3', 'Com_Lag_24', 'Com_Lag_5'])

# Add a column containing the numeric part of the index
df['indexNumber'] = [int(i.split('_')[-1]) for i in df.index]
# Sort the rows by that number (DataFrame.sort no longer exists; use sort_values)
df.sort_values('indexNumber', ascending=True, inplace=True)
# Drop the helper column
df.drop(columns='indexNumber', inplace=True)

Edit 2017 - V1:

To avoid a SettingWithCopyWarning, create the helper column with assign:

df = df.assign(indexNumber=[int(i.split('_')[-1]) for i in df.index])

Edit 2017 - V2 for Pandas Version 0.21.0

import pandas as pd
print(pd.__version__)
# Create a DataFrame example
df = pd.DataFrame(
    {'Year': [1991, 2004, 2001, 2009, 1997],
     'Age': [27, 25, 22, 34, 31]},
    index=['Com_Lag_1', 'Com_Lag_12', 'Com_Lag_3', 'Com_Lag_24', 'Com_Lag_5'])

# reindex returns a new DataFrame, so assign the result
df = df.reindex(index=df.index.to_series().str.rsplit('_').str[-1].astype(int).sort_values().index)

This no longer works as .sort has been deprecated: https://stackoverflow.com/questions/44123874/dataframe-object-has-no-attribute-sort . Use the answer with .sort_index instead. It is also only one line! https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sort_index.html – ic_fl2, Oct 15 '22 at 08:55

jezrael · Answer 2 · 2017-07-27T08:40:22.047

A solution without a new column: use DataFrame.reindex with the index of a sorted Series:

a = df.index.to_series().str.rsplit('_').str[-1].astype(int).sort_values()
print (a)
Com_Lag_1      1
Com_Lag_3      3
Com_Lag_5      5
Com_Lag_12    12
Com_Lag_24    24
dtype: int32

df = df.reindex(index=a.index)
print (df)
            Age  Year
Com_Lag_1    27  1991
Com_Lag_3    22  2001
Com_Lag_5    31  1997
Com_Lag_12   25  2004
Com_Lag_24   34  2009

But if the index contains duplicate values, reindex cannot be used (it raises an error on duplicate labels), so add a helper column instead:

df = pd.DataFrame(
    {'Year': [1991, 2004, 2001, 2009, 1997],
     'Age': [27, 25, 22, 34, 31]},
    index=['Com_Lag_1', 'Com_Lag_12', 'Com_Lag_3', 'Com_Lag_24', 'Com_Lag_12'])

print (df)
            Age  Year
Com_Lag_1    27  1991
Com_Lag_12   25  2004
Com_Lag_3    22  2001
Com_Lag_24   34  2009
Com_Lag_12   31  1997

df['indexNumber'] = df.index.str.rsplit('_').str[-1].astype(int)
df = df.sort_values(['indexNumber']).drop('indexNumber', axis=1)
print (df)
            Age  Year
Com_Lag_1    27  1991
Com_Lag_3    22  2001
Com_Lag_12   25  2004
Com_Lag_12   31  1997
Com_Lag_24   34  2009
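
If the helper column is unwanted, a possible alternative (a sketch, not part of the original answer) is to compute the row order with a stable NumPy argsort and select rows by position with iloc. The names `numbers`, `order` and `df_sorted` are chosen here for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'Year': [1991, 2004, 2001, 2009, 1997],
     'Age': [27, 25, 22, 34, 31]},
    index=['Com_Lag_1', 'Com_Lag_12', 'Com_Lag_3', 'Com_Lag_24', 'Com_Lag_12'])

# Numeric suffix of each label, as a plain NumPy integer array.
numbers = df.index.str.rsplit('_').str[-1].astype(int).to_numpy()
# A stable argsort keeps duplicate labels in their original relative order.
order = np.argsort(numbers, kind='stable')
# Positional selection works even though the index has duplicates.
df_sorted = df.iloc[order]
print(df_sorted)
```

Because iloc selects by position rather than by label, the duplicated Com_Lag_12 rows do not cause the error that reindex would raise.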

score 3 · Answer 3 · answered Apr 23 '21 at 12:52

Another solution is

    df.sort_index(key=lambda x: (x.to_series().str[8:].astype(int)), inplace=True)

The 8 is the length of the prefix 'Com_Lag_', that is, the position where the numeric part starts. The key argument of sort_index requires pandas 1.1 or later.
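
If the prefix length is not fixed, a variant of the same idea (a sketch under that assumption, not the answerer's code) is to split on the last underscore inside the key function instead of slicing at a hard-coded position:

```python
import pandas as pd

df = pd.DataFrame(
    {'Year': [1991, 2004, 2001, 2009, 1997]},
    index=['Com_Lag_1', 'Com_Lag_12', 'Com_Lag_3', 'Com_Lag_24', 'Com_Lag_5'])

# The key function receives the whole Index and returns the values to sort by.
# rsplit with n=1 splits only at the last '_', so the prefix may contain underscores.
df_sorted = df.sort_index(
    key=lambda idx: idx.str.rsplit('_', n=1).str[-1].astype(int))
print(df_sorted)
```

This returns a new DataFrame rather than sorting in place; pass inplace=True as in the line above if that is preferred.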

KarenJG

This is the correct approach and should be accepted, as .sort is deprecated in pandas! https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sort_index.html – ic_fl2 Oct 15 '22 at 08:54
This correct answer should be accepted – Farid Alijani Sep 01 '23 at 09:25
