0

I have this df code

df = pd.DataFrame({'A': ['0-5', '18-23', '12-17', '6-11'], 'qty':[7,15,8,34]})

yielding

    A       qty
0   0-5     7
1   18-23   15
2   12-17   8
3   6-11    34

I would like to order the df by col 'A' without having to number the A column, so that later when I do graphs I don't have the numbers.

This is the desired output after sorting the df by column A:

     A        qty
0   0-5        7
3   6-11       34
2   12-17      8
1   18-23      15

To achieve a similar result I would:

# add a category code
df['A'] = df['A'].astype('category').cat.codes + 1
# convert format
df['A'] = df['A'].astype('string')
# use a dictionary to rename (based on former output)
dic = {
'1':'1_0-5',
'3':'3_18-23',        
'2':'2_12-17', 
'4':'4_6-11',    
}
df['A'] = df['A'].replace(dic, regex=True)
## use a dictionary to rename again
dic = {
'1_0-5':'1_0-5',
'3_18-23':'4_18-23',        
'2_12-17':'3_12-17', 
'4_6-11':'2_6-11',    
}
df['A'] = df['A'].replace(dic, regex=True)

by doing this, I can achieve this:

           A         qty
    0   1_0-5        7
    1   2_6-11       15
    2   3_12-17      8
    3   4_18-23      34

Groupby does not work for me, while it would order column A as desired, when I would do graphs, order would not be kept.

josepmaria
  • 373
  • 1
  • 11

3 Answers3

3

Don't reinvent the wheel, use natsort_key for natural sorting:

# pip install natsort
from natsort import natsort_key

out = df.sort_values(by='A', key=natsort_key)

Output:

       A  qty
0    0-5    7
1   6-11   15
2  12-17    8
3  18-23   34

Or for fun, using numpy.lexsort:

out = df.iloc[np.lexsort(df['A'].str.split('-', expand=True)
                         .astype(int).to_numpy()[:, ::-1].T)]
mozway
  • 194,879
  • 13
  • 39
  • 75
  • Why not `df.sort_values('A', key=lambda x: (x.str.split('-', expand=True)[0]).astype(int))`? – shaik moeed Jul 10 '23 at 09:32
  • 1
    @shaikmoeed that would only sort by the first number, imagine you have `1-2` and `1-3` (not in the example) that would also handle this case. Also `lexsort` is quite efficient ;) – mozway Jul 10 '23 at 09:33
  • But I would use `natsort` that is the most flexible, it would also handle letters etc. – mozway Jul 10 '23 at 09:34
1

If possible sort by first integer value is possible use key parameter in DataFrame.sort_values:

out = df.sort_values('A', 
                    key=lambda x: x.str.extract('(\d+)', expand=False).astype(int),
                    ignore_index=True)
print (out)
       A  qty
0    0-5    7
1   6-11   15
2  12-17    8
3  18-23   34

Or use natural sorting:

from natsort import natsorted

out = df.sort_values("A",key=natsorted, ignore_index=True)
print (out)
       A  qty
0    0-5    7
1   6-11    8
2  12-17   34
3  18-23   15

EDIT: If need sorting by strings use ordered Categoricals:

df = pd.DataFrame({'A': ['mike','alice', 'john','brian'], 'qty':[7,15,8,34]})



df['A'] = (pd.Categorical(df['A'], 
                         categories=['john','alice', 'mike','brian'], 
                         ordered=True))

out = df.sort_values('A', ignore_index=True)
print (out)
       A  qty
0   john    8
1  alice   15
2   mike    7
3  brian   34
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • would this work if 'A' column values were strings like 'mike', 'alice', 'john', 'brian' ? – josepmaria Jul 10 '23 at 09:35
  • 1
    @josepmaria - Second solution yes, if need sort by natural sorting. – jezrael Jul 10 '23 at 09:36
  • 1
    @josepmaria - Or need sorting by order of list? Added solution to answer. – jezrael Jul 10 '23 at 09:40
  • Then that would be a completely different question, [already covered here](https://stackoverflow.com/questions/23482668/sorting-by-a-custom-list-in-pandas) (among others) – mozway Jul 10 '23 at 09:42
-1

To sort the column A in the dataframe df in the desired order, you can extract the numerical values from the A column, sort them, and then use the sorted values to sort the entire dataframe. Here's the code to accomplish this:

import pandas as pd

df = pd.DataFrame({'A': ['0-5', '18-23', '12-17', '6-11'], 'qty':[7,15,8,34]})

# Extract numerical values from 'A' column
df['sort_key'] = df['A'].str.split('-', expand=True)[0].astype(int)

# Sort the dataframe based on the 'sort_key'
df = df.sort_values('sort_key')

# Drop the 'sort_key' column
df = df.drop('sort_key', axis=1)

print(df)

Output:

       A  qty
0    0-5    7
3   6-11   34
2  12-17    8
1  18-23   15