I need to sort a DataFrame by one column whose values are combinations of numbers and letters.

import pandas as pd

df = pd.DataFrame([{"user": "seth",
                    "name": "1"},
                   {"user": "chris",
                    "name": "10A"},
                   {"user": "aaron",
                    "name": "4B"},
                   {"user": "dan",
                    "name": "10B"}])

My code:

df1 = df.sort_values(by=['name'])

This gets me:

df1 = [{"user": "seth",
       "name": "1"},
     {"user" : "chris",
       "name": "10A"},
     {"user" : "dan",
       "name": "10B"},
     {"user" : "aaron",
       "name": "4B"}]

I want:

df1 =    [{"user": "seth",
           "name": "1"},
         {"user" : "aaron",
           "name": "4B"},
         {"user" : "chris",
           "name": "10A"},
         {"user" : "dan",
           "name": "10B"}]

Edit:

This was flagged as a duplicate of a similar question. I tried the code from that question:

   DPRexitPoints.reindex(index=natsorted(DPRexitPoints.PageName))

It returns a sorted dataframe, but all values have been replaced by NaNs.

FallingInForward

1 Answer

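You can use np.argsort on the leading digits and pass the result to iloc:

import numpy as np

# Extract the leading digits, convert them to int, and reorder rows by that number.
df.iloc[np.argsort(df['name'].str
                      .extract(r'^(\d*)')[0]
                      .astype(int))
       ]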

Output:

    user name
0   seth    1
2  aaron   4B
1  chris  10A
3    dan  10B
Quang Hoang
  • Hmm, this sorts by the numbers only, not by the combination of numbers and letters. A possible solution is to extract both parts and then sort, or to use `natural sort`. – jezrael Jun 17 '20 at 13:35
  • It gives me this error: ValueError: invalid literal for int() with base 10: '' – FallingInForward Jun 17 '20 at 13:49
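
Both comments can probably be addressed by extracting the number and the letter suffix separately and sorting on both. The following is only a sketch, assuming every value is an optional run of digits followed by an optional suffix; the names `parts`, `num`, `suffix` and `order` are illustrative and not from the original posts. `pd.to_numeric(..., errors="coerce")` turns an empty numeric part into NaN rather than raising the ValueError mentioned above, and such rows end up last.

```python
import pandas as pd

df = pd.DataFrame({
    "user": ["seth", "chris", "aaron", "dan"],
    "name": ["1", "10A", "4B", "10B"],
})

# Split each value into leading digits ("num") and the rest ("suffix").
# Both groups may be empty, so a value with no digits yields num == "".
parts = df["name"].str.extract(r"^(?P<num>\d*)(?P<suffix>.*)$")

# Convert the digits to numbers; "" becomes NaN instead of raising.
parts["num"] = pd.to_numeric(parts["num"], errors="coerce")

# Work with positions so the result does not depend on df's index labels.
parts = parts.reset_index(drop=True)

# Sort by number first, then by suffix; rows with NaN numbers go last.
order = parts.sort_values(["num", "suffix"]).index.to_numpy()
df1 = df.iloc[order]

print(df1)
```

With the sample data this puts "4B" before "10A" and "10A" before "10B", which is the order asked for in the question. If the real column has values with digits in the middle or at the end, the pattern would need adjusting.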