52

I want to sort by name length. There doesn't appear to be a key parameter for sort_values so I'm not sure how to accomplish this. Here is a test df:

import pandas as pd
df = pd.DataFrame({'name': ['Steve', 'Al', 'Markus', 'Greg'], 'score': [2, 4, 2, 3]})
cs95
Alex
  • Possible duplicate of [sort dataframe by length of string in a column](https://stackoverflow.com/questions/46177362/sort-dataframe-by-length-of-string-in-a-column) – cs95 Sep 12 '17 at 13:34
  • @jezrael Please read my reason. I mentioned it explicitly: https://stackoverflow.com/questions/46177362/sort-dataframe-by-length-of-string-in-a-column#comment79318016_46177362 – cs95 Sep 12 '17 at 14:01
  • There are more options there. If not, you can edit this answer and include all those other solutions. – cs95 Sep 12 '17 at 14:02

5 Answers

51

You can compute the length of each name with str.len, sort those lengths with sort_values, and pass the resulting index to reindex:

print (df.name.str.len())
0    5
1    2
2    6
3    4
Name: name, dtype: int64

print (df.name.str.len().sort_values())
1    2
3    4
0    5
2    6
Name: name, dtype: int64

s = df.name.str.len().sort_values().index
print (s)
Int64Index([1, 3, 0, 2], dtype='int64')

print (df.reindex(s))
     name  score
1      Al      4
3    Greg      3
0   Steve      2
2  Markus      2

df1 = df.reindex(s)
df1 = df1.reset_index(drop=True)
print (df1)
     name  score
0      Al      4
1    Greg      3
2   Steve      2
3  Markus      2
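
One detail the steps above leave open is what happens when two names have the same length. A minimal sketch, assuming you want ties to keep their original row order: pass kind="mergesort" (a stable algorithm) to sort_values before taking the index. The extra rows "Bo" and "Anna" and their scores are made up for illustration and are not part of the original test df.

```python
import pandas as pd

# Hypothetical data with ties in name length ("Al"/"Bo" and "Greg"/"Anna")
df = pd.DataFrame({
    "name": ["Steve", "Al", "Markus", "Greg", "Bo", "Anna"],
    "score": [2, 4, 2, 3, 1, 5],
})

# Length of each name as an integer Series
lengths = df["name"].str.len()

# mergesort is stable, so rows with equal length keep their original order
order = lengths.sort_values(kind="mergesort").index

# Reorder the rows, then renumber the index from 0
df_sorted = df.reindex(order).reset_index(drop=True)
print(df_sorted)
```

With the default kind="quicksort" the relative order of equal-length names is not guaranteed, which may or may not matter for your data.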
jezrael
  • Great answer, I tried this approach with lists too (Sorting a DataFrame by list length), since `.str.len()` works with lists as mentioned in the question **Pythonic way for calculating length of lists in pandas dataframe column** in this [link](https://stackoverflow.com/questions/41340341/pythonic-way-for-calculating-length-of-lists-in-pandas-dataframe-column) – otayeby Jul 09 '17 at 23:22
44

With pandas 1.1.0 or later, DataFrame.sort_values accepts a key argument. We can pass it an anonymous (lambda) function that computes string length with the .str.len() Series method:

df = pd.DataFrame({
    'name': ['Steve', 'Al', 'Markus', 'Greg'], 
    'score': [2, 4, 2, 3]
})
print(df)

     name  score
0   Steve      2
1      Al      4
2  Markus      2
3    Greg      3
df.sort_values(by="name", key=lambda x: x.str.len())

     name  score
1      Al      4
3    Greg      3
0   Steve      2
2  Markus      2
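
If you sort by more than one column, the key function is applied to each column in by separately, so it has to cope with every column it receives. Here is a rough sketch, assuming pandas 1.1.0 or later, that sorts by name length and breaks ties by score. The extra row "Bo" and the helper name length_or_value are made up for illustration.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical data: "Al" and "Bo" tie on name length
df = pd.DataFrame({
    "name": ["Steve", "Al", "Markus", "Greg", "Bo"],
    "score": [2, 4, 2, 3, 1],
})

def length_or_value(col):
    # Called once per column listed in `by`:
    # numeric columns pass through, string columns are replaced by their lengths
    return col if is_numeric_dtype(col) else col.str.len()

# Sort by name length first, then by score within equal lengths
result = df.sort_values(by=["name", "score"], key=length_or_value)
print(result)
```

The key function only affects the sort order. The returned frame still contains the original name values.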
mirekphd
Erfan
  • Thanks. Just in case someone needs to lower case and sort `df.sort_index(key=lambda x: x.str.lower().str.len())` – Shovra Jan 07 '23 at 16:48
16

I found this solution more intuitive, especially if you want to do something that depends on the name length later on.

df['length'] = df['name'].str.len()
df.sort_values('length', ascending=False, inplace=True)

Now your dataframe has a column named length holding the string length of each value in the name column, and the whole dataframe is sorted by it in descending order.
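
If you would rather not modify df or keep the helper column afterwards, one possible variation is to chain assign, sort_values and drop. This is only a sketch of the same idea; the variable name df_sorted is mine.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Steve', 'Al', 'Markus', 'Greg'], 'score': [2, 4, 2, 3]})

df_sorted = (
    df.assign(length=df['name'].str.len())     # temporary helper column on a copy
      .sort_values('length', ascending=False)  # longest names first
      .drop(columns='length')                  # remove the helper again
)
print(df_sorted)
```

Because assign returns a new dataframe, the original df is left untouched and never gets a length column.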

moshfiqur
3

@jezrael's answer is great and explains the approach well. Here is the final result:

index_sorted = df.name.str.len().sort_values(ascending=True).index
df_sorted = df.reindex(index_sorted)
df_sorted = df_sorted.reset_index(drop=True)
Thierry G.
3

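A fancy and minimal solution (note that iloc works here only because df has the default 0..n-1 index, so labels and positions coincide):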

df.iloc[df.agg({"name":len}).sort_values('name').index]



     name  score
1      Al      4
3    Greg      3
0   Steve      2
2  Markus      2
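
When the dataframe does not have the default index, the sorted index holds labels rather than positions, so label-based selection with loc is the safer choice. A small sketch, assuming unique index labels; the labels 10, 20, 30, 40 are made up for illustration.

```python
import pandas as pd

# Same test data, but with a hypothetical non-default index
df = pd.DataFrame(
    {'name': ['Steve', 'Al', 'Markus', 'Greg'], 'score': [2, 4, 2, 3]},
    index=[10, 20, 30, 40],
)

# Index labels ordered by name length
labels = df['name'].str.len().sort_values().index

# loc selects by label, so it works regardless of what the index looks like
result = df.loc[labels]
print(result)
```

Using iloc with these labels would fail here, since 10, 20, 30 and 40 are not valid row positions in a four-row dataframe.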
Billy Bonaros