10

I have the following dataframe that I would like to sort first by Criticality and then by Name:

Name        Criticality
baz         High
foo         Critical
baz         Low
foo         Medium
bar         High
bar         Low
bar         Medium
...

I've been trying to do this using the answer provided in this post but I just can't get it to work.

The end result should be like this

Name        Criticality
bar         High
bar         Medium
bar         Low
baz         High
baz         Low
foo         Critical
foo         Medium
Community
  • 1
  • 1
Blark
  • 307
  • 1
  • 3
  • 15
  • 1
    Based on your posted end result, I think you actually want to sort by Name first, then Criticality. Right? – exp1orer Apr 24 '14 at 21:48
  • @exp1orer yes but without using an external ordering you end up with alpha sorting on the Criticality which is not the desired output – EdChum Apr 24 '14 at 21:56
  • EdChum, totally right. I just asked because his question says that he wants to "sort first by Criticality and then by Name". but the sample output suggests otherwise. – exp1orer Apr 24 '14 at 21:58
  • @exp1orer Yes the desired output contradicts the description so I went with the output – EdChum Apr 24 '14 at 22:08
  • Sorry, clearly I am confused :) EdChum nailed the answer below! – Blark Apr 24 '14 at 22:10
  • Time has passed and things have changed. As of writing, the [accepted answer](https://stackoverflow.com/a/13839029/14148248) in the question you have linked works, as shown in the [answer by user5843090](https://stackoverflow.com/a/60173458/14148248) below. – Patrick FitzGerald Feb 28 '21 at 16:10

4 Answers4

19

One approach would be to use a custom dict to create a 'rank' column, we then use to sort with and then drop the column after sorting:

In [17]:
custom_dict = {'Critical':0, 'High':1, 'Medium':2, 'Low':3}  
df['rank'] = df['Criticality'].map(custom_dict)
df

Out[17]:

  Name Criticality  rank
0  baz        High     1
1  foo    Critical     0
2  baz         Low     3
3  foo      Medium     2
4  bar        High     1
5  bar         Low     3
6  bar      Medium     2

[7 rows x 3 columns]

In [19]:
# now sort by 'Name' and 'rank', it will first sort by 'Name' column first and then 'rank'
df.sort(columns=['Name', 'rank'],inplace=True)
df

Out[19]:

  Name Criticality  rank
4  bar        High     1
6  bar      Medium     2
5  bar         Low     3
0  baz        High     1
2  baz         Low     3
1  foo    Critical     0
3  foo      Medium     2

[7 rows x 3 columns]

In [21]:
# now drop the 'rank' column
df.drop(labels=['rank'],axis=1)

Out[21]:

  Name Criticality
4  bar        High
6  bar      Medium
5  bar         Low
0  baz        High
2  baz         Low
1  foo    Critical
3  foo      Medium

[7 rows x 2 columns]
EdChum
  • 376,765
  • 198
  • 813
  • 562
  • I see this answer is 7 years old. Does the pandas API have an easier solution yet, like ```df.sort(by="Criticality", order=['Critical','High','Medium','Low'])```? Or is the easiest way still to create a dictionary, use that to create a new column, sort on that new column, then drop it? – Joe Apr 09 '21 at 12:51
  • The solution by user5843090 using `pd.Categorical` (introduced in late 2014 after this question was answered) offers the same functionality in with IMHO more comprehensive code, see below. – euronion Mar 04 '22 at 10:45
7

I works for me using pd.Categorical

In [114]: df = pd.DataFrame({
          'Name' : ["baz","foo","baz","foo","bar","bar","bar"],
          'Criticality' : ["hi", "crt", "lo", "med", "hi", "lo", "med"]
          })

     ...: df['Criticality'] = pd.Categorical(df['Criticality'], ["crt","hi", "med", "lo"])

     ...: df.sort_values(['Name','Criticality'])
Out[114]: 
  Name Criticality
4  bar          hi
6  bar         med
5  bar          lo
0  baz          hi
2  baz          lo
1  foo         crt
3  foo         med
user5843090
  • 127
  • 1
  • 7
2

As of Pandas 1.1 there is another option with the key param of sort_values.

df = pd.DataFrame(
    {
        "Name": ["baz", "foo", "baz", "foo", "bar", "bar", "bar"],
        "Criticality": ["High", "Critical", "Low", "Medium", "High", "Low", "Medium"],
    }
)

SEVERITY = ["Critical", "High", "Medium", "Low"]

def sorter(column):
    if column.name != "Criticality":
        return column
    mapper = {name: order for order, name in enumerate(SEVERITY)}
    return column.map(mapper)

new_df = df.sort_values(by=["Name", "Criticality"], key=sorter)

#   Name Criticality
# 4  bar        High
# 6  bar      Medium
# 5  bar         Low
# 0  baz        High
# 2  baz         Low
# 1  foo    Critical
# 3  foo      Medium

In some cases you can also use pd.Categorical in the sorter to return a categorical Series but it seems like Pandas is choking when sorting on multiple values where one or more are Categorical.

[EDIT]: It looks like there was a bug related to Categorical sorters which should be fixed soon.

totalhack
  • 2,298
  • 17
  • 23
  • Does this solution have any advantage over the one shown in the [answer by user5843090](https://stackoverflow.com/a/60173458/14148248)? – Patrick FitzGerald Feb 28 '21 at 16:14
  • 1
    At the time of posting pandas was not always handling sorting on multiple categorical columns, so this would have worked around that. I have not compared the performance between the two, but notably this does not seem to require modifying or copying the original dataframe in preparation for the sort. The other answer is a bit less verbose though and may be appropriate/fine in most cases. – totalhack Feb 28 '21 at 17:32
1

Based on the answer of EdChum, this worked for me :

custom_dict = {'Critical':0, 'High':1, 'Medium':2, 'Low':3}  

df['rank'] = df['Criticality'].map(custom_dict)

# now sort by 'Name' and 'rank', it will first sort by 'Name' column first and then 'rank'
df.sort_values(by=['Name', 'rank'],inplace=True) 

# now drop the 'rank' column
df.drop(labels=['rank'],axis=1)

Basically, I used "sort_values" instead of "sort" and 'by" instead of "columns"

Julie
  • 11
  • 1