0

I want to reorder the rows of my dataframe based on a custom order over multiple columns.

Say, I have the following df:

import pandas as pd
df = pd.DataFrame.from_dict({'Name': {0: 'Tim', 1: 'Tim', 2: 'Tim', 3: 'Ari', 4: 'Ari', 5: 'Ari', 6: 'Dan', 7: 'Dan', 8: 'Dan'}, 'Subject': {0: 'Math', 1: 'Science', 2: 'History', 3: 'Math', 4: 'Science', 5: 'History', 6: 'Math', 7: 'Science', 8: 'History'}, 'Test1': {0: 10, 1: 46, 2: 54, 3: 10, 4: 83, 5: 39, 6: 10, 7: 58, 8: 10}, 'Test2': {0: 5, 1: 78, 2: 61, 3: 7, 4: 32, 5: 43, 6: 1, 7: 28, 8: 50}})

which looks like this

Name  Subject  Test1  Test2
Tim     Math     10      5
Tim  Science     46     78
Tim  History     54     61
Ari     Math     10      7
Ari  Science     83     32
Ari  History     39     43
Dan     Math     10      1
Dan  Science     58     28
Dan  History     10     50

I want to sort it by Name first according to custom order ['Dan','Tim','Ari'] and then sort it by Subject according to custom order ['Science','History','Math'].

So my final df should look like

Name  Subject  Test1  Test2
Dan  Science     58     28
Dan  History     10     50
Dan     Math     10      1
Tim  Science     46     78
Tim  History     54     61
Tim     Math     10      5
Ari  Science     83     32
Ari  History     39     43
Ari     Math     10      7

It seems like a simple thing, but I can't quite figure it out how to do it. The closest solution I could find was how to custom reorder rows according to a single column here. I want to be able to do this for multiple columns simultaneously.

2 Answers2

1

You can create 2 temporary columns for sorting and then drop them after you've sorted your df.

(
    df.assign(key1=df.Name.map({'Dan':0, 'Tim':1, 'Ari':2}), 
              key2=df.Subject.map({'Science':0, 'History':1, 'Math':2}))
    .sort_values(['key1', 'key2'])
    .drop(['key1', 'key2'], axis=1)
)
Allen Qin
  • 19,507
  • 8
  • 51
  • 67
1

You can represent Name and Subject as categorical variables:

names = ['Dan','Tim','Ari']
subjects = ['Science','History','Math']

df = df.astype({'Name': pd.CategoricalDtype(names, ordered=True),
                'Subject': pd.CategoricalDtype(subjects, ordered=True)})
>>> df.sort_values(['Name', 'Subject'])
  Name  Subject  Test1  Test2
7  Dan  Science     58     28
8  Dan  History     10     50
6  Dan     Math     10      1
1  Tim  Science     46     78
2  Tim  History     54     61
0  Tim     Math     10      5
4  Ari  Science     83     32
5  Ari  History     39     43
3  Ari     Math     10      7

>>> df.sort_values(['Subject', 'Name'])
  Name  Subject  Test1  Test2
7  Dan  Science     58     28
1  Tim  Science     46     78
4  Ari  Science     83     32
8  Dan  History     10     50
2  Tim  History     54     61
5  Ari  History     39     43
6  Dan     Math     10      1
0  Tim     Math     10      5
3  Ari     Math     10      7
Corralien
  • 109,409
  • 8
  • 28
  • 52