I have a list of results that I'd like to sort by two criteria. I first sorted it on the sys column:

systems = {'BzBz_S':0,'BzBz_PD34':1,'MeMe':2}
sorted_results = sorted(results, key = lambda e: (systems[e[0]]))

and then passed the result to a DataFrame:

df = pd.DataFrame(sorted_results, columns=['sys', 'dis', 'system', 'basis', 'Energy'])

This gives me the following output:

,sys,dis,system,basis,Energy
0,BzBz_S,10.0,BzBz_S_10.0,S,0.02562465
1,BzBz_S,3.2,BzBz_S_3.2,S,1.48510297
2,BzBz_S,3.3,BzBz_S_3.3,S,-0.25086498
3,BzBz_S,6.0,BzBz_S_6.0,S,-0.11827975
4,BzBz_S,3.9,BzBz_S_3.9,S,-2.44705244
5,BzBz_PD34,0.4,BzBz_PD34_0.4,PD34,-1.88172312
6,BzBz_PD34,0.2,BzBz_PD34_0.2,PD34,-1.50519034
7,MeMe,5.0,MeMe_5.0,5,-0.12194283
8,MeMe,5.4,MeMe_5.4,5,-0.07556324

How can I add a second sort criterion, so that rows are also sorted by the dis column within each sys group, to get this final result:

,sys,dis,system,basis,Energy
0,BzBz_S,3.2,BzBz_S_3.2,S,1.48510297
1,BzBz_S,3.3,BzBz_S_3.3,S,-0.25086498
2,BzBz_S,3.9,BzBz_S_3.9,S,-2.44705244 
3,BzBz_S,6.0,BzBz_S_6.0,S,-0.11827975
4,BzBz_S,10.0,BzBz_S_10.0,S,0.02562465
5,BzBz_PD34,0.2,BzBz_PD34_0.2,PD34,-1.50519034
6,BzBz_PD34,0.4,BzBz_PD34_0.4,PD34,-1.88172312
7,MeMe,5.0,MeMe_5.0,5,-0.12194283
8,MeMe,5.4,MeMe_5.4,5,-0.07556324
Monica
  • Does this answer your question? [How to sort a dataFrame in python pandas by two or more columns?](https://stackoverflow.com/questions/17141558/how-to-sort-a-dataframe-in-python-pandas-by-two-or-more-columns) – some_programmer Jun 06 '20 at 12:09
  • I have tried that before. If I sort by two columns with df.sort_values(['sys', 'dis'], ascending=[True, False], inplace=True), the sys column ends up in the wrong order because it is sorted alphabetically. – Monica Jun 06 '20 at 12:14
  • I think you want df.sort_values(['sys', 'dis'], ascending=[True, True], inplace=True) -- alphabetical ordering shouldn't be a problem – optimus_prime Jun 06 '20 at 12:26
  • Unfortunately, it does not work the way I want it to. While the dis column is sorted correctly, the ordering of the sys column is: BzBz_PD34, BzBz_S, MeMe. – Monica Jun 06 '20 at 12:37
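
The alphabetical ordering described in these comments comes from sorting sys as plain strings. A minimal sketch of one way around it, assuming the desired order is the one given in the question's systems dict, is to turn the column into an ordered categorical before calling sort_values. The sample rows are a subset of the question's data, and the names order and df_sorted are introduced here for illustration only:

```python
import pandas as pd

# A few sample rows taken from the question (sys, dis, Energy only).
df = pd.DataFrame(
    {
        "sys": ["BzBz_S", "BzBz_S", "BzBz_PD34", "BzBz_PD34", "MeMe"],
        "dis": [10.0, 3.2, 0.4, 0.2, 5.0],
        "Energy": [0.02562465, 1.48510297, -1.88172312, -1.50519034, -0.12194283],
    }
)

# Assumed custom order, matching the question's `systems` dict.
order = ["BzBz_S", "BzBz_PD34", "MeMe"]

# A categorical column sorts by category position rather than alphabetically.
df["sys"] = pd.Categorical(df["sys"], categories=order, ordered=True)

# Sort by sys (custom order) first, then by dis (ascending) within each sys.
df_sorted = df.sort_values(["sys", "dis"]).reset_index(drop=True)
print(df_sorted)
```

With this approach no helper column is needed, and the custom order stays attached to the column for later sorts or group-bys.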

1 Answer

Once you have the first output, you can do the following to get the final output. It relies on the rows already being grouped in the desired sys order. Hope this helps!

# Rank each 'sys' value by the order in which it first appears in the pre-sorted frame
order = {name: rank for rank, name in enumerate(df['sys'].unique())}

# Temporary integer column holding that rank
df['sys_cat'] = df['sys'].map(order)

# Sort by the rank first, then by 'dis', and clean up the helper column and index
df.sort_values(by=['sys_cat', 'dis'], ascending=[True, True], inplace=True)
df.drop(['sys_cat'], inplace=True, axis=1)
df.reset_index(inplace=True, drop=True)
df

gives:

          sys   dis            system   basis      Energy
0      BzBz_S   3.2        BzBz_S_3.2   S        1.485103
1      BzBz_S   3.3        BzBz_S_3.3   S       -0.250865
2      BzBz_S   3.9        BzBz_S_3.9   S       -2.447052
3      BzBz_S   6.0        BzBz_S_6.0   S       -0.118280
4      BzBz_S   10.0      BzBz_S_10.0   S        0.025625
5   BzBz_PD34   0.2     BzBz_PD34_0.2   PD34    -1.505190
6   BzBz_PD34   0.4     BzBz_PD34_0.4   PD34    -1.881723
7        MeMe   5.0          MeMe_5.0   5       -0.121943
8        MeMe   5.4          MeMe_5.4   5       -0.075563
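
Since the question already sorts a plain list with sorted before building the DataFrame, another option is to extend the key to a tuple, so that ties on sys are broken by dis. This is only a sketch: it assumes each result is a sequence with sys at index 0 and dis at index 1, and the sample results list below is made up from a few of the question's rows.

```python
# Custom order for the systems, as defined in the question.
systems = {'BzBz_S': 0, 'BzBz_PD34': 1, 'MeMe': 2}

# Sample rows in the assumed layout: (sys, dis, system, basis, Energy).
results = [
    ('MeMe', 5.4, 'MeMe_5.4', '5', -0.07556324),
    ('BzBz_S', 10.0, 'BzBz_S_10.0', 'S', 0.02562465),
    ('BzBz_PD34', 0.4, 'BzBz_PD34_0.4', 'PD34', -1.88172312),
    ('BzBz_S', 3.2, 'BzBz_S_3.2', 'S', 1.48510297),
    ('BzBz_PD34', 0.2, 'BzBz_PD34_0.2', 'PD34', -1.50519034),
    ('MeMe', 5.0, 'MeMe_5.0', '5', -0.12194283),
]

# Tuples compare element by element: first the custom sys rank, then dis.
# float() guards against dis being stored as a string such as '10.0'.
sorted_results = sorted(results, key=lambda e: (systems[e[0]], float(e[1])))

for row in sorted_results:
    print(row)
```

The sorted list can then be passed to pd.DataFrame exactly as in the question, with no further sorting needed.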
snehil