Sorting across multiple columns pandas

Question

df = pd.DataFrame([["Alpha", 3, 2, 4], ["Bravo", 2, 3, 1], ["Charlie", 4, 1, 3], ["Delta", 1, 4, 2]], 
              columns = ["Company", "Running", "Combat", "Range"])
print(df)
  Company Running   Combat  Range
0   Alpha      3      2       4
1   Bravo      2      3       1
2   Charlie    4      1       3
3   Delta      1      4       2

Hi, I am trying to sort the the following dataframe so the rows would be arranged such that the best performing across the three columns would be at the top. In this case would be Bravo company as it is 2 in running, 3 in drills and 1 in range.

Would this approach work if the list have a lot more companies and it is hard to know the exact "best performing company"?

I have tried:

df_sort = df.sort_values(['Running', 'Combat', 'Range'], ascending=[True, True, True])

current output:

    Company Running Combat  Range
1   Delta      1      4     2
0   Bravo      2      3     1
3   Alpha      3      2     4
2   Charlie    4      1     3

but it doesn't turn out how I wanted it to be. Can this be done through pandas? I was expecting the output to be:

Company Running Combat  Range
0   Bravo   2     3     1
1   Delta   1     4     2
2   Charlie 4     1     3
3   Alpha   3     2     4

is row 0 the list of companies? if so, how is charlie 2 in running? which column is drills? — sammywemmy, Jun 02 '20 at 16:04
your dataframe structure looks weird to me. is that what is actually is? Can you correct the dataframe structure if not? — frank, Jun 02 '20 at 16:05
Does this answer your question? [How to sort a dataFrame in python pandas by two or more columns?](https://stackoverflow.com/questions/17141558/how-to-sort-a-dataframe-in-python-pandas-by-two-or-more-columns) — Roy2012, Jun 02 '20 at 16:11
Yes I would inspect your `df`with `df.head()`, I don't think the data are being represented how you think — Tom, Jun 02 '20 at 16:14
@MayankPorwal Hi, so sorry for the wrong dataframe example. I was trying to sort this through excel but I couldn't copy paste from the excel table. I have updated the question and current & expected output to make the question clearer. — MooseCakeRunner, Jun 03 '20 at 08:02
@jezrael Hi, thanks for the suggestion but I am trying to sort according to the "best average performance" across the three columns. It can be manually sorted according to "range" column now as there is only 4 rows, but there are thousands of rows more and would not be feasible to manually observe and sort the rows. Not sure if my question is clear? — MooseCakeRunner, Jun 03 '20 at 08:06

jezrael · Accepted Answer · 2020-06-03T12:51:32.833

1

If want sorting by means per rows first create mean, then add Series.argsort for positions of sorted values and last change order of values by DataFrame.iloc:

df1 = df.iloc[df.mean(axis=1).argsort()]
print (df1)
   Company  Running  Combat  Range
1    Bravo        2       3      1
3    Delta        1       4      2
2  Charlie        4       1      3
0    Alpha        3       2      4

EDIT: If need remove some columns before by DataFrame.drop:

cols = ['Overall','Subordination']
df2 = text_df.iloc[text_df.drop(cols, axis=1).mean(axis=1).argsort()]
print (df2)
   Company  Running  Combat  Overall Subordination  Range
1    Bravo        2       3     0.70          Poor      1
3    Delta        1       4     0.83          Good      2
2  Charlie        4       1     0.81          Good      3
0    Alpha        3       2     0.91     Excellent      4

edited Jun 03 '20 at 12:51

answered Jun 03 '20 at 10:14

jezrael

822,522
95
1,334
1,252

Hi, I am trying to understand the line of code. Does it mean df.mean(axis=1) will calculate the mean of all the columns before using .argsort() to move the rows according to the mean values? – MooseCakeRunner Jun 03 '20 at 12:10
@MooseCakeRunner - exactly you are right. I understand question need sorting by mean, or I am wrong? – jezrael Jun 03 '20 at 12:10
Yes that is correct, think my initial wasn't clear. This method of sorting works if there are more columns of ranking and more rows of companies (i.e. hundreds/thousands more)? I do not need to manually name the columns and rows to be sorted? – MooseCakeRunner Jun 03 '20 at 12:27
@MooseCakeRunner - Sure, if need mean per rows number of columns is not important. – jezrael Jun 03 '20 at 12:28
how about if there are other columns in between the columns which are like text strings, how would I go about that situation then? – MooseCakeRunner Jun 03 '20 at 12:31
@MooseCakeRunner - For me `mean` omit this columns, if not working use `df.select_dtypes(np.number).mean(axis=1).argsort()` for only numeric columns – jezrael Jun 03 '20 at 12:32
Certainly did helped, but is it possible to "drop" the columns, sort it by mean then place the columns dropped back after sorting in the same manner? – MooseCakeRunner Jun 03 '20 at 13:02
@MooseCakeRunner - It worked like `text_df.drop(cols, axis=1).mean(axis=1).argsort()` returned positions of rows, so if seelct by `iloc` get original data with changed order. – jezrael Jun 03 '20 at 13:03
Hi, just to clear things up. So text_df.drop(cols, axis=1) will drop the columns not needed for sorting; mean(axis=1).argsort() will calculate the mean before sorting the rows; iloc will move the original data according to the sorted rows by mean without affecting the columns arrangement? – MooseCakeRunner Jun 03 '20 at 13:10

Sorting across multiple columns pandas

1 Answers1