0

I have a pandas table such as:

Entries Col1    Col2    Col3    Col4
Entry1  -1.46   93.93   3.33    92.51   
Entry2  -48.59  31.49   -22.97  80.25
Entry3  8.24    95.85   -5.05   90.29

I want to sort the Entries based on all 4 columns. Column 1 and 3 should be ranked by which is closer to 0 and Column 2 and 4 should be ranked by highest number.

At the moment I have something like this:

data.sort_values(cols, ascending=[False,True,False,True],inplace=True)

But all this does is sort by the first column and the other columns are insignificant to the sorting. I need the Entries to be sorted by all the columns. If Entry 1 is only the best in Column 1 and Entry 2 is best in the other 3 then Entry 3 should be sorted to the top.

Expected output:

Entries Col1    Col2    Col3    Col4
Entry1  -1.46   93.93   3.33    92.51   
Entry3  8.24    95.85   -5.05   90.29
Entry2  -48.59  31.49   -22.97  80.25

Entry 1 is best in Col:1,3 and 4 Entry 2 is worse in all Cols Entry 3 is best in Col2 and second best in others.

Thanks.

NJD
  • 197
  • 1
  • 4
  • 17
  • Can you add your expected output. – Erfan Jan 05 '20 at 12:25
  • Can you show the output for your example table? Do you want to sort columns individually without taking the whole row? Or sort by equally weighting all the columns? – Raghava Dhanya Jan 05 '20 at 12:25
  • I believe your sort objective is not very clear in the question. You say "closer to 0" yet you sort that column without taking absolute values. Maybe what you want to do is getting the percentile ranks of individual columns according to your objective, then average those ranks in a new column and sort that new column? – Kocas Jan 05 '20 at 12:25
  • I have added expected output. The problem is that I have a large list of entries and values for them. The values all have values between -100 and +100 and I need the Rows to be ordered according to their overall sorting/ranking - Creating an addition column 'Rank' which is based on all other columns would be a good idea but I can only sort independently at the moment – NJD Jan 05 '20 at 12:30
  • So Duplicate of https://stackoverflow.com/questions/41974374/pandas-rank-by-multiple-columns – Raghava Dhanya Jan 05 '20 at 12:40
  • I am not sure. I looked there but it looks like the answers are still only ranking by one column. Is it actually doing an equal ranking/sorting on all columns and not just 'TotalRevenue' – NJD Jan 05 '20 at 12:46

1 Answers1

1

Start from creating 2 auxiliary columns, Col1a and Col3a holding absolute values of respective source columns:

data['Col1a'] = data.Col1.abs()
data['Col3a'] = data.Col3.abs()

The sort your DataFrame:

data.sort_values(['Col1a', 'Col2', 'Col3a',  'Col4'],
    ascending=[True, False, True, False], inplace=True)

Note that the ascending parameter is different than in your code.

And finally, drop the auxiliary columns:

data.drop(columns=['Col1a', 'Col3a'], inplace=True)
Valdi_Bo
  • 30,023
  • 4
  • 23
  • 41
  • I see the need for getting the absolute value, but if i 21 Columns in my full dataset, will your answer not have the same problem where the sorting of Column 1 impacts the sorting of the other columns? – NJD Jan 05 '20 at 13:53
  • 1
    *sort_values* sorts on all passed columns but their order is significant. First the sort is on the first passed column. But if values of these columns in both rows in question are equal than additional sort is performed on the second column (and so on). – Valdi_Bo Jan 05 '20 at 14:56
  • I see, the problem is not the values being the same but where one Entry may be 'best' in Column 1 but lowly ranked in the other columns. This would place it at the top even though overall it is not the best but in fact one of the worse 'Entries' – NJD Jan 05 '20 at 15:08
  • 1
    Yes, ordinary *sort_values* works just this way. Consider writing a "ranking function", computing **one** ranking, based on all columns. Then, add a column - the result of this function and sort on just this (single) column. – Valdi_Bo Jan 05 '20 at 16:20
  • Yes, I have been thinking about that today. just lacking the knowledge of how to create a rank from more than one column. – NJD Jan 06 '20 at 09:47