Pandas - sort_values by combining multiple columns

Question

Assume a file sizes.txt with the following content:

Sig Sta FP Method Size
10 10 100 array 108
10 10 100 csr-heur 130
10 10 100 list 220
10 10 15 array 108
10 10 15 csr-heur 45
10 10 15 list 50
10 10 25 array 108
10 10 25 csr-heur 62
10 10 25 list 70
10 10 50 array 108
10 10 50 csr-heur 95
8 4 100 array 40
8 4 100 csr-heur 50
8 4 100 list 78
8 4 25 array 40
8 4 25 csr-heur 26
8 4 25 list 30
8 4 50 array 40
8 4 50 csr-heur 36
8 4 50 list 46
8 4 75 array 40
8 4 75 csr-heur 43
8 4 75 list 62

And the following code:

import pandas

def m4():
    df=pandas.read_csv('sizes.txt', sep=' ')
    df=df.pivot(index=['Sig','Sta','FP'],columns='Method',values='Size')
    vals = [v[0] * v[1] * v[2] / 100.0 for v in df.index]
    df['vals'] = vals
    df = df.sort_values(by='vals', kind='mergesort', axis='index')
    print(df)
    # df.drop('vals') # <- causes error if executed

m4()

Output is:

Method       array  csr-heur   list   vals
Sig Sta FP
8   4   25    40.0      26.0   30.0    8.0
10  10  15   108.0      45.0   50.0   15.0
8   4   50    40.0      36.0   46.0   16.0
        75    40.0      43.0   62.0   24.0
10  10  25   108.0      62.0   70.0   25.0
8   4   100   40.0      50.0   78.0   32.0
10  10  50   108.0      95.0    NaN   50.0
        100  108.0     130.0  220.0  100.0

And this is exactly what I expect. But I also want to drop the vals column after sorting. Unfortunately, uncommenting that line raises an exception:

Traceback (most recent call last):
  File "/home/jd/.local/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3803, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 146, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index_class_helper.pxi", line 49, in pandas._libs.index.Int64Engine._check_type
KeyError: 'vals'

My guess is that this happens because sorting isn't performed immediately but is delayed, so by the time the data is actually sorted, the column is gone.

So there are two questions: how do I either

Drop this vals column without causing the error above, or
Sort the DataFrame using an index that is a tuple?

I tried sort_index with a custom key, but this causes my sort function to be called exactly three times, each time with the next level of the index. This is clearly stated in the documentation and perfectly acceptable, but at the same time absolutely useless in my case. Is there anything that can be done about it?
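One possible way around the per-level key, sketched here on a small made-up frame rather than the real sizes.txt data: compute the combined value from the index levels yourself, take a stable argsort of it, and reorder the rows with iloc. The names key, order and result are only illustrative, and this sidesteps sort_index entirely instead of making its key see whole tuples.

```python
import numpy as np
import pandas as pd

# Small stand-in for the pivoted frame: a three-level index (Sig, Sta, FP).
index = pd.MultiIndex.from_tuples(
    [(10, 10, 100), (10, 10, 15), (8, 4, 25), (8, 4, 50)],
    names=["Sig", "Sta", "FP"],
)
df = pd.DataFrame({"array": [108, 108, 40, 40]}, index=index)

# Pull each level out as a plain NumPy array.
sig = index.get_level_values("Sig").to_numpy()
sta = index.get_level_values("Sta").to_numpy()
fp = index.get_level_values("FP").to_numpy()

# Combine all three levels into one sort key, the same formula as 'vals'.
key = sig * sta * fp / 100.0

# A stable argsort gives the row order; iloc applies it,
# so no helper column has to be added and dropped again.
order = np.argsort(key, kind="stable")
result = df.iloc[order]
print(result)
```

Because the key never becomes a column, there is nothing to drop afterwards, and kind="stable" keeps ties in their original order, much like kind='mergesort' in sort_values.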

@jezrael Thanks, adding axis=1 helped. Care to explain why it works? Also I clearly wrote that I tried using sort_index, but I see no way to do it given how values are passed to the custom sorting function. - Jędrzej Dudkiewicz, Jan 18 '23 at 08:39
`axis=1` means drop by column name; if you omit it, you get the default `axis=0`, which drops rows by index label (there is no index label `vals`, so the error is raised). - jezrael, Jan 18 '23 at 08:40
`but I see no way to do it given how values are passed to the custom sorting function.` Unfortunately, I don't understand what you need. - jezrael, Jan 18 '23 at 08:41
@jezrael Ok, I understand why dropping does not work the way I did it. Regarding sorting the index: I want to combine all three values from the index to create a single float that will be used for sorting, but the function passed to `sort_index` as `key` receives the values from each level separately; the documentation says "For MultiIndex inputs, the key is applied per level.". I can't figure out how to do what I want (I want to sort exactly as the `vals` column sorts it). - Jędrzej Dudkiewicz, Jan 18 '23 at 08:55
What does `I want to combine all three values from index to create single float that will be used for sorting` mean? Can you add the sort key you need and show what the final sorted DataFrame should look like? It seems we don't understand each other. - jezrael, Jan 18 '23 at 08:58
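The axis=1 fix from the comments can be sketched as follows, on a tiny made-up frame instead of the pivoted one. It assumes only the documented behaviour of DataFrame.drop: the default axis=0 looks for row labels, and the method returns a new frame rather than modifying df in place.

```python
import pandas as pd

# Made-up data; 'vals' plays the role of the temporary sort column.
df = pd.DataFrame({"array": [108, 40], "list": [220, 30], "vals": [100.0, 8.0]})

df = df.sort_values(by="vals", kind="mergesort")

# drop() defaults to axis=0 (row labels), so this looks for a row named 'vals'.
try:
    df.drop("vals")
    raised = False
except KeyError:
    raised = True

# Either form targets column labels instead.
by_axis = df.drop("vals", axis=1)
by_columns = df.drop(columns="vals")
print(by_columns)
```

Note that the result has to be assigned (for example df = df.drop(columns='vals')), otherwise the original frame still contains the column.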
