Assume a file sizes.txt with the following content:
Sig Sta FP Method Size
10 10 100 array 108
10 10 100 csr-heur 130
10 10 100 list 220
10 10 15 array 108
10 10 15 csr-heur 45
10 10 15 list 50
10 10 25 array 108
10 10 25 csr-heur 62
10 10 25 list 70
10 10 50 array 108
10 10 50 csr-heur 95
8 4 100 array 40
8 4 100 csr-heur 50
8 4 100 list 78
8 4 25 array 40
8 4 25 csr-heur 26
8 4 25 list 30
8 4 50 array 40
8 4 50 csr-heur 36
8 4 50 list 46
8 4 75 array 40
8 4 75 csr-heur 43
8 4 75 list 62
And the following code:
import pandas

def m4():
    df = pandas.read_csv('sizes.txt', sep=' ')
    df = df.pivot(index=['Sig', 'Sta', 'FP'], columns='Method', values='Size')
    # Helper sort key: Sig * Sta * FP / 100, computed from each index tuple
    vals = [v[0] * v[1] * v[2] / 100.0 for v in df.index]
    df['vals'] = vals
    df = df.sort_values(by='vals', kind='mergesort', axis='index')
    print(df)
    # df.drop('vals') # <- causes error if executed

m4()
Output is:
Method       array  csr-heur   list   vals
Sig Sta FP
8   4   25    40.0      26.0   30.0    8.0
10  10  15   108.0      45.0   50.0   15.0
8   4   50    40.0      36.0   46.0   16.0
        75    40.0      43.0   62.0   24.0
10  10  25   108.0      62.0   70.0   25.0
8   4   100   40.0      50.0   78.0   32.0
10  10  50   108.0      95.0    NaN   50.0
        100  108.0     130.0  220.0  100.0
This is exactly what I expect. But I also want to drop the vals column after sorting. Unfortunately, uncommenting the df.drop('vals') line raises an exception:
Traceback (most recent call last):
File "/home/jd/.local/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3803, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 146, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index_class_helper.pxi", line 49, in pandas._libs.index.Int64Engine._check_type
KeyError: 'vals'
It is most probably caused by the fact that sorting isn't performed immediately but is delayed, so when data is actually sorted, the column is gone and
So there are two questions: how to either
- drop this vals column without causing the error above, or
- sort the DataFrame using an index that is a tuple.
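For the first question, one thing worth checking is the axis. The traceback fails inside an index lookup, and DataFrame.drop searches the row labels (axis=0) unless told otherwise. The sketch below uses a small stand-in frame (hypothetical, built by hand from two rows of the output above rather than read from sizes.txt) and names the column axis explicitly:

```python
import pandas as pd

# Stand-in frame with the same three-level index shape as in the question;
# the values are copied from two rows of the printed output.
df = pd.DataFrame(
    {"array": [40.0, 108.0], "vals": [8.0, 15.0]},
    index=pd.MultiIndex.from_tuples(
        [(8, 4, 25), (10, 10, 15)], names=["Sig", "Sta", "FP"]
    ),
)

# drop() defaults to axis=0, so df.drop("vals") looks for a *row* label.
# Passing columns= (or axis=1) makes it look among the columns instead.
trimmed = df.drop(columns="vals")

# drop() returns a new frame; df keeps the column unless it is reassigned.
print(trimmed.columns.tolist())
print(df.columns.tolist())
```

Inside m4() this would presumably read df = df.drop(columns='vals') after the sort_values call, since the result has to be assigned back to take effect.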
I tried sort_index with a custom key, but this causes my sort function to be called exactly three times, each time with the next level of the key. This is clearly stated in the documentation and perfectly acceptable, but at the same time absolutely useless in my case. Is there anything that can be done about it?
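For the second question, one possible way to sort on the whole index tuple without a helper column is to compute the key outside the frame and reorder by position. This is only a sketch under assumptions: the four-row frame is a hand-built stand-in, and numpy.argsort with a stable kind is swapped in for sort_values with mergesort.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the pivoted frame (four index tuples from the question).
index = pd.MultiIndex.from_tuples(
    [(10, 10, 15), (8, 4, 25), (8, 4, 50), (10, 10, 25)],
    names=["Sig", "Sta", "FP"],
)
df = pd.DataFrame({"array": [108, 40, 40, 108]}, index=index)

# Iterating a MultiIndex yields one tuple per row, so the key can
# combine all three levels at once (unlike sort_index(key=...)).
keys = [sig * sta * fp / 100.0 for sig, sta, fp in df.index]

# A stable argsort gives the row positions in ascending key order;
# iloc then reorders the rows without adding any column.
order = np.argsort(keys, kind="stable")
result = df.iloc[order]
print(result)
```

Because the key never enters the frame, there is nothing to drop afterwards.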