I have a large pandas dataframe, with time series data and a rather large multiindex. Said index contains various information about the time series, such as for example location, datatype and so on.
Now I want to add a new row to the index, with an integer (or float, doesnt really matter), containing a distance to a certain point. Following that, I want to sort the dataframe by this distance.
I am not sure how to add a new index level, and how to assign new values by hand. Also, can pandas even sort columns after an random number in one of its index levels?
Example
(code from here )
header=pd.MultiIndex.from_product([['location1','location2'],['S1','S2','S3']],names=['loc','S'])
df = pd.DataFrame(np.random.randn(5, 6), index=['a','b','c','d','e'], columns = header)
Looks like this:
loc location1 location2
S S1 S2 S3 S1 S2 S3
a 1.530590 0.536364 1.295848 0.422256 -1.853786 1.334981
b 0.275857 -0.848685 -1.212584 -0.464235 -0.855600 0.680985
c -1.209607 0.265359 -0.695233 0.643896 1.315216 -0.751027
d -1.591613 -0.178605 0.878567 0.647389 -0.454313 -1.972509
e 1.098193 -0.766810 0.087173 0.714301 -0.886545 -0.826163
What I want to to, is in a first step, add some distances to each column, like location1 S1 add dist 200
, location1 S2 add dist 760
and so on, resulting in this:
loc location1 location2
S S1 S2 S3 S1 S2 S3
dist 200 760 10 1000 340 70
a 1.530590 0.536364 1.295848 0.422256 -1.853786 1.334981
b 0.275857 -0.848685 -1.212584 -0.464235 -0.855600 0.680985
c -1.209607 0.265359 -0.695233 0.643896 1.315216 -0.751027
d -1.591613 -0.178605 0.878567 0.647389 -0.454313 -1.972509
e 1.098193 -0.766810 0.087173 0.714301 -0.886545 -0.826163
And then do something like df.sortlevel('dist')
, resulting in
loc location1 location2 location1 location2 location1 location2
S S3 S3 S1 S2 S2 S1
dist 10 70 200 340 760 1000
a 1.295848 1.334981 1.530590 -1.853786 0.536364 0.422256
b -1.212584 0.680985 0.275857 -0.855600 -0.848685 -0.464235
…
So that have the whole thing sorted by distance, for things like plt.matshow(df.corr())
.
Can pandas even sort a df after a random index with an integer? Because I have another dataframe, that already has an integer in its multindex, and here some_otherdf.sortlevel('HZB')
results in TypeError: can only sort by level with a hierarchical index
Edit:
As of now, there is two answers, both of which work perfectly fine for my test case. I think @Pedro M Duarte's answer might be the more correct one, given that it uses the multiindex as intended. However, for my real data, it would require either a lot of reworking, or a lot of typing, for a 7 levels deep multiindex and 50 data series, which is very error prone. @Nader Hisham ignored my request to stay in my multiindex, but it requires just the quick, easy and easy to check writing of a simple row of numbers (saving me a lot of time), that I can then remove after sorting. for other folks with a similar question, it might be different.