I am trying to understand multi-indexing in pandas. I have found some very good links (here by Jake VanderPlas and here by Nelson Minar), but I am still not able to grasp the concept.
I have some specific questions. Consider this data:
import numpy as np
import pandas as pd
# hierarchical indices and columns
index = pd.MultiIndex.from_product([[2013, 2014], [1, 2]],
                                   names=['year', 'visit'])
columns = pd.MultiIndex.from_product([['Bob', 'Guido', 'Sue'], ['HR', 'Temp']],
                                     names=['subject', 'type'])
# mock some data
data = np.round(np.random.randn(4, 6), 1)
data[:, ::2] *= 10
data += 37
# create the DataFrame
health_data = pd.DataFrame(data, index=index, columns=columns)
Then:
- Why/how does health_data.loc[:,'Guido'] remove the top column level, whereas health_data.loc[:,['Guido']] preserves it?
- Why do health_data.loc[:, [('Bob', 'HR')]] and health_data.loc[:, ('Bob', 'HR')] work as intended (assuming the answer to the first question is clear), but health_data.loc[:, ['Bob', 'HR']] gives an extra column?
- If I define idx = pd.IndexSlice, why does health_data.loc[:,[idx['Bob','HR']]] return the intended output, whereas health_data.loc[:,list(idx['Bob','HR'])] returns the extra column?
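To make the comparison concrete, here is a minimal sketch that rebuilds the frame above and records what kind of object each selection returns. The seeded random generator and the describe helper are my own additions for illustration, not part of the original data. Because I am not certain that the flat-list selection behaves the same way in every pandas version, that one call is wrapped in a try/except.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question. The seeded generator is an addition
# so that the mock values are reproducible.
rng = np.random.default_rng(0)
index = pd.MultiIndex.from_product([[2013, 2014], [1, 2]],
                                   names=['year', 'visit'])
columns = pd.MultiIndex.from_product([['Bob', 'Guido', 'Sue'], ['HR', 'Temp']],
                                     names=['subject', 'type'])
data = np.round(rng.standard_normal((4, 6)), 1)
data[:, ::2] *= 10
data += 37
health_data = pd.DataFrame(data, index=index, columns=columns)

idx = pd.IndexSlice


def describe(result):
    """Hypothetical helper: summarise the type and column labels of a selection."""
    if isinstance(result, pd.Series):
        return ('Series', result.name)
    return ('DataFrame', result.columns.nlevels, list(result.columns))


# First question: a scalar label versus a one-element list.
scalar_sel = describe(health_data.loc[:, 'Guido'])
list_sel = describe(health_data.loc[:, ['Guido']])

# Second question: a tuple versus a list holding one tuple.
tuple_sel = describe(health_data.loc[:, ('Bob', 'HR')])
tuple_list_sel = describe(health_data.loc[:, [('Bob', 'HR')]])

# Third question: look at the keys themselves before passing them to .loc.
key_a = [idx['Bob', 'HR']]
key_b = list(idx['Bob', 'HR'])

# Flat list of two strings. The outcome may differ between pandas versions
# (an assumption on my part), so a KeyError is recorded rather than raised.
try:
    flat_sel = describe(health_data.loc[:, ['Bob', 'HR']])
except KeyError as exc:
    flat_sel = ('KeyError', str(exc))

for name, value in [('scalar', scalar_sel), ('list', list_sel),
                    ('tuple', tuple_sel), ('list of tuple', tuple_list_sel),
                    ('key_a', key_a), ('key_b', key_b), ('flat list', flat_sel)]:
    print(f'{name:>14}: {value}')
```

Note that idx['Bob', 'HR'] is just the tuple ('Bob', 'HR'), so key_a is a list holding one tuple, while key_b is a flat list of two strings, the same key as in the second question.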
As my questions arise from my lack of understanding of multi-indexing, any links that explain it in detail would be helpful as well. I have seen some SO questions and answers (this one helps a bit), but most of them are very specific, and I could not find one that discusses the double-bracket concept in general.