Suppose I have a multi-indexed (on columns) dataframe like this:
           value
serial         1         2         3         4         5
name
Tom     0.657175 -0.999668  0.750363  1.113235 -1.199095
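For reproducibility, a frame of this shape can be built roughly as follows. This is only a sketch: the random values and the seed are placeholders, not the numbers shown above, and the variable name `columns` is just for illustration.

```python
import numpy as np
import pandas as pd

# Two-level column index: top level "value", second level "serial" with 1..5.
columns = pd.MultiIndex.from_product(
    [["value"], range(1, 6)], names=[None, "serial"]
)

# One row labelled "Tom"; the values are random placeholders.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.standard_normal((1, 5)),
    index=pd.Index(["Tom"], name="name"),
    columns=columns,
)
print(df)
```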
I'd like to access each column using a for-loop. I can do it under the multi-index structure:
# with multi-index
for i in range(1, 6):
    x = df['value'][i]
This, however, is much slower than if I collapse the columns into one level:
# collapse multi-index
df.columns = [x[0] + str(x[1]) for x in df.columns]
for i in range(1, 6):
    x = df['value' + str(i)]
I don't understand why this is the case. Since I'd like to keep the multi-index structure for the dataframe, is there a faster way of accessing the content? Or is there an easy way to transform the flattened columns from the second snippet back into a multi-index?
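For the second part, a minimal sketch of what I have in mind is below. It assumes every flat name is the prefix 'value' followed by the integer serial, which holds for this frame but is my assumption in general. The names `prefix` and `tuples` are only for illustration, and I don't know whether pandas offers something more direct.

```python
import numpy as np
import pandas as pd

# Placeholder frame with the same column layout as above.
columns = pd.MultiIndex.from_product(
    [["value"], range(1, 6)], names=[None, "serial"]
)
df = pd.DataFrame(
    np.random.randn(1, 5), index=pd.Index(["Tom"], name="name"), columns=columns
)
original_columns = df.columns

# Collapse to one level, as in the second snippet.
df.columns = [x[0] + str(x[1]) for x in df.columns]

# Rebuild the two levels by splitting each flat name at the known prefix.
# Assumption: every name looks like "value<int>".
prefix = "value"
tuples = [(prefix, int(name[len(prefix):])) for name in df.columns]
df.columns = pd.MultiIndex.from_tuples(tuples, names=[None, "serial"])
```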
Comment: As @joris pointed out, there are two ways to access multi-indexed columns. Although both are listed in the pandas documentation, df[('value', i)] is much faster than df['value'][i], and both are slower than df['value'+str(i)]. Below is a speed comparison of the three:
%timeit -n 1000 x = df['value'][2]
1000 loops, best of 3: 350 µs per loop
%timeit -n 1000 x = df[('value', 2)]
1000 loops, best of 3: 18.6 µs per loop
%timeit -n 1000 x = df['value' + str(2)]
1000 loops, best of 3: 4.1 µs per loop
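To reproduce the comparison outside IPython, something like the following should work. It is a rough sketch using the standard `timeit` module on a placeholder frame; the names `flat`, `candidates` and `timings` are mine, and the absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Placeholder frame with the same column layout as above.
columns = pd.MultiIndex.from_product(
    [["value"], range(1, 6)], names=[None, "serial"]
)
df = pd.DataFrame(
    np.random.randn(1, 5), index=pd.Index(["Tom"], name="name"), columns=columns
)

# Flattened copy, so the multi-indexed frame is left untouched.
flat = df.copy()
flat.columns = [x[0] + str(x[1]) for x in flat.columns]

candidates = {
    "df['value'][2]": lambda: df["value"][2],
    "df[('value', 2)]": lambda: df[("value", 2)],
    "flat['value' + str(2)]": lambda: flat["value" + str(2)],
}

timings = {}
for label, fn in candidates.items():
    # Best of 3 runs of 1000 calls each, reported per call.
    per_call = min(timeit.repeat(fn, number=1000, repeat=3)) / 1000
    timings[label] = per_call
    print(f"{label}: {per_call * 1e6:.1f} µs per loop")
```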
Any help is appreciated.