1

Suppose I have a multi-indexed (on columns) dataframe like this:

           value                                        
serial         1         2         3         4         5
name                                                    
Tom     0.657175 -0.999668  0.750363  1.113235 -1.199095

I'd like to access each columns using a for-loop. I can do it under the multi-index structure:

#with multi-index

for i in range(1,6):
    x = df['value'][i]

This, however, is much slower than if I collpase the columns into one-level:

#collapse multi-index

df.columns = [x[0] + str(x[1]) for x in df.columns]
for i in range(1,6):
    x = df['value'+str(i)]

I don't understand why this is the case. Since I'd like to keep the multi-index structure for the dataframe, is there a faster way of accessing the content? Or is there a way to transform the index in code 2 back to multi-index easily?

Comment: I realized there are two ways to access multi-indexed columns, as pointed out by @joris. Though both ways are listed on pandas document, df[('value', i)] is much faster than df['value'][i]; and both are slower than df['value'+str(i)]. Below is a speed comparison of the three ways:

%timeit -n 1000 x = df['value'][2]
1000 loops, best of 3: 350 µs per loop
%timeit -n 1000 x = df[('value', 2)]
1000 loops, best of 3: 18.6 µs per loop
%timeit -n 1000 x = df['value' + str(2)]
1000 loops, best of 3: 4.1 µs per loop

Any help is appreciated.

Zhen Sun
  • 817
  • 3
  • 13
  • 20
  • 'Normal' multi-index column access is like ``df[('value', i)]``. Is this faster than `df['value'][i]`? – joris Mar 27 '14 at 15:22
  • @joris, thanks! It's indeed much faster, though still 50% slower than the collapsed columns on my test code. However, when I run it on my data, it's about as slow as `df['value'][i]` (more than 10 times slower than collapsed columns). I got the warning message `SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame` (also with `df['value'][i]`). Do you think that might be the problem? – Zhen Sun Mar 28 '14 at 00:11
  • That's part of the trade-off for using MultiIndex, that it's slightly slower. But the `SettingWithCopyWarning` is because, as the warnings says, you are trying to set new values in, so change, a copy of part of the DataFrame. I suppose in your example code you are trying to modify `x`. See also http://stackoverflow.com/questions/21463589/pandas-chained-assignments/21463854#21463854. Wat are you trying to do in the for loop? – joris Mar 28 '14 at 13:44
  • @joris, thanks! I think in my case the warning is false positive. It only pops up when I use multiindex. I use the loop for this situation: I have roughly 20 variables (columns), called `'event1', ... 'event20'`, and I am looking for a keyword `'keywd'`, which can occur in any one of them, and it occurs at different `event` variables for different observations (rows). I need to record the `keywd event` and its corresponding dates `date_i` for each observation (i.e., I have a date column corresponding to each event column). Please advise me if you think of better ways than a `for loop`. – Zhen Sun Mar 28 '14 at 17:15

0 Answers0