I'm trying to select rows from a pandas DataFrame indexed by a period index using a list of strings, and I'm getting unexpected results.

import pandas as pd
import numpy as np
idx = pd.period_range(1991,1993,freq='A')    
df = pd.DataFrame(np.arange(9).reshape(3,3),index=idx)
print(df.loc[['1991','1993'],:])

results in:

KeyError: "None of [['1991', '1993']] are in the [index]"

If the last line is switched to:

print(df.ix[['1991','1993'],:])

The output is

Out[128]:
        0   1   2
1991    NaN NaN NaN
1993    NaN NaN NaN

If, instead of a period index, I use a plain list of strings as the index:

idx = [str(year) for year in range(1991,1994)]
df = pd.DataFrame(np.arange(9).reshape(3,3),index=idx)
print(df.loc[['1991','1993'],:])

Then the output is as expected:

Out[127]:
        0   1   2
1991    0   1   2
1993    6   7   8

So my question is: how to slice a pandas dataframe with a period index?

Artturi Björk

1 Answer

When you pass a list of strings, Pandas doesn't convert the strings into Periods for you, so you have to be more explicit. You could use:

In [38]: df.loc[[pd.Period('1991'), pd.Period('1993')], :]
Out[38]: 
      0  1  2
1991  0  1  2
1993  6  7  8

or

In [39]: df.loc[list(map(pd.Period, ['1991', '1993'])), :]
Out[39]: 
      0  1  2
1991  0  1  2
1993  6  7  8

or

In [40]: df.loc[[idx[0],idx[-1]], :]
Out[40]: 
      0  1  2
1991  0  1  2
1993  6  7  8
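The three options above all build the Period keys by hand. As a minimal sketch of the same idea, the string labels can also be converted in one step with `pd.PeriodIndex`, reusing the frequency of the frame's own index. The helper name `select_periods` is hypothetical (it is not part of pandas), and the `'Y'` frequency alias is an assumption that should match the annual index used in the question on recent pandas versions.

```python
import numpy as np
import pandas as pd

# Same frame as in the question; 'Y' is assumed to be the annual alias here.
idx = pd.period_range("1991", "1993", freq="Y")
df = pd.DataFrame(np.arange(9).reshape(3, 3), index=idx)


def select_periods(frame, labels):
    """Hypothetical helper: select rows of a period-indexed frame by string labels."""
    # Convert the strings to Periods with the same frequency as the frame's index.
    keys = pd.PeriodIndex(labels, freq=frame.index.freq)
    return frame.loc[keys, :]


subset = select_periods(df, ["1991", "1993"])
print(subset)
```

Reusing `frame.index.freq` avoids hard-coding the frequency, so the same helper should also work for, say, a monthly index with labels like `'1991-03'`.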

By the way, when you pass an arbitrary list of items to df.loc, Pandas returns a new sub-DataFrame with a copy of the values from df. This is not a slice. To slice, you need to use the slicing notation a:b. For example,

In [64]: df.loc[pd.Period('1991'): pd.Period('1993'): 2, :]
Out[64]: 
        0  1  2
1991    0  1  2
1993    6  7  8

The distinction is important because in NumPy, and typically in Pandas, slices return views while non-slice indexing (such as indexing with a list) returns copies.
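The view-versus-copy difference is easiest to check on a plain NumPy array, where the rule is unambiguous. The sketch below uses `np.shares_memory` to test whether each result still points at the original buffer; the variable names are illustrative only, and Pandas behaviour can differ by version and settings, so this demonstrates only the NumPy side of the claim.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)

sliced = a[0:3:2]    # slice notation: rows 0 and 2 as a view
picked = a[[0, 2]]   # list indexing: rows 0 and 2 as a copy

print(np.shares_memory(a, sliced))  # True: the slice shares the original buffer
print(np.shares_memory(a, picked))  # False: list indexing allocated new memory

# Writing through the view changes the original array; the copy is unaffected.
sliced[0, 0] = 100
print(a[0, 0], picked[0, 0])
```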

unutbu
  • Do you have an explanation as to why Pandas converts a string to periods when it is not in a list? `df.loc['1991',:]` works even with a period index. – Artturi Björk Dec 01 '15 at 14:54
  • Fixing this is [an open issue](https://github.com/pydata/pandas/issues/11278) in the current version of Pandas. – unutbu Dec 01 '15 at 15:06