I am on anaconda pandas 0.18.1. I used to do this on pandas 12 to lag some values in a multi-index dataframe, where index level 0 = date, and index level 1 = security ID (3 years of daily data):
In [1]: testDB.head(2)
Out[1]:
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 2 entries, (2013-01-01, 000312) to (2013-01-01, 00036020)
Columns: 140 entries, in_universe to alpha_Tile
In[2]:
# lag a certain field:
#lag alphas
A_LAGS=[0,1,2,3,5,10,30,60,90]
grouped=testDB.groupby(level=1)['alpha']
for lag in A_LAGS:
lagName='lagA_'+str(int(lag))
testDB[lagName]=grouped.shift(periods=lag) # move old ones forward
Now on pandas 18 this just never finishes. I mean, never - 4 hours and going. I understand there were changes in how pandas handles dates, so I tried to use tshift - does not work and complains about 'freq not set'. It indeed is not (the frame is constructed from the h5 file that contains dates,identifiers and other data):
In [3]: testDB.index.levels[0]
Out[3]: DatetimeIndex(['2016-01-01', '2016-01-04', '2016-01-05', '2016-01-06',
'2016-01-07', '2016-01-08', '2016-01-11', '2016-01-12',
'2016-01-13', '2016-01-14',
...
'2016-11-17', '2016-11-18', '2016-11-21', '2016-11-22',
'2016-11-23', '2016-11-24', '2016-11-25', '2016-11-28',
'2016-11-29', '2016-11-30'],
dtype='datetime64[ns]', name=u'date', length=239, freq=None)
so I try to reset index to DateTimeIndex with proper frequency set up.
In [4]: i = testDB.index.set_levels(
pd.DatetimeIndex(fullFrame.index.levels[0],freq='B'),
level=0)
i.levels[0]
Out[4]:
DatetimeIndex(['2016-01-01', '2016-01-04', '2016-01-05', '2016-01-06',
'2016-01-07', '2016-01-08', '2016-01-11', '2016-01-12',
'2016-01-13', '2016-01-14',
...
'2016-11-17', '2016-11-18', '2016-11-21', '2016-11-22',
'2016-11-23', '2016-11-24', '2016-11-25', '2016-11-28',
'2016-11-29', '2016-11-30'],
dtype='datetime64[ns]', name=u'date', length=239, freq='B')
Now frequency is set, but I cannot replace the index:
In [5]:y = x.reindex(index=i,level=0)
y.index.levels[0]
Out[5]:
DatetimeIndex(['2016-01-01', '2016-01-04', '2016-01-05', '2016-01-06',
'2016-01-07', '2016-01-08', '2016-01-11', '2016-01-12',
'2016-01-13', '2016-01-14',
...
'2016-11-17', '2016-11-18', '2016-11-21', '2016-11-22',
'2016-11-23', '2016-11-24', '2016-11-25', '2016-11-28',
'2016-11-29', '2016-11-30'],
dtype='datetime64[ns]', name=u'date', length=239, freq=None)
So finally (apologies for the long intro, I wanted to provide full data) two questions:
- How does one do shift on a massive multi-indexed frame in Pandas 18?
- Why am I not able to reset an index to the newly created one?