1

I am on anaconda pandas 0.18.1. I used to do this on pandas 12 to lag some values in a multi-index dataframe, where index level 0 = date, and index level 1 = security ID (3 years of daily data):

In [1]: testDB.head(2)
Out[1]:
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 2 entries, (2013-01-01, 000312) to (2013-01-01, 00036020)
Columns: 140 entries, in_universe to alpha_Tile

In[2]:
# lag a certain field:
#lag alphas
A_LAGS=[0,1,2,3,5,10,30,60,90]
grouped=testDB.groupby(level=1)['alpha']
for lag in A_LAGS:
    lagName='lagA_'+str(int(lag))
    testDB[lagName]=grouped.shift(periods=lag) # move old ones forward

Now on pandas 18 this just never finishes. I mean, never - 4 hours and going. I understand there were changes in how pandas handles dates, so I tried to use tshift - does not work and complains about 'freq not set'. It indeed is not (the frame is constructed from the h5 file that contains dates,identifiers and other data):

In [3]: testDB.index.levels[0]
Out[3]: DatetimeIndex(['2016-01-01', '2016-01-04', '2016-01-05', '2016-01-06',
                 '2016-01-07', '2016-01-08', '2016-01-11', '2016-01-12',
           '2016-01-13', '2016-01-14',
           ...
           '2016-11-17', '2016-11-18', '2016-11-21', '2016-11-22',
           '2016-11-23', '2016-11-24', '2016-11-25', '2016-11-28',
           '2016-11-29', '2016-11-30'],
          dtype='datetime64[ns]', name=u'date', length=239, freq=None)

so I try to reset index to DateTimeIndex with proper frequency set up.

In [4]: i = testDB.index.set_levels(
             pd.DatetimeIndex(fullFrame.index.levels[0],freq='B'),
             level=0)

        i.levels[0]
Out[4]:
DatetimeIndex(['2016-01-01', '2016-01-04', '2016-01-05', '2016-01-06',
           '2016-01-07', '2016-01-08', '2016-01-11', '2016-01-12',
           '2016-01-13', '2016-01-14',
           ...
           '2016-11-17', '2016-11-18', '2016-11-21', '2016-11-22',
           '2016-11-23', '2016-11-24', '2016-11-25', '2016-11-28',
           '2016-11-29', '2016-11-30'],
          dtype='datetime64[ns]', name=u'date', length=239, freq='B')

Now frequency is set, but I cannot replace the index:

In [5]:y = x.reindex(index=i,level=0)
       y.index.levels[0]

Out[5]:
DatetimeIndex(['2016-01-01', '2016-01-04', '2016-01-05', '2016-01-06',
           '2016-01-07', '2016-01-08', '2016-01-11', '2016-01-12',
           '2016-01-13', '2016-01-14',
           ...
           '2016-11-17', '2016-11-18', '2016-11-21', '2016-11-22',
           '2016-11-23', '2016-11-24', '2016-11-25', '2016-11-28',
           '2016-11-29', '2016-11-30'],
          dtype='datetime64[ns]', name=u'date', length=239, freq=None)

So finally (apologies for the long intro, I wanted to provide full data) two questions:

  1. How does one do shift on a massive multi-indexed frame in Pandas 18?
  2. Why am I not able to reset an index to the newly created one?
Julien Marrec
  • 11,605
  • 4
  • 46
  • 63
hfquant
  • 11
  • 1
  • Please read [how to create a mcve](http://stackoverflow.com/help/mcve) and [how to make good reproducible pandas example](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). – Julien Marrec Dec 08 '16 at 15:10

0 Answers0