Using shift() with unevenly spaced data

Question

Hopefully this example speaks for itself. I want to create 'lagval' with shift() but need it to be nan if the prior year is missing.

df = DataFrame( { 'yr' : [2007,2008,2009,2011,2012],
                  'val': np.random.randn(5) } )

Desired output (lagval):

In [1118]: df
Out[1118]: 
        val    yr    lagval
0 -0.978139  2007       NaN
1  0.117912  2008 -0.978139
2 -1.031884  2009  0.117912
3  0.606856  2011       NaN
4 -0.200864  2012  0.606856

I have a decent solution for this (posted as an answer), but am looking for alternatives. I have spent some time looking at all the time series functions but that seems like overkill here. It seems like I would end up converting year to a true timestamp, resampling, shifting, and then dropping missing values. But maybe there is a simpler way?

score 3 · Accepted Answer · edited Nov 18 '17 at 14:55

3

For what it's worth, here's a time-series solution, which obviously takes a bit more code.

df = df.set_index(df['yr'].apply(lambda x: datetime.datetime(x, 1, 1)))
df = df.resample('A').mean()

df['lagval'] = df['val'].shift(1)
df = df[pd.notnull(df['yr'])]

I'm not familiar with Stata, but just skimming the docs, it sounds like tsset does something similar (conforming the data to a specified frequency)?

edited Nov 18 '17 at 14:55

JohnE

29,156
8
79
109

answered Aug 19 '14 at 20:18

chrisb

49,833
8
70
70

Thanks, it's great to see that way and isn't as bad as I thought. Stata's tsset is just for defining the group and time variables for a panel data set. After which you can lag/reshape/etc and stata knows how to handle things. Kinda like setting the index before stacking in pandas. – JohnE Aug 19 '14 at 22:32

Using shift() with unevenly spaced data

1 Answers1

Linked