5

I have a times series with some missing entries, that looks like this:

date     value
---------------
2000       5
2001      10
2003      8
2004      72
2005      12
2007      13

I would like to do create a column for the "previous_value". But I only want it to show values for consecutive years. So I want it to look like this:

date     value    previous_value
-------------------------------
2000       5        nan
2001      10         5
2003      8         nan
2004      72         8
2005      12        72
2007      13        nan

However just applying pandas shift function directly to the column 'value' would give 'previous_value' = 10 for 'time' = 2003, and 'previous_value' = 12 for 'time' = 2007.

What's the most elegant way to deal with this in pandas? (I'm not sure if it's as easy as setting the 'freq' attribute).

Mykola Zotko
  • 15,583
  • 3
  • 71
  • 73
user3591836
  • 953
  • 2
  • 16
  • 29

2 Answers2

8
In [588]: df = pd.DataFrame({ 'date':[2000,2001,2003,2004,2005,2007],
                              'value':[5,10,8,72,12,13] })

In [589]: df['previous_value'] = df.value.shift()[ df.date == df.date.shift() + 1 ]

In [590]: df
Out[590]: 
   date  value  previous_value
0  2000      5             NaN
1  2001     10               5
2  2003      8             NaN
3  2004     72               8
4  2005     12              72
5  2007     13             NaN

Also see here for a time series approach using resample(): Using shift() with unevenly spaced data

Community
  • 1
  • 1
JohnE
  • 29,156
  • 8
  • 79
  • 109
1

Your example doesn't look like real time series data with timestamps. Let's take another example with the missing date 2020-01-03:

df = pd.DataFrame({"val": [10, 20, 30, 40, 50]},
                  index=pd.date_range("2020-01-01", "2020-01-05"))
df.drop(pd.Timestamp('2020-01-03'), inplace=True)

            val
2020-01-01   10
2020-01-02   20
2020-01-04   40
2020-01-05   50

To shift by one day you can set the freq parameter to 'D':

df.shift(1, freq='D')

Output:

            val
2020-01-02   10
2020-01-03   20
2020-01-05   40
2020-01-06   50

To combine original data with the shifted one you can merge both tables:

df.merge(df.shift(1, freq='D'),
         left_index=True,
         right_index=True,
         how='left',
         suffixes=('', '_previous'))

Output:

            val  val_previous
2020-01-01   10           NaN
2020-01-02   20          10.0
2020-01-04   40           NaN
2020-01-05   50          40.0

Other offset aliases you can find here

Mykola Zotko
  • 15,583
  • 3
  • 71
  • 73