I want to difference a time series to make it stationary. However, taking the first difference is not guaranteed to make the series stationary. Consider the example pandas DataFrame below:
import pandas as pd
test = {'A': [10, 15, 19, 24, 23]}
test_df = pd.DataFrame(test)
Using the diff() method gives the first difference as expected, but when I attempt diff(2), i.e. a lag period of 2, I do not get the results I expect.
Expected Output
+----+-------+-------+
| A | Lag 1 | Lag 2 |
+----+-------+-------+
| 10 | NA | NA |
| 15 | 5 | NA |
| 19 | 4 | -1 |
| 24 | 5 | 1 |
| 23 | -1 | -6 |
+----+-------+-------+
Resulting Output
+----------------+
| A lag1 lag2 |
+----------------+
| 10 NaN NaN |
| 15 5.0 NaN |
| 19 4.0 9.0 |
| 24 5.0 9.0 |
| 23 -1.0 4.0 |
+----------------+
The above output was generated using test_df['lag2'] = test_df['A'].diff(2).
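For reference, here is a minimal sketch of how the two calls differ on the sample data. The column names diff_periods_2 and diff_order_2 are illustrative only and are not part of the original frame. diff(2) subtracts the value two rows back, while chaining diff().diff() differences the first difference, which is what the expected table shows.

```python
import pandas as pd

test_df = pd.DataFrame({'A': [10, 15, 19, 24, 23]})

# diff(2): A[t] - A[t-2], i.e. one difference taken over a gap of two rows
test_df['diff_periods_2'] = test_df['A'].diff(2)

# diff().diff(): (A[t] - A[t-1]) - (A[t-1] - A[t-2]), i.e. a second-order difference
test_df['diff_order_2'] = test_df['A'].diff().diff()

print(test_df)
```

The second column matches the Lag 2 values in the expected output (-1, 1, -6), while the first reproduces the 9, 9, 4 seen in the resulting output.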
How can I obtain the expected output, and how can I regenerate the original time series using only the Lag 2 series?
Edit 1: This question does not pertain to data type conversion or NaNs and is incorrectly marked as a duplicate. The expected output is clearly stated, and the scope of this question is completely different from the one mentioned here.
Edit 2: To work with a larger number of samples, the following dummy DataFrame can be used.
import numpy as np
import pandas as pd
test = np.random.randint(100, size=500)
test_df = pd.DataFrame(test, columns=['A'])
Edit 3: To explain the expected output further, please consider the longer example below.
+----+-------+-------+
| A | Lag 1 | Lag 2 |
+----+-------+-------+
| 10 | NA | NA |
| 15 | 5 | NA |
| 19 | 4 | -1 |
| 24 | 5 | 1 |
| 23 | -1 | -6 |
| 50 | 27 | 28 |
| 34 | -16 | -43 |
| 56 | 22 | 38 |
| 33 | -23 | -45 |
| 26 | -7 | 16 |
| 45 | 19 | 26 |
+----+-------+-------+
test = {'A': [10,15,19,24,23,50,34,56,33,26,45]}
test_df = pd.DataFrame(test)
Lag 1 of this column can be created using test_df['lag1'] = test_df['A'].diff(). But to create lag 2 I need to do test_df['lag2'] = test_df['A'].diff().diff(). Chaining calls by hand like this does not scale to a case where I have to take 365 lags. Hence I need a solution that takes the lag of the original series A, then recursively takes the lag of lag1 to generate lag2, and so on.
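The recursive differencing described above could be sketched with a small loop. The helper name difference and its order parameter are hypothetical and only serve the illustration. numpy.diff with its n argument computes the same values, but returns a shorter array instead of padding with NaN.

```python
import numpy as np
import pandas as pd

def difference(series, order):
    """Apply first differencing `order` times (hypothetical helper)."""
    out = series
    for _ in range(order):
        out = out.diff()  # each pass adds one more leading NaN
    return out

test_df = pd.DataFrame({'A': [10, 15, 19, 24, 23, 50, 34, 56, 33, 26, 45]})
test_df['lag1'] = difference(test_df['A'], 1)
test_df['lag2'] = difference(test_df['A'], 2)

# Same second-order values via NumPy; the result has len(A) - 2 entries and no NaNs
second = np.diff(test_df['A'].to_numpy(), n=2)

print(test_df)
print(second)
```

Because the loop only repeats diff(), the same helper works for any order, at the cost of one extra leading NaN per pass.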
Once we have created the lagged term lag2, how can we retrieve the original series from it?
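One possible way to invert the operation, sketched under the assumption that the first two original values are stored alongside lag2: each differencing pass discards one starting value, so lag2 alone does not determine A. The variable names a0, a1, and restored are illustrative.

```python
import pandas as pd

a = pd.Series([10, 15, 19, 24, 23, 50, 34, 56, 33, 26, 45], name='A')
lag2 = a.diff().diff()

# Assumption: the first two original values were kept; they cannot be derived from lag2
a0, a1 = a.iloc[0], a.iloc[1]

# Undo the second difference: seed the first known first-difference, then accumulate
lag1 = lag2.copy()
lag1.iloc[1] = a1 - a0
lag1 = lag1.cumsum()  # the NaN at position 0 is skipped and stays NaN

# Undo the first difference: seed the first original value, then accumulate
restored = lag1.copy()
restored.iloc[0] = a0
restored = restored.cumsum()

print(restored.astype('int64').tolist())
```

For a difference of order k the same pattern would need k stored starting values and k passes of cumsum.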