I'm confused by the output of the pct_change function when the data contains NaN values. The first several rows of the right-hand column (pctchg_A) are correct: each gives the change, as a decimal fraction, of the value in column A relative to the value in column A two rows earlier. But as soon as it reaches the NaN values in column A, the output of pct_change makes no sense.
For example:
Row 8: NaN is 50% greater than 2?
Row 9: NaN is 0% greater than 3?
Row 11: 4 is 33% greater than NaN?
Row 12: 2 is 33% less than NaN?
Based on the above math, it seems like pct_change is assigning NaN a value of "3". Is that because pct_change effectively fills forward the last non-NaN value? Could someone please explain the logic here and why this happens?
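To test the forward-fill guess by hand, here is a minimal plain-Python sketch. It is not pandas' actual implementation: pct_change_ffill is a hypothetical helper written only for illustration, and it assumes that NaN is replaced by the last non-NaN value before the change is computed. It avoids pandas on purpose, since the default fill behavior of pct_change may differ between pandas versions.

```python
import math

def pct_change_ffill(values, periods):
    """Hypothetical helper: forward-fill NaNs, then compute the
    fractional change relative to the value `periods` rows earlier."""
    # Step 1: forward fill (assumed behavior, not confirmed from pandas source).
    filled = []
    last = math.nan
    for v in values:
        if not math.isnan(v):
            last = v
        filled.append(last)

    # Step 2: change relative to `periods` rows back; the first `periods`
    # rows have nothing to compare against, so they stay NaN.
    # Note: the sample data has no zero denominators, so none are handled here.
    out = []
    for i, cur in enumerate(filled):
        if i < periods:
            out.append(math.nan)
        else:
            out.append(cur / filled[i - periods] - 1)
    return out

# Same values as column A in the question.
a = [2, 1, 3, 1, 4, 5, 2, 3, math.nan, math.nan, math.nan, 4, 2, 1, 0, 4]
result = pct_change_ffill(a, periods=2)

# Print only the rows that looked wrong in the pandas output.
for row in (8, 9, 11, 12):
    print(row, round(result[row], 6))
```

If the guess is right, rows 8, 9, 11 and 12 of this sketch should match the 50%, 0%, 33% and -33% values listed above, which would mean each NaN is being treated as 3 (the last valid value in column A).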
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [2,1,3,1,4,5,2,3,np.nan,np.nan,np.nan,4,2,1,0,4]})
x = 2
df['pctchg_A'] = df['A'].pct_change(periods=x)
print(df.to_string())
Here's the output: