
Took a look at these and they don't really answer my question:

I have a large DataFrame, and would like to calculate pct_change() for each column. I would like the result to skip nan values so that it's equivalent to this:

for i in df.columns:
    _pct = df[i].dropna().pct_change()

And then for rows where the original value was nan, the pct_change() value will also be nan. In other words, if I have:

price = [1, nan, 1.1, 1.155, nan, 1.0395]

I want:

pct_change = [nan, nan, 0.1, 0.05, nan, -0.1]

Is there a built-in way to do this, or do I have to write my own loop, i.e. loop through each column, do pct_change, then replace values with nan?
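For reference, the column-by-column loop I have in mind would look roughly like this (a sketch; column names and data are just for illustration):

```python
import numpy as np
import pandas as pd

nan = np.nan
df = pd.DataFrame({'price': [1, nan, 1.1, 1.155, nan, 1.0395]})

# per column: pct_change over the non-NaN values only, then assign back;
# index alignment restores NaN in the rows that were originally NaN
result = pd.DataFrame(index=df.index)
for i in df.columns:
    result[i] = df[i].dropna().pct_change()
```

This relies on dropna() keeping the original index labels, so assigning the shorter Series back into `result` realigns it and leaves NaN where values were missing.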

Thanks

confused

1 Answer


I think you can use pct_change on the columns containing nan, then mask those cells afterwards, so something like:

df.pct_change().where(df.notna())

With your example data it gives:

import numpy as np
import pandas as pd

nan = np.nan
df = pd.DataFrame({
    'price': [1, nan, 1.1, 1.155, nan, 1.0395],
    'price2': [1, nan, nan, 1.155, nan, 1.0395]
})
print(df.pct_change().where(df.notna()))
#    price  price2
# 0    NaN     NaN
# 1    NaN     NaN
# 2   0.10     NaN
# 3   0.05   0.155
# 4    NaN     NaN
# 5  -0.10  -0.100
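Note that this one-liner leans on pct_change's implicit forward-fill (the historical fill_method='pad' default), which recent pandas versions have deprecated. A more explicit variant of the same idea, which should behave the same across versions, is to forward-fill yourself first:

```python
import numpy as np
import pandas as pd

nan = np.nan
df = pd.DataFrame({
    'price': [1, nan, 1.1, 1.155, nan, 1.0395],
    'price2': [1, nan, nan, 1.155, nan, 1.0395],
})

# forward-fill explicitly, compute the change, then mask the rows
# that were NaN in the original frame
out = df.ffill().pct_change(fill_method=None).where(df.notna())
```

Here ffill() makes each change be measured against the last non-NaN value, and where(df.notna()) puts NaN back in the originally-missing rows.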
Ben.T