
I am fairly new to Python (I mostly use R) and would like to perform a simple calculation, but I keep getting errors and incorrect results. I want to calculate the percentage change for a column in a pandas DataFrame using the latest non-NA value. A toy example is below.

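import numpy as np
import pandas as pd
# Use real missing values (np.nan), not the strings 'Nan'/'NaN'
price = [np.nan, 10, 13, np.nan, np.nan, 9]
df = pd.DataFrame(price, columns=['price'])
df['price_chg'] = df.price.pct_change(periods=-1)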

I keep getting a weird result:

price_chg = [NaN, -0.2307, 0, 0, 0.4444, NaN]

I guess this has to do with the NaN values. How do I tell pandas to use the latest non-NA value? The desired result is as follows:

price_chg = [NaN, -0.2307, 0.4444, 0, 0, NaN]

Since I don't know much Python at all, any suggestions would be welcome, even convoluted ones.

jvalenti

1 Answer


I believe what you're looking for is to back-fill the missing values when calling pct_change, using its fill_method argument.

df['price_chg'] = df.price.pct_change(periods = -1, fill_method='backfill')

This results in:

1   -0.230769
2    0.444444
3    0.000000
4    0.000000
5         NaN

The pandas documentation for pct_change describes the options you have when calling it, including fill_method. The pandas documentation on filling missing data covers the available fill methods.
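In case it helps to see what the fill step is doing, here is a minimal sketch that fills the gaps explicitly and then computes the change by hand with shift, rather than relying on fill_method (which, as far as I know, is deprecated in recent pandas releases). The column names price_chg_masked and the helper variables ffilled, bfilled and chg_ffill are just illustrative names, and the final masking step is my own assumption about what the desired output implies, not something pct_change does for you.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"price": [np.nan, 10, 13, np.nan, np.nan, 9]})

# Forward-filling first reproduces the unexpected result from the question:
# each gap takes the previous observed price before the change is computed.
ffilled = df["price"].ffill()
chg_ffill = ffilled / ffilled.shift(-1) - 1

# Back-filling first: each gap takes the next observed price instead.
bfilled = df["price"].bfill()
df["price_chg"] = bfilled / bfilled.shift(-1) - 1

# Optional (assumption): keep NaN for rows that come before the first
# observed price, so the leading row is NaN rather than 0.
df["price_chg_masked"] = df["price_chg"].where(df["price"].ffill().notna())

print(df)
```

Note that with a plain back-fill the first row comes out as 0 rather than NaN, because the leading gap is filled with 10 and compared against 10; the masked column is one way to restore the NaN there if that matters.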

Matthew Cox