
I have a pandas DataFrame with a time index and want to normalize each value in a column by the maximum value observed up to that date and time.

import pandas as pd

# an example input df
rng = pd.date_range('2020-01-01', periods=8)
a_lst = [2, 4, 3, 8, 2, 4, 10, 2]
df = pd.DataFrame({'date': rng, 'A': a_lst})
# use the dates as the index
df.set_index('date', inplace=True, drop=True)


(A possible solution is to iterate over the rows, subset the past rows, and then divide by their maximum [1,2,3], but that would be inefficient.)
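For reference, the row-by-row approach mentioned above could be sketched roughly as follows. This is only an illustration on the example data; the column name A_loop is made up for the sketch.

```python
import pandas as pd

# Same example data as in the question
rng = pd.date_range('2020-01-01', periods=8)
df = pd.DataFrame({'A': [2, 4, 3, 8, 2, 4, 10, 2]}, index=rng)

# For each row, take all rows up to and including it and divide by their max.
# The history is rescanned at every step, so the cost grows as O(n^2).
df['A_loop'] = [
    df['A'].iloc[i] / df['A'].iloc[:i + 1].max()
    for i in range(len(df))
]
```

It gives the desired values but becomes slow on long series, which is why a vectorized alternative is preferable.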

Reveille

1 Answer


You are looking for cummax, which returns the running maximum of the column up to each row:

df['A_normalized'] = df['A']/df['A'].cummax()

Output:

             A  A_normalized
date                        
2020-01-01   2          1.00
2020-01-02   4          1.00
2020-01-03   3          0.75
2020-01-04   8          1.00
2020-01-05   2          0.25
2020-01-06   4          0.50
2020-01-07  10          1.00
2020-01-08   2          0.20
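
As a sanity check, here is a minimal self-contained sketch on the same example data. It assumes the rows are already sorted by time, since cummax follows row order rather than index values. It also confirms that cummax matches an expanding-window maximum.

```python
import pandas as pd

rng = pd.date_range('2020-01-01', periods=8)
df = pd.DataFrame({'A': [2, 4, 3, 8, 2, 4, 10, 2]}, index=rng)

# cummax follows row order, so sort by the time index first if needed
df = df.sort_index()

# Running maximum of A up to and including each row
running_max = df['A'].cummax()

# expanding().max() computes the same running maximum (returned as floats)
assert (running_max == df['A'].expanding().max()).all()

df['A_normalized'] = df['A'] / running_max
```

If the running maximum can be zero, the division yields inf or NaN, so such data may need separate handling.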
Quang Hoang