1

A question was posted on the link below where one wanted to use a previous row to populate the current row:

Is there a way in Pandas to use previous row value in dataframe.apply when previous value is also calculated in the apply?

In this case, there was only one index, the date.

Now I want to add a second index, employee ID; on the first occurrence of the first index, Index_EmpID, then I would like B to be populated with a value from A. On any subsequent occurrence, I would like the value from the previous row multiplied by the value from the current row.

I have the following data frame:

|Index_EmpID |Index_Date |   A  | B   |
|============|===========|======|=====|
|A123        |2022-01-31 |   1  | NaN |
|A123        |2022-02-28 |   1  | NaN |
|A123        |2022-03-31 | 1.05 | NaN |
|A123        |2022-04-30 |   1  | NaN |
|A567        |2022-01-31 |   1  | NaN |
|A567        |2022-02-28 | 1.05 | NaN |
|A567        |2022-03-31 |   1  | NaN |
|A567        |2022-04-30 | 1.05 | NaN |     

I require:

|Index_EmpID |Index_Date |   A  |  B   |
|============|===========|======|======|
|A123        |2022-01-31 |   1  |  1   |
|A123        |2022-02-28 |   1  |  1   |
|A123        |2022-03-31 | 1.05 | 1.05 |
|A123        |2022-04-30 |   1  | 1.05 |
|A567        |2022-01-31 |   1  |  1   |
|A567        |2022-02-28 | 1.05 | 1.05 |
|A567        |2022-03-31 |   1  | 1.05 |
|A567        |2022-04-30 | 1.05 |1.1025|     
Ketso
  • 11
  • 2

2 Answers2

2

Something like

import numpy as np
df.groupby("Index_EmpID")["A"].agg(np.cumprod).reset_index()

should work.

ags29
  • 2,621
  • 1
  • 8
  • 14
0

A solution that uses iterrows is not as nice a solution as the one that uses groupby but it follows directly from the description and uses only the most elementary Pandas facilities.

empdf = pd.DataFrame({'Index_EmpID': (['A123']*4 + ['A567']*4),
                      'Index_Date': (['2022-01-31', '2022-02-28',
                                      '2022-03-31', '2022-04-30'] * 2),
                      'A': [1, 1, 1.05, 1, 1, 1.05, 1, 1.05], 
                      'B': ([np.nan]*8)})

past_id, past_b, bs = None, 1, []
for label, row in empdf.iterrows():
     if row['Index_EmpID'] == past_id:
        bs.append(past_b * row['A'])
     else:
        bs.append(row['A'])
     past_b = bs[-1]
     past_id = row['Index_EmpID']

empdf['B'] = bs

This would produce exactly the dataframe you requested

  Index_EmpID  Index_Date     A       B
0        A123  2022-01-31  1.00  1.0000
1        A123  2022-02-28  1.00  1.0000
2        A123  2022-03-31  1.05  1.0500
3        A123  2022-04-30  1.00  1.0500
4        A567  2022-01-31  1.00  1.0000
5        A567  2022-02-28  1.05  1.0500
6        A567  2022-03-31  1.00  1.0500
7        A567  2022-04-30  1.05  1.1025
Dima Chubarov
  • 16,199
  • 6
  • 40
  • 76