I want to use bfill on a pandas dataframe, but I want the value used for each backfill to depend on the values in the row it is filled from.
Example input:
           type  val
2018-12-31    H    1
2019-03-31  NaN  NaN
2019-06-30    Q    2
2019-07-31  NaN  NaN
2019-08-31    H    3
2019-09-30    Y    4
2019-12-31    Q    5
Expected output:
           type  val
2018-12-31    H    1
2019-03-31    Q    2   <-- Same as 2019-06-30
2019-06-30    Q    2
2019-07-31    Q    6   <-- Double 2019-08-31
2019-08-31    H    3
2019-09-30    Y    4
2019-12-31    Q    5
In this example, the backfilled value for 2019-07-31 is 6 because the row it is filled from, (2019-08-31, H), has type H, so its value of 3 is doubled. The backfilled value for 2019-03-31, on the other hand, is the same as the next row's value because that row's type is Q.
Rules:
- Type H: double the value for backfill
- Type Q and Y: keep the value for backfill
- All types: set type to Q
I could not find any straightforward built-in way of doing this. The dataframe is very large, so speed is important to me, which is why I can't loop over the rows.
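To make the rules concrete, here is a minimal vectorized sketch of what I am after, using only bfill, np.where and mask. The function name conditional_bfill is hypothetical, and the sketch assumes that type and val are always missing together and that several consecutive missing rows should all be filled from the same next valid row. I have not checked how it behaves on a very large dataframe.

```python
import numpy as np
import pandas as pd

# Example input from the question.
df = pd.DataFrame(
    {
        "type": ["H", np.nan, "Q", np.nan, "H", "Y", "Q"],
        "val": [1, np.nan, 2, np.nan, 3, 4, 5],
    },
    index=pd.to_datetime(
        [
            "2018-12-31", "2019-03-31", "2019-06-30", "2019-07-31",
            "2019-08-31", "2019-09-30", "2019-12-31",
        ]
    ),
)


def conditional_bfill(frame):
    """Backfill 'val', doubling it when the source row has type H.

    Hypothetical helper; assumes 'type' and 'val' are missing together.
    """
    missing = frame["val"].isna()

    # Plain backfill gives each missing row the type/value of its source row.
    src_type = frame["type"].bfill()
    src_val = frame["val"].bfill()

    # Multiply by 2 only on rows that were missing and are filled from an H row.
    factor = np.where(missing & src_type.eq("H"), 2, 1)

    # Rows that actually received a value (trailing gaps stay NaN).
    filled = missing & src_val.notna()

    out = frame.copy()
    out["val"] = src_val * factor
    out["type"] = frame["type"].mask(filled, "Q")  # filled rows become type Q
    return out


result = conditional_bfill(df)
print(result)
```

On the example input this should give 2 for 2019-03-31 and 6 for 2019-07-31, with both rows set to type Q.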