Replace column with some rows of another column

Question

I have following dataframe:

midPrice    Change %  Spike  New Oilprice
92.20000    0.00      0      92.043405 
92.26454    0.07      0      92.049689
91.96950    -0.32     0      91.979751
91.73958    -0.25     0      91.844369
91.78985    0.05      0      91.724690
91.41000    -0.41     0      91.568880
91.18148    -0.25     0      91.690812
91.24257    0.07      0      91.858391
90.95352    -0.32     0      92.016806 
93.24000    2.51      1      92.139872
93.31013    0.08      0      92.321622
93.00690    -0.32     0      92.542687
92.77438    -0.25     0      92.727070
92.86400    0.10      0      92.949655

and whenever I have a Spike (1) in the column, I want to replace the 5 rows after the spike (including) with the new oil prices. The rest of the rows are being kept as they are.

Any ideas how to solve that? I tried the code based on following:

Iterate through the df (for loop)
If/else statement if spike == 1 then replace following 5 rows with values of new oil prices / else: keep oil prices

def spike(i):   
 
    for i in df['Spike']: 
         if i.loc == 1: 
              df['midPrice'].replace(df['New Oilprice'][i:5])`

It unfortunately doesn't work and I\m not so strong with pandas. I tried mapping the function as well on the dataframe which didn't work either. I would appreciate any help

Please, post an example dataframe as text (and the expected output) instead of a screenshot — user2246849, May 17 '22 at 09:40
Even if better, that is still a screenshot. You can really just copy the dataframe content as text and paste it in the question inside a code block. Also [see here](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) — user2246849, May 17 '22 at 09:49
What should happen if there are two spikes within 5 five days? Does the first spike replace the next five rows, but then the second spike uses 1/ its old value, or 2/ its new, replaced value, for the next five rows? Or something else entirely? — 9769953, May 17 '22 at 09:49
If you use the output of `f"DataFrame({df.to_dict()})` (minus the surrounding quotation marks and neatly formatted with line breaks), we can easily copy-paste the example dataframe. — 9769953, May 17 '22 at 09:51

user2246849 · Accepted Answer · 2022-05-17T10:40:47.563

Assuming the df is sorted by time in ascending order (as I've seen in the edit history of your question that you had a time column), you could use a mask like so:

mask = df['Spike'].eq(1).where(df['Spike'].eq(1)).fillna(method='ffill', limit=4).fillna(False)
df.loc[mask, 'midPrice'] = df['New Oilprice']

print(df)

     midPrice  Change %  Spike  New Oilprice
0   92.200000      0.00      0     92.043405
1   92.264540      0.07      0     92.049689
2   91.969500     -0.32      0     91.979751
3   91.739580     -0.25      0     91.844369
4   91.789850      0.05      0     91.724690
5   91.410000     -0.41      0     91.568880
6   91.181480     -0.25      0     91.690812
7   91.242570      0.07      0     91.858391
8   90.953520     -0.32      0     92.016806
9   92.139872      2.51      1     92.139872
10  92.321622      0.08      0     92.321622
11  92.542687     -0.32      0     92.542687
12  92.727070     -0.25      0     92.727070
13  92.949655      0.10      0     92.949655

EDIT - 2 rows before, 3 rows after:

You can adjust the mask with another fillna:


mask = df['Spike'].eq(1).where(df['Spike'].eq(1)).fillna(method='bfill', limit=2).fillna(method='ffill', limit=3).fillna(False)
df.loc[mask, 'midPrice'] = df['New Oilprice']

print(df)

     midPrice  Change %  Spike  New Oilprice
0   92.200000      0.00      0     92.043405
1   92.264540      0.07      0     92.049689
2   91.969500     -0.32      0     91.979751
3   91.739580     -0.25      0     91.844369
4   91.789850      0.05      0     91.724690
5   91.410000     -0.41      0     91.568880
6   91.181480     -0.25      0     91.690812
7   91.858391      0.07      0     91.858391
8   92.016806     -0.32      0     92.016806
9   92.139872      2.51      1     92.139872
10  92.321622      0.08      0     92.321622
11  92.542687     -0.32      0     92.542687
12  92.727070     -0.25      0     92.727070
13  92.949655      0.10      0     92.949655

Perfect thank you! That works. Can I also adjust it to make it 2 rows earlier and 3 rows later after the Spike? Would I do it with indexing? — Helene, May 17 '22 at 10:05
@Helene you mean like this? 2 rows + the spike row + 3 rows after? so 6 rows in total including the spike? — user2246849, May 17 '22 at 10:08
@Helene no problem! Ah, remember to sort the df by time (ascending). I will also add it to my answer — user2246849, May 17 '22 at 10:31

Replace column with some rows of another column

1 Answers1

EDIT - 2 rows before, 3 rows after: