I have a timeseries df comprised of daily Interest Rates points in column A ( IR aka Interest Rate) and the relative change from one day to the next in column B ( Rel Shift).
DF looks something like the below:
IR Shift
May/30/2019 5.9% 1.67%
May/29/2019 6% 1.69%
**May/28/2019 5.9% -292%**
May/27/2019 20.2% -1.4%
May/26/2019 20.5% 2.5%
**May/25/2019 20% 292%**
May/24/2019 5.1% -
My df shape is 4000x 2 and these large percentage changes are persistent throughout the df. In other words, the rates spike up, level off - and then at some point spike down again..or vice versa. So in the above IR changed 292%, leveled off and then spiked down 292%
Goal is to convert values lying between spikes above >50% to np.nan, inclusive of the date the spike occurred - effectively deleting those data points so that I can impute the them with more sensible data - in the 5%-6% range....
So the DF should look something like this:
IR Shift
May/30/2019 5.9% 1.67%
May/29/2019 6% 1.69%
May/28/2019 np.nan
May/27/2019 np.nan
May/26/2019 np.nan
May/25/2019 np.nan
May/24/2019 5.1% -
This is what i have so far for the loop --
for i in df1:
if df1['col_B'][i] > .50:
df1['col_A'][i] = np.nan