I've got a dataset with multiple missing sequences of varying lengths where I'd like to find the first valid numbers that occur before and after these sequences for some particular dates. In the sample dataset below, I would like to find the valid numbers for ColumnB
that occur closest to the date 2018-11-26
.
Datasample:
Date ColumnA ColumnB
2018-11-19 107.00 NaN
2018-11-20 104.00 NaN
2018-11-21 106.00 NaN
2018-11-22 105.24 80.00
2018-11-23 104.63 NaN
2018-11-26 104.62 NaN
2018-11-28 104.54 NaN
2018-11-29 103.91 86.88
2018-11-30 103.43 NaN
2018-12-01 106.13 NaN
2018-12-02 110.83 NaN
Expected output:
[80, 86.88]
Some details:
If it were the case that this particular sequence was the only one with missing values, I would have been able to solve it using For Loops
, or the pandas functions first_valid_index()
or isnull()
as described in Pandas - find first non-null value in column, but that will rarely be the case.
I'm able to solve this using a few For Loops
, but it's very slow for larger datasets and not very elegant, so I'd really like to hear other suggestions!