
I have a data frame with a variable "A" and I would like to create a rolling NaN checker, such that the new variable "rolling_nan" = 1 if all 3 cells in a 3-second window (the current cell and the two previous ones) are NaN, else "rolling_nan" = 0.

I am applying a custom function because the pandas .rolling object does not support isna(). However, I am getting the output shown below. I am also not sure how to include the current row's value in the NaN check.

import pandas as pd
import numpy as np

idx = pd.date_range('2018-01-01', periods=10, freq='S')
df = pd.DataFrame({"A":[1,2,3,np.nan,np.nan,np.nan,6,7,8,9]}, index = idx)
df

def isna_func(x):
    return 1 if pd.isna(x).all() == True else 0
df['rolling_nan'] = df['A'].rolling(3).apply(isna_func)
df

                    A   rolling_nan
2018-01-01 00:00:00 1.0 NaN
2018-01-01 00:00:01 2.0 NaN
2018-01-01 00:00:02 3.0 0.0
2018-01-01 00:00:03 NaN NaN
2018-01-01 00:00:04 NaN NaN
2018-01-01 00:00:05 NaN NaN
2018-01-01 00:00:06 6.0 NaN
2018-01-01 00:00:07 7.0 NaN
2018-01-01 00:00:08 8.0 0.0
2018-01-01 00:00:09 9.0 0.0

In the above example, the rolling_nan should be equal to 1 only at timestamp 2018-01-01 00:00:05 and 0 otherwise.

finstats

1 Answer


You can approach it the other way around: mark every non-NaN value with notna(), then take the rolling max. A window whose max is 0 contains only NaN values.

df.A.notna().rolling(3).max()==0
Out[316]: 
2018-01-01 00:00:00    False
2018-01-01 00:00:01    False
2018-01-01 00:00:02    False
2018-01-01 00:00:03    False
2018-01-01 00:00:04    False
2018-01-01 00:00:05     True
2018-01-01 00:00:06    False
2018-01-01 00:00:07    False
2018-01-01 00:00:08    False
2018-01-01 00:00:09    False
Freq: S, Name: A, dtype: bool

Assign the result back to the data frame as an integer column:

df['rollingnan']=(df.A.notna().rolling(3).max()==0).astype(int)
df
Out[320]: 
                       A  rollingnan
2018-01-01 00:00:00  1.0           0
2018-01-01 00:00:01  2.0           0
2018-01-01 00:00:02  3.0           0
2018-01-01 00:00:03  NaN           0
2018-01-01 00:00:04  NaN           0
2018-01-01 00:00:05  NaN           1
2018-01-01 00:00:06  6.0           0
2018-01-01 00:00:07  7.0           0
2018-01-01 00:00:08  8.0           0
2018-01-01 00:00:09  9.0           0
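
The same idea can be wrapped in a small helper so the window length is not hard-coded. This is only a sketch: the function name rolling_all_nan and its window parameter are made up for illustration, and the explicit astype(float) is a defensive assumption so that rolling always receives a numeric series.

```python
import numpy as np
import pandas as pd


def rolling_all_nan(s, window=3):
    """Return 1 where the current value and the previous window-1 values are all NaN, else 0.

    Hypothetical helper, not part of pandas.
    """
    # notna() yields True/False with no NaN, so every full window is evaluated.
    any_valid = s.notna().astype(float).rolling(window).max()
    # Incomplete windows at the start give NaN; NaN == 0 is False, so they map to 0.
    return (any_valid == 0).astype(int)


idx = pd.date_range("2018-01-01", periods=10, freq=pd.Timedelta(seconds=1))
df = pd.DataFrame({"A": [1, 2, 3, np.nan, np.nan, np.nan, 6, 7, 8, 9]}, index=idx)

df["rolling_nan"] = rolling_all_nan(df["A"])
print(df)
```

Only the row at 2018-01-01 00:00:05 gets a 1, because it is the only row whose 3-value window has no valid entry.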

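Or, building on your own idea, convert to isna() first and then apply all. The first two rows stay NaN because their windows are incomplete: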

df['A'].isna().rolling(3).apply(lambda x : x.all(),raw=True)
Out[323]: 
2018-01-01 00:00:00    NaN
2018-01-01 00:00:01    NaN
2018-01-01 00:00:02    0.0
2018-01-01 00:00:03    0.0
2018-01-01 00:00:04    0.0
2018-01-01 00:00:05    1.0
2018-01-01 00:00:06    0.0
2018-01-01 00:00:07    0.0
2018-01-01 00:00:08    0.0
2018-01-01 00:00:09    0.0
Freq: S, Name: A, dtype: float64
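
As for why the attempt in the question returned NaN: rolling skips any window with fewer than min_periods non-NaN values (for a fixed window of 3 the default is 3), so the function is never evaluated on windows that contain NaN. Calling isna() first removes the NaN values before rolling sees them. The comparison below is a minimal sketch; the names attempt and fixed are only for illustration.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2018-01-01", periods=10, freq=pd.Timedelta(seconds=1))
df = pd.DataFrame({"A": [1, 2, 3, np.nan, np.nan, np.nan, 6, 7, 8, 9]}, index=idx)

# Question's approach: roll over the raw values, then test for NaN inside the function.
# Windows containing NaN fail the min_periods check and come back as NaN.
attempt = df["A"].rolling(3).apply(lambda x: float(np.isnan(x).all()), raw=True)

# Answer's approach: test for NaN first, then roll over the NaN-free 0/1 series.
fixed = df["A"].isna().astype(float).rolling(3).apply(lambda x: float(x.all()), raw=True)

print(pd.DataFrame({"A": df["A"], "attempt": attempt, "fixed": fixed}))

# Turn the float result into a clean 0/1 column; incomplete windows count as 0.
df["rolling_nan"] = fixed.fillna(0).astype(int)
```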
BENY
  • Thank you for your answer. If I wanted to create a column with the last non-NaN value in the rolling window, how could I adjust the function, since rolling also does not support ``.last()``? Using the above example, the new column would show the last valid value, ``3``, at ``2018-01-01 00:00:03`` and ``2018-01-01 00:00:04``, and ``NaN`` at ``2018-01-01 00:00:05``. – finstats Mar 27 '19 at 20:31
  • @roland https://stackoverflow.com/questions/40101130/how-do-i-calculate-a-rolling-idxmax – BENY Mar 27 '19 at 20:35
  • In the link you provided, they are searching for the max value. In my case I want the last value, which may not be the max value. – finstats Mar 28 '19 at 20:36
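
One possible way to get the last non-NaN value inside each 3-row window, as asked in the comments, is sketched below. It is not taken from the answer or the linked question: the helper name last_valid and both new column names are hypothetical. It relies on min_periods=1 so that windows containing some NaN still reach the function. For this particular definition, a forward fill limited to window - 1 = 2 consecutive NaN values should give the same column.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2018-01-01", periods=10, freq=pd.Timedelta(seconds=1))
df = pd.DataFrame({"A": [1, 2, 3, np.nan, np.nan, np.nan, 6, 7, 8, 9]}, index=idx)


def last_valid(window):
    # window is a NumPy array (raw=True); keep the non-NaN entries and take the latest.
    valid = window[~np.isnan(window)]
    return valid[-1] if valid.size else np.nan


# min_periods=1: a window needs only one valid value to be evaluated.
# Windows with no valid value at all are left as NaN by rolling itself.
df["last_valid"] = df["A"].rolling(3, min_periods=1).apply(last_valid, raw=True)

# Alternative without apply: carry the last value forward over at most 2 NaN rows.
df["last_valid_ffill"] = df["A"].ffill(limit=2)

print(df)
```

With the example data, both columns show 3.0 at 00:00:03 and 00:00:04 and NaN at 00:00:05, which matches the behaviour described in the first comment.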