0

I know it is easy to count the missing values in a pandas Series. But how can I check whether a Series contains a run of 6 or more consecutive missing values?

Zhang Yongheng

2 Answers

1
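col = temp_df.loc[:, i]
mask = col.isna()
# Each non-NaN value bumps the cumsum, so all NaNs in one consecutive run share a group id
run_lengths = col[mask].groupby((~mask).cumsum()[mask]).size()
# Length of the longest run of consecutive NaNs (0 if the column has no NaNs)
max_missing_val = run_lengths.max() if len(run_lengths) else 0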
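
To turn the run-length idea above into a direct yes/no answer for the "6+" case, a minimal self-contained sketch could look like the following. The helper name `longest_nan_run` and the sample Series `example` are hypothetical and only serve as illustration; they are not part of the original snippet.

```python
import numpy as np
import pandas as pd


def longest_nan_run(s: pd.Series) -> int:
    """Return the length of the longest run of consecutive NaNs in s."""
    mask = s.isna()
    if not mask.any():
        return 0
    # Non-NaN entries bump the counter, so every NaN in one run shares an id.
    run_ids = (~mask).cumsum()[mask]
    return int(run_ids.value_counts().max())


# Hypothetical sample data: one run of 2 NaNs and one run of 6 NaNs.
example = pd.Series([1.0, np.nan, np.nan, 2.0] + [np.nan] * 6 + [3.0])
has_long_gap = longest_nan_run(example) >= 6
print(longest_nan_run(example), has_long_gap)
```

Comparing the result against 6 gives the boolean the question asks for; the threshold is kept outside the helper so the same function can be reused for other gap sizes.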

Reference: Counting continuous nan values in panda Time series

Zhang Yongheng
0

You can use cumsum to label each run of consecutive NaN values as its own group:

import numpy as np
import pandas as pd

s = pd.Series(
    [np.nan, 1, 2, np.nan, np.nan, np.nan, 3, 4, np.nan, np.nan]*2
)

# label runs of consecutive NaN / non-NaN values: the id increases at every switch
group = s.isna().ne(s.shift().isna()).cumsum()

# set threshold for minimum group size, here 3 instead of 6
threshold = 3

# size of the run each row belongs to
group_size = s.groupby(group).transform('size')

# select rows in runs of 3+ consecutive NaN values
# (NaN runs get even group ids, because shift() puts a NaN at position 0)
print(s[(group % 2 == 0) & (group_size.ge(threshold))])

# output

3    NaN
4    NaN
5    NaN
8    NaN
9    NaN
10   NaN
13   NaN
14   NaN
15   NaN
dtype: float64
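
The snippet above returns the matching rows. If only a True/False answer is needed, as in the question, the same grouping idea can be reduced with `.any()`. This is a sketch under assumptions: the function name `has_nan_run` is hypothetical, and it uses `s.isna()` as the mask instead of the even/odd group id.

```python
import numpy as np
import pandas as pd


def has_nan_run(s: pd.Series, threshold: int = 6) -> bool:
    """True if s contains at least `threshold` consecutive NaNs."""
    is_na = s.isna()
    # A new group id starts wherever the NaN/non-NaN state flips.
    group = is_na.ne(is_na.shift(fill_value=False)).cumsum()
    # Size of the run each row belongs to.
    group_size = is_na.groupby(group).transform("size")
    return bool((is_na & group_size.ge(threshold)).any())


s = pd.Series([np.nan, 1, 2, np.nan, np.nan, np.nan, 3, 4, np.nan, np.nan] * 2)
print(has_nan_run(s, threshold=3))  # the sample data has runs of exactly 3
print(has_nan_run(s))               # no run reaches the default of 6
```

Masking with `is_na` directly avoids relying on which parity the NaN groups happen to receive.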
Anders Källmar