Example DataFrame:

import numpy as np
import pandas as pd
col1 = {"a": [1, np.nan, np.nan, np.nan, 2, 3, 4, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]}
df = pd.DataFrame(data=col1)

I want to:
  1. Find the largest group of consecutive NaN values in the DataFrame.
  2. Return the first and last index of that group.

In this example, the output would be (7, 12).

mika

3 Answers

Use groupby on null values to get the "streaks" of consecutive null/non-null values:

streaks = df["a"].isnull().groupby(df["a"].isnull().ne(df["a"].isnull().shift()).cumsum()).transform("sum")
indices = df[streaks==streaks.max()].index

>>> indices[0], indices[-1]
(7, 12)
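
For readers who find the one-liner dense, here is a sketch of the same idea split into named steps. The intermediate names (is_null, changed, streak_id) are mine, not part of the answer, and the example frame is rebuilt so the block runs on its own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, np.nan, np.nan, np.nan, 2, 3, 4] + [np.nan] * 6})

is_null = df["a"].isnull()
# True wherever the null/non-null state differs from the previous row.
changed = is_null.ne(is_null.shift())
# A running count of those changes gives every streak its own label.
streak_id = changed.cumsum()
# Number of nulls in each streak, repeated on every row of that streak
# (non-null streaks get 0).
streaks = is_null.groupby(streak_id).transform("sum")
# Rows that belong to the longest null streak.
indices = df.index[streaks == streaks.max()]
result = (indices[0], indices[-1])
```

For the example frame, result holds 7 and 12. Note that if two null streaks tie for the longest, indices covers both of them, so the first and last entries would span the gap between them.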
not_speshal

What about:

# Fill NaN with some dummy value that does not occur in the data.
df.a = df.a.fillna(-999)
# Check if value is equal to last value, and add a value each time group changes:
df['groups'] = (df.a != df.a.shift()).cumsum()
# Count the size of each group and select the label of the biggest
# (idxmax gives the group label; max would give its size):
biggest_group = df.groupby('groups').size().idxmax()
# Query that group and fetch first and last index:
min_index, *_, max_index = df[df['groups'] == biggest_group].index
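
The sentinel approach overwrites column a and also treats a run of repeated non-NaN values as a group, so the biggest group is not guaranteed to be a NaN run. Below is a minimal sketch of a variant that leaves the data untouched and only counts NaN runs. The helper name longest_nan_run is hypothetical, and returning None when there are no NaNs is my assumption about the desired behaviour.

```python
import numpy as np
import pandas as pd


def longest_nan_run(s):
    """Return (first_index, last_index) of the longest NaN run, or None."""
    is_nan = s.isna()
    if not is_nan.any():
        return None
    # The run id increases each time the NaN / non-NaN state flips.
    run_id = (is_nan != is_nan.shift()).cumsum()
    # Count rows per run, looking at NaN rows only, and take the largest.
    longest = run_id[is_nan].value_counts().idxmax()
    idx = s.index[run_id == longest]
    return idx[0], idx[-1]


example = pd.DataFrame({"a": [1, np.nan, np.nan, np.nan, 2, 3, 4] + [np.nan] * 6})
print(longest_nan_run(example["a"]))
```

On the example frame this returns the pair 7 and 12, matching the expected output in the question.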
hirolau

We can filter the DataFrame for NaN values and pass the resulting index, as the numerical sequence, to a slightly adapted version of this answer: "Identify groups of continuous numbers in a list".

Example:

from itertools import groupby
from operator import itemgetter

ranges = {}
for k, g in groupby(enumerate(df.loc[df['a'].isna()].index), lambda x: x[0] - x[1]):
    seq = tuple(map(itemgetter(1), g))
    ranges[seq] = len(seq)
print(ranges)

Output:

{(1, 2, 3): 3, (7, 8, 9, 10, 11, 12): 6}

The resulting ranges dictionary has tuples of the indices where NaN values appeared in df as keys, and the length of each consecutive range as values.
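
To get from this dictionary to the first and last index the question asks for, one option is to pick the key with the largest length. This is a small sketch of that last step; the names longest and first_last are mine, and the dictionary is written out literally so the block runs on its own.

```python
# Same content as the `ranges` dictionary printed above.
ranges = {(1, 2, 3): 3, (7, 8, 9, 10, 11, 12): 6}

# Key (tuple of indices) whose stored length is largest.
longest = max(ranges, key=ranges.get)
# First and last index of that run.
first_last = (longest[0], longest[-1])
print(first_last)  # (7, 12)
```

If ranges is empty (no NaN values at all), max raises a ValueError, so that case would need its own check.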

jfaccioni