Dataframe: find the largest consecutive group of equal numbers in a dataframe and get the first and last index of this group

Question

Example DataFrame:

import numpy as np
import pandas as pd

col1 = {"a": [1, np.nan, np.nan, np.nan, 2, 3, 4, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]}
df = pd.DataFrame(data=col1)

I want to find the largest group of consecutive NaN values in the dataframe and return the first and last index of that group.

In this example the output would be: (7, 12)

you may want to have your data imported like this instead: `df = pd.DataFrame.from_dict(col1)` — keithpjolley, Aug 26 '21 at 13:21

not_speshal · Accepted Answer · 2021-08-26T13:31:24.183

2

Use groupby on null values to get the "streaks" of consecutive null/non-null values:

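# True where the value is null
is_null = df["a"].isnull()
# The id increases by one each time the null/non-null state changes
streak_id = is_null.ne(is_null.shift()).cumsum()
# Number of nulls in each streak, repeated on every row of that streak
streaks = is_null.groupby(streak_id).transform("sum")
indices = df[streaks == streaks.max()].index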

>>> indices[0], indices[-1]
(7, 12)
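
To make the streak computation above easier to follow, here is a minimal sketch that prints each intermediate step for the example data from the question. The variable names (`is_null`, `changed`, `streak_id`) are only illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, np.nan, np.nan, np.nan, 2, 3, 4,
                         np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]})

is_null = df["a"].isnull()
# True wherever the null/non-null state differs from the previous row.
# The first row always counts as a change because shift() leaves NaN there.
changed = is_null.ne(is_null.shift())
# Running count of changes: all rows of one streak share the same id.
streak_id = changed.cumsum()
# Number of nulls per streak, broadcast back to every row of the streak.
# Non-null streaks sum to 0, so they can never be the maximum.
streaks = is_null.groupby(streak_id).transform("sum")

indices = streaks[streaks == streaks.max()].index
first, last = indices[0], indices[-1]

print(streak_id.tolist())
print(streaks.tolist())
print(first, last)
```

If several null streaks tie for the maximum length, the selection contains the rows of all of them, so `first` and `last` would then span more than one streak.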

edited Aug 26 '21 at 13:31

answered Aug 26 '21 at 12:59

not_speshal


score 0 · Answer 2 · answered Aug 26 '21 at 12:55

What about:

# Fill NaN with a dummy value that does not occur in the data (NaN != NaN, so NaNs would never group together).
filled = df["a"].fillna(-999)
# Start a new group id each time the value differs from the previous row:
df["groups"] = (filled != filled.shift()).cumsum()
# Find the id of the largest group (idxmax returns the group label, max would return its size):
biggest_group = df.groupby("groups").size().idxmax()
# Select that group and fetch its first and last index:
group_index = df[df["groups"] == biggest_group].index
min_index, max_index = group_index[0], group_index[-1]
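
The fill step above matters because NaN never compares equal to itself, so without it every NaN would start its own group. A small sketch of the difference, assuming (as the answer does) that -999 does not occur in the real data; the short series `s` is made up for illustration:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, np.nan, np.nan, np.nan, 2])

# Without filling: each NaN differs from the previous NaN,
# so every row ends up in a group of its own.
raw_groups = (s != s.shift()).cumsum()

# With a placeholder (assumed absent from the data), consecutive
# NaNs compare equal and share a single group id.
filled = s.fillna(-999)
filled_groups = (filled != filled.shift()).cumsum()

print(raw_groups.tolist())
print(filled_groups.tolist())
```

Note that this approach groups any run of equal values, not only NaN, so the largest group may be a run of ordinary numbers if one is longer than the longest NaN run.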

score 0 · Answer 3 · answered Aug 26 '21 at 13:01

We can filter the dataframe for NaN values and use the resulting index as the numerical sequence in a slightly adapted version of the answer to "Identify groups of continuous numbers in a list".

Example:

from itertools import groupby
from operator import itemgetter

ranges = {}
for k, g in groupby(enumerate(df.loc[df['a'].isna()].index), lambda x: x[0] - x[1]):
    seq = tuple(map(itemgetter(1), g))
    ranges[seq] = len(seq)
print(ranges)

Output:

{(1, 2, 3): 3, (7, 8, 9, 10, 11, 12): 6}

The resulting ranges dictionary maps each tuple of consecutive index values where NaN appears in df to the length of that run.
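
To get from that dictionary to the first and last index the question asks for, one option is to pick the key with the largest value. The sketch below is self-contained and uses a plain list, `nan_positions`, as an assumed stand-in for `df.loc[df['a'].isna()].index`; the names `longest` and `first_last` are illustrative.

```python
from itertools import groupby

# Stand-in for the index of the NaN rows in the example (assumed values).
nan_positions = [1, 2, 3, 7, 8, 9, 10, 11, 12]

ranges = {}
# Consecutive integers keep a constant difference from their enumeration
# counter, so that difference works as a grouping key.
for _, group in groupby(enumerate(nan_positions), key=lambda pair: pair[0] - pair[1]):
    seq = tuple(position for _, position in group)
    ranges[seq] = len(seq)

# Pick the tuple of indices with the greatest length.
longest = max(ranges, key=ranges.get)
first_last = (longest[0], longest[-1])
print(first_last)
```

With ties, `max` returns the first run of maximal length in insertion order, i.e. the earliest one in the data.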
