2

In a pd.DataFrame I have one column A, I want to count how many times value 1 occurs consecutively three or more times.

df=pd.DataFrame({'A':[0,0,1,0,1,0,0,0,0,1,1,1,1,0,1,1,1]}) 

output:

df1=pd.DataFrame({'value_one_count_for three or more than three times':[2]}) 
Daniel F
  • 13,620
  • 2
  • 29
  • 55
Nickel
  • 580
  • 4
  • 19

3 Answers3

2

You need to perform groupby like itertools.groupby here, then take groups whose values are 1 since we counting for consecutive 1s. Then use GroupBy.count, take value greater than equal to 3

g = df['A'].ne(df['A'].shift()).cumsum()
g = g[df['A'].eq(1)]
g.groupby(g).count().ge(3).sum()
# 2
Ch3steR
  • 20,090
  • 4
  • 28
  • 58
1

First filter by 1 for consecutive groups, so get consecutive 1 groups, then add Series.value_counts, compare for great or equal by Series.ge and count Trues by sum:

a = df['A'].ne(df['A'].shift()).cumsum()[df['A'].eq(1)].value_counts().ge(3).sum()
print (a)
2

Numpy alternative - compare consecutive counts for >= 3 and sum:

condition = df.A.eq(1).to_numpy()
#https://stackoverflow.com/a/24343375
a = np.sum(np.diff(np.where(np.concatenate(([condition[0]],
                                     condition[:-1] != condition[1:],
                                     [True])))[0])[::2] >= 3)
print (a)
2
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
1

TLDR

In [1]: print(((df['A'] != 1).cumsum().loc[df['A'] == 1].value_counts() >= 3).sum())
2

Explanation

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'A':[0,0,1,0,1,0,0,0,0,1,1,1,1,0,1,1,1]})

The following is going to assign a unique ID to every set of consecutive 1...

In [3]: df['cumsum'] = (df['A'] != 1).cumsum()

In [4]: print(df)
    A  cumsum
0   0       1
1   0       2
2   1       2
3   0       3
4   1       3
5   0       4
6   0       5
7   0       6
8   0       7
9   1       7
10  1       7
11  1       7
12  1       7
13  0       8
14  1       8
15  1       8
16  1       8

... as long as you clear up by keeping only the rows where 'A' equals 1

In [5]: df = df[df['A'] == 1]

In [6]: print(df)
    A  cumsum
2   1       2
4   1       3
9   1       7
10  1       7
11  1       7
12  1       7
14  1       8
15  1       8
16  1       8

Then, you can either use value_counts() or groupby()

# With value_counts()

In [7]: print(df['cumsum'].value_counts())
7    4
8    3
3    1
2    1
Name: cumsum, dtype: int64

# The amount of sets of at least 3 consecutive 1 is:
In [8]: print((df['cumsum'].value_counts() >= 3).sum())
2



# With groupby()
In [9]: list(df.groupby('cumsum'))
Out[10]: 
[(2,
     A  cumsum
  2  1       2),
 (3,
     A  cumsum
  4  1       3),
 (7,
      A  cumsum
  9   1       7
  10  1       7
  11  1       7
  12  1       7),
 (8,
      A  cumsum
  14  1       8
  15  1       8
  16  1       8)]

# The amount of sets of at least 3 consecutive 1 is:
In [10]: print(len([dataframe for _, dataframe in df.groupby('cumsum') if len(dataframe) >= 3]))
2