
I have the following data file.

 1 3
 2 6
 3 7
 4 6
 5 8
 6 4
 7 5
 8 9
 9 7
 10 2
 11 3
 12 5
 13 3

My goal is to count the groups in column 2 where values equal to or greater than 5 appear at least 3 times in succession. I have been able to figure out the counting part but not the succession part.

So for this data file I want the output to be 2, because column 2 contains two runs, (6, 7, 6, 8) and (5, 9, 7), in which values equal to or greater than 5 appear at least 3 times in succession.

import numpy as np
data=np.loadtxt('/Users/Hrihaan/Desktop/DataF.txt')
z=data[:,1]
count = len([i for i in z if i >= 5])
print(count)

Any help would be greatly appreciated.

Antimony
Hrihaan
  • Count of groups you mean maybe? Why not add the expected output? Also, is the posted code getting you the correct result? Also, would be nicer to have just the second column posted as the sample input array. – Divakar Sep 26 '17 at 17:18
  • Yes Divakar, count of groups. The posted code gives me the count of numbers that are equal to or greater than 5, but I am stuck on the 3-times-in-succession part. – Hrihaan Sep 26 '17 at 17:22

3 Answers


Approach #1: Get the start and stop indices of each group of values >= 5, then count the groups that are at least 3 long:

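ar = data[:, 1]  # second column from the question
# pad with False so every run has both a start and a stop edge
mask = np.concatenate(([False], ar >= 5, [False]))
# positions where the mask flips alternate start, stop, start, stop, ...
idx = np.flatnonzero(mask[1:] != mask[:-1])
count = ((idx[1::2] - idx[::2]) >= 3).sum()  # stop - start = run length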
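A self-contained sketch of Approach #1 on the sample column, so the intermediate values can be checked without the file. The helper name `count_runs_idx` and its `threshold`/`min_len` parameters are hypothetical, added only for illustration:

```python
import numpy as np

def count_runs_idx(ar, threshold=5, min_len=3):
    """Count runs of values >= threshold that are at least min_len long."""
    # Pad with False on both sides so every run has a start and a stop edge.
    mask = np.concatenate(([False], np.asarray(ar) >= threshold, [False]))
    # Indices where the mask flips; they alternate start, stop, start, stop, ...
    idx = np.flatnonzero(mask[1:] != mask[:-1])
    # stop - start is the length of each run; count the long-enough ones.
    return int(((idx[1::2] - idx[::2]) >= min_len).sum())

z = np.array([3, 6, 7, 6, 8, 4, 5, 9, 7, 2, 3, 5, 3])
print(count_runs_idx(z))  # 2
```

On this input `idx` is `[1, 5, 6, 9, 11, 12]`, giving run lengths 4, 3 and 1, of which two are at least 3.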

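Approach #2: Use 1D convolution. The sum over each window of 3 reaches 3 only where all three values are >= 5, so each qualifying run shows up as one block of True values, and counting the False-to-True edges counts the runs: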

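ar = data[:, 1]  # second column from the question
# window sum of 3 reaches 3 only inside a run of at least 3 values >= 5
mask = np.convolve(ar >= 5, [1] * 3) >= 3
out = (mask[1:] > mask[:-1]).sum()  # one False -> True edge per run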
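A self-contained sketch of the convolution idea with the window length as a parameter; the helper name `count_runs_conv` is hypothetical. `np.convolve` defaults to `'full'` mode, so windows that hang off either end are zero-padded and a run touching the last element still produces its own block of True values. The extra leading False is an assumption added here so the rising-edge count also works for `min_len=1`:

```python
import numpy as np

def count_runs_conv(ar, threshold=5, min_len=3):
    """Count runs of values >= threshold that are at least min_len long."""
    valid = (np.asarray(ar) >= threshold).astype(int)
    # Full-mode convolution with a window of ones: each entry is the number
    # of valid values in a window of length min_len.
    window_sums = np.convolve(valid, np.ones(min_len, dtype=int))
    # True wherever the whole window is valid, i.e. inside a long-enough run.
    full = np.concatenate(([False], window_sums >= min_len))
    # Each run of length >= min_len yields exactly one False -> True edge.
    return int((full[1:] > full[:-1]).sum())

z = np.array([3, 6, 7, 6, 8, 4, 5, 9, 7, 2, 3, 5, 3])
print(count_runs_conv(z))  # 2
```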
Divakar

Here's a pure Python approach using csv and itertools.groupby:

First, let me fake the file:

>>> s = """1 3
... 2 6
... 3 7
... 4 6
... 5 8
... 6 4
... 7 5
... 8 9
... 9 7
... 10 2
... 11 3
... 12 5
... 13 3"""
>>> import io

Now, for the meat of it:

>>> import itertools
>>> import csv
>>> with io.StringIO(s) as f:
...     reader = csv.reader(f, delimiter=' ')
...     second_col = (int(c) for _, c in reader)
...     gb = itertools.groupby(second_col, (5).__le__)
...     x = sum(k for k, g in gb if k and len(list(g)) >= 3)
...
>>> x
2
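One caveat, assuming the real file looks like the sample as posted (each row starts with a space): `csv.reader(f, delimiter=' ')` splits on every single space, so a leading or doubled space produces an empty field and `for _, c in reader` fails to unpack. A sketch of the same `groupby` idea using `str.split()`, which splits on any run of whitespace; the helper name `count_runs_in_lines` is hypothetical:

```python
import io
import itertools

def count_runs_in_lines(lines, threshold=5, min_len=3):
    """Count runs of second-column values >= threshold of length >= min_len."""
    # str.split() with no argument ignores leading/trailing/repeated spaces.
    second_col = (int(line.split()[1]) for line in lines if line.strip())
    # groupby yields one (key, group) pair per stretch of equal keys.
    groups = itertools.groupby(second_col, key=lambda v: v >= threshold)
    # sum(1 for _ in g) measures a group without building a list.
    return sum(1 for k, g in groups if k and sum(1 for _ in g) >= min_len)

data = io.StringIO(" 1 3\n 2 6\n 3 7\n 4 6\n 5 8\n 6 4\n 7 5\n 8 9\n"
                   " 9 7\n 10 2\n 11 3\n 12 5\n 13 3\n")
print(count_runs_in_lines(data))  # 2
```

With the real file, pass the open file object instead, e.g. `with open('/Users/Hrihaan/Desktop/DataF.txt') as f: print(count_runs_in_lines(f))`.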
juanpa.arrivillaga

You could loop through the column, tracking the length of the current run of values >= 5 and incrementing a counter each time a run of at least 3 ends.

run_len = 0  # length of the current run of values >= 5
count = 0

for i in z:
    if i >= 5:
        run_len += 1
    else:
        if run_len >= 3:  # a run of at least 3 has just ended
            count += 1
        run_len = 0  # reset after every value < 5, not only after long runs

if run_len >= 3:  # the last run may end on the final row
    count += 1

print(count)
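The running-count idea as a sketch of a reusable function; the name `count_runs_loop` and its parameters are hypothetical. It resets the run length after every value below the threshold and checks once more after the loop, so a run that ends on the last row is counted too:

```python
def count_runs_loop(values, threshold=5, min_len=3):
    """Count runs of values >= threshold that are at least min_len long."""
    run_len = 0  # length of the current run of values >= threshold
    count = 0    # number of finished runs with length >= min_len
    for v in values:
        if v >= threshold:
            run_len += 1
        else:
            if run_len >= min_len:
                count += 1
            run_len = 0  # reset after every value below threshold
    # A run that reaches the last element never meets a value below threshold.
    if run_len >= min_len:
        count += 1
    return count

z = [3, 6, 7, 6, 8, 4, 5, 9, 7, 2, 3, 5, 3]
print(count_runs_loop(z))                # 2
print(count_runs_loop(z[:-1] + [6, 7]))  # 3: the trailing run 5, 6, 7 counts
```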
Antimony