14

I want to find the start position of the longest sequence of 1's in my array:

a1=[0,0,1,1,1,1,0,0,1,1]
#2

I am following this answer to find the length of the longest sequence. However, I was not able to determine the position.

Community
  • 1
  • 1
MAS
  • 4,503
  • 7
  • 32
  • 55

8 Answers8

15

Inspired by this solution, here's a vectorized approach to solve it -

# Get start, stop index pairs for islands/seq. of 1s
idx_pairs = np.where(np.diff(np.hstack(([False],a1==1,[False]))))[0].reshape(-1,2)

# Get the island lengths, whose argmax would give us the ID of longest island.
# Start index of that island would be the desired output
start_longest_seq = idx_pairs[np.diff(idx_pairs,axis=1).argmax(),0]

Sample run -

In [89]: a1 # Input array
Out[89]: array([0, 0, 1, 1, 1, 1, 0, 0, 1, 1])

In [90]: idx_pairs # Start, stop+1 index pairs
Out[90]: 
array([[ 2,  6],
       [ 8, 10]])

In [91]: np.diff(idx_pairs,axis=1) # Island lengths
Out[91]: 
array([[4],
       [2]])

In [92]: np.diff(idx_pairs,axis=1).argmax() # Longest island ID
Out[92]: 0

In [93]: idx_pairs[np.diff(idx_pairs,axis=1).argmax(),0] # Longest island start
Out[93]: 2
Community
  • 1
  • 1
Divakar
  • 218,885
  • 19
  • 262
  • 358
3

A more compact one-liner using groupby(). Uses enumerate() on the raw data to keep the starting positions through the analysis pipeline, evenutally ending up with the list of tuples [(2, 4), (8, 2)] each tuple containing the starting position and length of non-zero runs:

from itertools import groupby

L = [0,0,1,1,1,1,0,0,1,1]

print max(((lambda y: (y[0][0], len(y)))(list(g)) for k, g in groupby(enumerate(L), lambda x: x[1]) if k), key=lambda z: z[1])[0]

lambda: x is the key function for groupby() since we enumerated L

lambda: y packages up results we need since we can only evaluate g once, without saving

lambda: z is the key function for max() to pull out the lengths

Prints '2' as expected.

cdlane
  • 40,441
  • 5
  • 32
  • 81
2

This seems to work, using groupby from itertools, this only goes through the list once:

from itertools import groupby

pos, max_len, cum_pos = 0, 0, 0

for k, g in groupby(a1):
    if k == 1:
        pat_size = len(list(g))
        pos, max_len = (pos, max_len) if pat_size < max_len else (cum_pos, pat_size)
        cum_pos += pat_size
    else:
        cum_pos += len(list(g))

pos
# 2
max_len
# 4
Psidom
  • 209,562
  • 33
  • 339
  • 356
1

You could use a for loop and check if the next few items (of length m where m is the max length) are the same as the maximum length:

# Using your list and the answer from the post you referred
from itertools import groupby
L = [0,0,1,1,1,1,0,0,1,1]
m = max(sum(1 for i in g) for k, g in groupby(L))
# Here is the for loop
for i, s in enumerate(L):
    if len(L) - i + 2 < len(L) - m:
        break
    if s == 1 and 0 not in L[i:i+m]:
        print i
        break

This will give:

2
Moon Cheesez
  • 2,489
  • 3
  • 24
  • 38
1

Another way of doing in a single loop, but without resorting to itertool's groupby.

max_start = 0
max_reps = 0
start = 0
reps = 0
for (pos, val) in enumerate(a1):
    start = pos if reps == 0 else start
    reps = reps + 1 if val == 1 else 0
    max_reps = max(reps, max_reps)
    max_start = start if reps == max_reps else max_start

This could also be done in a one-liner fashion using reduce:

max_start = reduce(lambda (max_start, max_reps, start, reps), (pos, val): (start if reps == max(reps, max_reps) else max_start, max(reps, max_reps), pos if reps == 0 else start, reps + 1 if val == 1 else 0), enumerate(a1), (0, 0, 0, 0))[0]

In Python 3, you cannot unpack tuples inside the lambda arguments definition, so it's preferable to define the function using def first:

def func(acc, x):
    max_start, max_reps, start, reps = acc
    pos, val = x
    return (start if reps == max(reps, max_reps) else max_start,
            max(reps, max_reps),
            pos if reps == 0 else start,
            reps + 1 if val == 1 else 0)

max_start = reduce(func, enumerate(a1), (0, 0, 0, 0))[0]

In any of the three cases, max_start gives your answer (i.e. 2).

Douglas Vieira
  • 183
  • 1
  • 7
0

Using more_itertools, a third-party library:

Given

import itertools as it

import more_itertools as mit


lst = [0, 0, 1, 1, 1, 1, 0, 0, 1, 1]

Code

longest_contiguous = max([tuple(g) for _, g in it.groupby(lst)], key=len)
longest_contiguous    
# (1, 1, 1, 1)

pred = lambda w: w == longest_contiguous
next(mit.locate(mit.windowed(lst, len(longest_contiguous)), pred=pred))
# 2

See also the more_itertools.locate docstring for details on how these tools work.

pylang
  • 40,867
  • 14
  • 129
  • 121
0

For another solution that uses only Numpy, I think this should work in all the cases. The most upvoted solution is probably faster though.

tmp = np.cumsum(np.insert(np.array(a1) != 1, 0, False))  # value of tmp[i+1] was not incremented when a1[i] is 1
# [0, 1, 2, 2, 2, 2, 2, 3, 4, 4, 4]

values, counts = np.unique(tmp, return_counts=True)
# [0, 1, 2, 3, 4], [1, 1, 5, 1, 3]

counts_idx = np.argmax(counts)
longest_sequence_length = counts[counts_idx] - 1
# 4

longest_sequence_idx = np.argmax(tmp == values[counts_idx])
# 2
yakoudbz
  • 903
  • 1
  • 6
  • 14
0

I've implemented a run-searching function for numpy arrays in haggis.npy_util.mask2runs. You can use it like this:

runs, lengths = mask2runs(a1, return_lengths=True)
result = runs[lengths.argmax(), 0]
Mad Physicist
  • 107,652
  • 25
  • 181
  • 264