2

I am working on a function that is generating a list of indexes based on a provided string. For instance [0, 1, 3, 4, 5, 6, 7, 14, 15, 16, 17, 19, 20, 26]. If the list contains at least five consecutive indexes, a new list should be created containing such indexes. For example, from the above list, the list [3, 4, 5, 6, 7] should be created.

flags = [0, 1, 3, 4, 5, 6, 7, 14, 15, 16, 17, 19, 20, 26]
IDs = []
count1 = 0
for i in range (len(flags)-1):
    if flags[i+1]-flags[i] == 1:
        IDs.append(i)
        count1 += 1
    else:
        count1 -= 1

count1 is 5, so IDs should also have five items.

Alan Alor
  • 25
  • 3
  • You don't remove anything from `IDs` when you decrement `count1`, so why would you expect that `count1` will always match the length of `IDs`? – Samwise Sep 16 '21 at 01:00
  • `flags` is always in sorted order, right? – wjandrea Sep 16 '21 at 01:18
  • Your title says "random numbers", but then your example list is sorted and has no duplicates. Doesn't look random to me at all. Is it always random and without duplicates? – no comment Sep 17 '21 at 05:36
  • Yes, always sorted, and random in the sense that it is not the same list every time. – Alan Alor Sep 18 '21 at 00:46

6 Answers6

1

You can use itertools.groupby with a key function that subtracts each number with an incremental counter, which would return a fixed value for each group of consecutive numbers:

from itertools import groupby, count

next(g for _, (*g,) in groupby(flags, lambda n, c=count(): n - next(c)) if len(g) >= 5)

This returns:

[3, 4, 5, 6, 7]
blhsing
  • 91,368
  • 6
  • 71
  • 106
0

Any time you break the streak, you need to completely clear the list. I suggest iterating over zip(flags, flags[1:]) as an easier way of comparing sequential items.

>>> flags = [0, 1, 3, 4, 5, 6, 7, 14, 15, 16, 17, 19, 20, 26]
>>> streak = []
>>> for x, y in zip(flags, flags[1:]):
...     if not streak:
...         streak.append(x)
...     if y == x + 1:
...         streak.append(y)
...     else:
...         streak = []
...     if len(streak) >= 5:
...         print(streak)
...
[3, 4, 5, 6, 7]

Another approach would be to simply append each item to the streak, clearing it first if the streak is broken (determined by looking at the last item in the current streak):

>>> streak = [flags[0]]
>>> for f in flags[1:]:
...     if streak[-1] != f - 1:
...         streak = []
...     streak.append(f)
...     if len(streak) >= 5:
...         print(streak)
...
[3, 4, 5, 6, 7]
Samwise
  • 68,105
  • 3
  • 30
  • 44
0

Using numpy:

import numpy as np

flags = [0, 1, 3, 4, 5, 6, 7, 14, 15, 16, 17, 19, 20, 26]
arr = np.r_[0, np.diff(flags) == 1, 0]
slices = np.nonzero(np.diff(arr))[0].reshape(-1, 2)
print(slices)

This produces a 2-dimensional array whose rows are pairs of indices showing where each subsequence of consecutive numbers begins and ends:

[[ 0  1]
 [ 2  6]
 [ 7 10]
 [11 12]]

We can then print each subsequence of consecutive nnumbers:

for s in slices:
    print(flags[s[0]: s[1]+1])

It gives:

[0, 1]
[3, 4, 5, 6, 7]
[14, 15, 16, 17]
[19, 20]

You can filter these subsequences, leaving only ones of a specified length.

bb1
  • 7,174
  • 2
  • 8
  • 23
0

There's a related question, Identify groups of continuous numbers in a list. Using a version of Nadia Alramli's solution, we can get all the subsequences of consecutive numbers, then filter out ones shorter than 5:

from itertools import groupby

IDs = []
# Number minus index gives offset. Repeated offsets mean consecutive numbers.
consecutive_numbers = groupby(enumerate(flags), lambda x: x[1]-x[0])
for _offset, g in consecutive_numbers:
    numbers = [x[1] for x in g]
    if len(numbers) >= 5:
        IDs.append(numbers)
print(IDs)

Output:

[[3, 4, 5, 6, 7]]

Edit: blhsing's answer is very similar to mine but a lot cleaner. Here's how it would look integrated with mine:

from itertools import groupby, count

indexes = count()
# Number minus index gives offset. Repeated offsets mean consecutive numbers.
consecutive_numbers = groupby(flags, lambda n: n-next(indexes))
IDs = [g for _offset, (*g,) in consecutive_numbers if len(g) >= 5]
print(IDs)

I hope this is easier to read, if nothing else.


I'm assuming flags is always in sorted order.

wjandrea
  • 28,235
  • 9
  • 60
  • 81
0

If the indexes are in ascending order, the difference between the index value and its position (delta) in the flags list will be the same for all indexes that are consecutive. Using the Counter class (from collections) you can determine the most common delta and select flag entries that match it to get the IDs list:

flags = [0, 1, 3, 4, 5, 6, 7, 14, 15, 16, 17, 19, 20, 26]

from collections import Counter

delta,count = Counter(n-i for i,n in enumerate(flags)).most_common(1)[0]
IDs = [n for i,n in enumerate(flags) if n-i == delta]

print(count,IDs)
# 5 [3, 4, 5, 6, 7]

The above example gets the longest consecutive streak but you can use the Counter() object in various ways to find specific streak lengths or deltas from the original character positions.

If the indexes in flags are not always in ascending order, you can first split the flags into sublists (using itertools.groupby) and select the largest one:

from itertools import groupby
streaks = [[n for _,n in g]
           for _,g in groupby(enumerate(flags),lambda f:f[1]-f[0])]

print(streaks)
# [[0, 1], [3, 4, 5, 6, 7], [14, 15, 16, 17], [19, 20], [26]]

IDs   = max(streaks,key=len)
count = len(IDs)

print(IDs, count)
# [3, 4, 5, 6, 7] 5
Alain T.
  • 40,517
  • 4
  • 31
  • 51
0

Assuming your list is not just "random numbers" like your title says but rather sorted and without duplicates like your text and example suggest, you could simply look whether two indexes 4 apart differ by 4:

flags = [0, 1, 3, 4, 5, 6, 7, 14, 15, 16, 17, 19, 20, 26]

for i in range(len(flags) - 4):
    if flags[i+4] == flags[i] + 4:
        print(flags[i:i+5])

Output:

[3, 4, 5, 6, 7]

Not clear whether you'd want to include an 8 if it were there as well. If you do, then that could be done inside the successful if.

no comment
  • 6,381
  • 4
  • 12
  • 30