Implementing a Generator for Sequences of Lookahead Pairs

Question

I am trying to determine how to efficiently implement a generator over an iterable that yields all look-ahead or look-behind pairs within a defined window.

For example

seq = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
pairs_lookahead(seq, behind=0, forward=3, empty=None)

Should produce something similar to

[((1, 2), (1, 3), (1, 4)), ((2, 3), (2, 4), (2, 5)), ...]

When an element does not exist during look-ahead or look-behind it should be filled with a defined empty value.

This is what I have so far for a lookahead generator

def lookforward(seq, behind, forward, empty=None):
    itr = iter(seq)

    lst = [empty]*behind + [next(itr)]

    # Prime the needed lookforward values:
    for x in range(forward):
        try:
            lst.append(next(itr))
        except StopIteration:
            lst.append(empty)
            forward -= 1

    # Yield the current tuple, then shift the list and generate a new item:
    for item in itr:
        yield tuple(lst)
        lst = lst[1:] + [item]

    # Yield the last full tuple, then continue with None for each lookforward
    # position:
    for x in range(forward + 1):
        yield tuple(lst)
        lst = lst[1:] + [empty]

print(list(lookforward(range(10), 0, 3)))

Executing the above implementation gives:

> [(0, 1, 2, 3), (1, 2, 3, 4), (2, 3, 4, 5), (3, 4, 5, 6), (4, 5, 6, 7), (5, 6, 7, 8), (6, 7, 8,9), (7, 8, 9, None), (8, 9, None, None), (9, None, None, None)]

I am not sure how I should proceed from here. The above implementation generates look-ahead and look-behind sequences but I am not sure how to approach modifying it to produce the sequences of pairs. I also have concerns that my implementation may not be efficient. I am rather inexperience with the implementation of iterators in Python. Any assistance would be much appreciated.

Let's forget for some time about effiency. What is the output of your current implementation? I did not run it myself but I see that it generates tuples - so why it's different from "modifying it to produce the sequences of pairs"? Please explain — Alex Yu, Feb 24 '19 at 00:19
I updated my question to have an example of output. What I want to do is modify this implementation to output sequences of lookahead pairs within a window rather than just lookahead tuples within a window. Does that clarify things? — martinsarif, Feb 24 '19 at 00:26
I seriously doubt that's your test code. For one thing the function is named `lookahead` not `lookforward`, and another is that when that is fixed, ``lookahead`, as written, causes a `RuntimeError: generator raised StopIteration`. — martineau, Feb 24 '19 at 01:11
@martineau I just fixed some typos with the above implementation. Running the code on my system through the shell with Python 3.6.7 gives the correct output with no errors. — martinsarif, Feb 24 '19 at 02:02
Well, that's an improvement, but when I run it with Python 3.7.2 (not that I think the difference matters), I still get a `RuntimeError: generator raised StopIteration`. Are you sure the code you're running has that `raise StopIteration` at end? Which is the line it occurs on for me (and isn't needed). — martineau, Feb 24 '19 at 02:08
What should the output be when both `behind ` and `forward` are specified? Something like `[..., ((2, None), (2, 1), (2, 3), (2, 4), ...]` if they are both equal to 2? — cglacet, Feb 24 '19 at 02:27
@martineau I can confirm that my code did have the line raising the StopIteration; however, removing it does not seem to affect its behavior. I will remove the exception from the original question. — martinsarif, Feb 24 '19 at 02:39
@cglacet Your example seems correct for what I had in mind. I will append an example of such a case to the original problem for others to avoid confusion. — martinsarif, Feb 24 '19 at 02:46

score 2 · Answer 1 · 2019-02-24T04:51:31.313

You could try something like the following code I wrote:

Explanation:

If behind/forward is 0 we set variable behind_counter/forward_counter to 1 in order for the following loops to atleast loop once.

Outer loop loops over the range of seq, the two inner loops over range of behind_counter(counted down) and forward_counter(counted up),respectively. Inside the most inner loop we set the respective lookahead/-behind indices and then check with help of the three if statements whether indices are out of bound in order to set the respective values to the out of bound value ('null') if necessary. The fourth if statement is chosen if the indices are not out of bound. Inside each if statement there are if-elif-elif-else statements which change the appended tuple depending on the lookahead/-behind values stored in behind and forward. If both are 0 only append seq[i], if behind is 0 only append a tuple consisting of seq[i] and current lookahead value, and so on.

After the work is done we print the value of res in order to visualize the result.

Source Code:

seq = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
behind = 0
forward = 3

res = []

behind_counter = behind
forward_counter = forward

if behind == 0:
    behind_counter = 1
if forward == 0:
    forward_counter = 1

for i in range(len(seq)):
    for j in range(behind_counter,0,-1):
        for k in range(forward_counter):
            index_behind = i - j
            index_forward = i + k + 1
            if index_behind < 0 and index_forward > len(seq):
                index_behind = 'null'
                index_forward = 'null'
                if behind == 0 and forward == 0:
                    res.append(tuple((seq[i])))
                elif behind == 0:
                    res.append(tuple((seq[i],index_forward)))
                elif forward == 0:
                    res.append(tuple((index_behind,seq[i])))
                else:
                    res.append(tuple((index_behind,seq[i],index_forward)))
                continue
            if index_behind < 0:
                index_behind = 'null'
                if behind == 0 and forward == 0:
                    res.append(tuple((seq[i])))
                elif behind == 0:
                    res.append(tuple((seq[i],seq[index_forward])))
                elif forward == 0:
                    res.append(tuple((index_behind,seq[i])))
                else:
                    res.append(tuple((index_behind,seq[i],seq[index_forward])))
                continue
            if index_forward >= len(seq):
                index_forward = 'null'
                if behind == 0 and forward == 0:
                    res.append(tuple((seq[i])))
                elif behind == 0:
                    res.append(tuple((seq[i],index_forward)))
                elif forward == 0:
                    res.append(tuple((seq[index_behind],seq[i])))
                else:
                    res.append(tuple((seq[index_behind],seq[i],index_forward)))
                continue
            if index_forward < len(seq) and index_behind >= 0:
                if behind == 0 and forward == 0:
                    res.append(tuple((seq[i])))
                elif behind == 0:
                    res.append(tuple((seq[i],seq[index_forward])))
                elif forward == 0:
                    res.append(tuple((seq[index_behind],seq[i])))
                else:
                    res.append(tuple((seq[index_behind],seq[i],seq[index_forward])))
print (res)

Output:

[(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (3, 6), (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (6, 9), (7, 8), (7, 9), (7, 10), (8, 9), (8, 10), (8, 'null'), (9, 10), (9, 'null'), (9, 'null'), (10, 'null'), (10, 'null'), (10, 'null')]

First Major Update:

According to a new comment you want a different ouput when both lookbehind/-forward are specified, so I altered my program to now hopefully fulfill your needs:

Additional explanation for updated program:

In each iteration of the outer loop first the lookbehind pairs are added in a loop, then the lookforward pairs are added, also in a loop. As before we check for out of bounds and set the value of the lookbehind/forward value accordingly.

Updated source code:

seq = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
behind = 2
forward = 3

res = []

behind_counter = behind
forward_counter = forward

if behind == 0:
    behind_counter = 1
if forward == 0:
    forward_counter = 1

for i in range(len(seq)):
    for j in range(behind_counter,0,-1):
        index_behind = i - j
        if behind == 0:
            #res.append(tuple((seq[i])))
            continue
        else:
            if index_behind < 0:
                index_behind = 'null'
                res.append(tuple((seq[i],index_behind)))
                continue
            else:
                res.append(tuple((seq[i], seq[index_behind])))
    for k in range(forward_counter):
        index_forward = i + k + 1
        if forward == 0:
            #res.append(tuple((seq[i])))
            continue
        else:
            if index_forward >= len(seq):
                index_forward = 'null'
                res.append(tuple((seq[i],index_forward)))
                continue
            else:
                res.append(tuple((seq[i],seq[index_forward])))
print (res)

New output: [when lookforward and lookbehind are specified]

[(1, 'null'), (1, 'null'), (1, 2), (1, 3), (1, 4), (2, 'null'), (2, 1), (2, 3), (2, 4), (2, 5), (3, 1), (3, 2), (3, 4), (3, 5), (3, 6), (4, 2), (4, 3), (4, 5), (4, 6), (4, 7), (5, 3), (5, 4), (5, 6), (5, 7), (5, 8), (6, 4), (6, 5), (6, 7), (6, 8), (6, 9), (7, 5), (7, 6), (7, 8), (7, 9), (7, 10), (8, 6), (8, 7), (8, 9), (8, 10), (8, 'null'), (9, 7), (9, 8), (9, 10), (9, 'null'), (9, 'null'), (10, 8), (10, 9), (10, 'null'), (10, 'null'), (10, 'null')]

Second Major Update:

In case you want a list containing tuples of tuples as a result you could do something like this [I slightly modified the code of my first major update]:

Additional explanation:

In the beginning of each outer loop iteration we append an empty list to res. To this list we append the according values of first the lookbehind pairs, then the lookforward pairs. In the end of each outer loop iteration we then convert this newly created list to a tuple.

seq = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
behind = 2
forward = 3

res = []

behind_counter = behind
forward_counter = forward

if behind == 0:
    behind_counter = 1
if forward == 0:
    forward_counter = 1

for i in range(len(seq)):
    res.append(list())
    for j in range(behind_counter,0,-1):
        index_behind = i - j
        if behind == 0:
            #res.append(tuple((seq[i])))
            continue
        else:
            if index_behind < 0:
                index_behind = 'null'
                res[i].append((seq[i],index_behind))
                continue
            else:
                res[i].append((seq[i], seq[index_behind]))
    for k in range(forward_counter):
        index_forward = i + k + 1
        if forward == 0:
            #res.append(tuple((seq[i])))
            continue
        else:
            if index_forward >= len(seq):
                index_forward = 'null'
                res[i].append((seq[i],index_forward))
                continue
            else:
                res[i].append((seq[i],seq[index_forward]))
    res[i] = tuple(res[i])
print (res)

Output: [list containing tuples of tuples]

[((1, 'null'), (1, 'null'), (1, 2), (1, 3), (1, 4)), ((2, 'null'), (2, 1), (2, 3), (2, 4), (2, 5)), ((3, 1), (3, 2), (3, 4), (3, 5), (3, 6)), ((4, 2), (4, 3), (4, 5), (4, 6), (4, 7)), ((5, 3), (5, 4), (5, 6), (5, 7), (5, 8)), ((6, 4), (6, 5), (6, 7), (6, 8), (6, 9)), ((7, 5), (7, 6), (7, 8), (7, 9), (7, 10)), ((8, 6), (8, 7), (8, 9), (8, 10), (8, 'null')), ((9, 7), (9, 8), (9, 10), (9, 'null'), (9, 'null')), ((10, 8), (10, 9), (10, 'null'), (10, 'null'), (10, 'null'))]

cglacet · Accepted Answer · 2019-02-24T04:02:54.110

Explanations

I feel like the best solution is to use as much available tools as possible. In particular, something that is very interesting in this case is to use zip (and its alterego zip_longest):

from itertools import zip_longest

seq = [1, 2, 3, 4]
print(list(zip(seq, seq[1:])))
print(list(zip_longest(seq, seq[1:])))

Which produces:

[(1, 2), (2, 3), (3, 4)]
[(1, 2), (2, 3), (3, 4), (4, None)]

Also note that zip can be used to "unzip":

print(list(zip(*[(1, 2), (2, 3), (3, 4)])))

Outputs:

[(1, 2, 3), (2, 3, 4)]

The first step is to understand this piece of code which builds the case forward=2:

from itertools import zip_longest

seq = [1, 2, 3, 4]
one_step_ahead = zip_longest(seq, seq[1:])
two_steps_ahead = zip_longest(seq, seq[2:])
# print(list(one_step_ahead))  # => [(1, 2), (2, 3), (3, 4), (4, None)]
# print(list(two_steps_ahead)) # => [(1, 3), (2, 4), (3, None), (4, None)]
merged = zip(one_step_ahead, two_steps_ahead)
print(list(merged))

This prints:

[((1, 2), (1, 3)), ((2, 3), (2, 4)), ((3, 4), (3, None)), ((4, None), (4, None))]

That's very close from being modulable, the only thing we presume here is that we only have two zip objects to merge, where in the real case we will have an unknown number, therefore we need to be able to translate merged = zip(one_step_ahead, two_steps_ahead) into a case where the list has an unknown size. To do so we will simply add all the "x_steps_ahead" in a list, lets call it pairs, then we will merge all these pairs using the spread operation *pairs. In the end it will look like this:

from itertools import zip_longest

seq = [1, 2, 3, 4]

pairs = []
for x in range(2):
  x_step_ahead = zip_longest(seq, seq[x:])
  pairs.append(x_step_ahead)

merged = zip(*pairs)
print(list(merged))

Which produces the same result as before:

[((1, 2), (1, 3)), ((2, 3), (2, 4)), ((3, 4), (3, None)), ((4, None), (4, None))]

That's basically the whole idea of the code I'm proposing. The case of looking backward is a bit more unusual but I'll let you understand how it works as an exercise. A slight difference in the final code is also that I try to avoid instantiating lists as much as I can. Iterators/generators are prefered, which makes the code a bit harder to read, but way more efficient in terms of memory usage.

Basically, things like the pair construction will turn to:

def pairs_generator():
  for x in range(2):
    yield zip_longest(seq, seq[x:])

pairs = pairs_generator()

That does the exact same thing as the previous code, it just avoid having a list of size x in memory to remember all the zips we create.

For the same reason, in the following code I also use itertools.islice instead of classical slicing because its a lighter version (unlike slice it doesn't instantiate a copy of the input list).

Solution implementation

from itertools import zip_longest, islice

def fill_with(iterator, value, times):
  """Add `value` `times` times in front of the iterator."""
  for _ in range(times):
    yield value
  yield from iterator

def pairs(seq, distance, reverse=False, empty=None):
  """Build lookup pairs from a list, for example: 
    list(pairs([1,2,3], 1)) => [(1, 2), (2, 3), (3, None)]
  and reverse make backward lookups: 
    list(pairs([1,2,3], 1, reverse=True)) => [(1, None), (2, 1), (3, 2)]
  """
  if reverse:
    return zip(seq, fill_with(seq, empty, distance))
  else:
    return zip_longest(seq, islice(seq, distance, None), fillvalue=empty)

def look_backward(seq, distance, empty=None):
  """Build look backward tuples, for example calling 
  list(look_backward([1,2,3], 2)) will produce: 
    [((1, None), (1, None)), ((2, None), (2, 1)), ((3, 2), (3, 1))]
  """
  return zip(*(pairs(seq, i, empty=empty, reverse=True) for i in range(distance,0, -1)))

def look_forward(seq, distance, empty=None):
  """Build look forward tuples, for example calling 
  list(look_forward([1,2,3], 2)) will produce: 
    [((1, 2), (1, 3)), ((2, 3), (2, None)), ((3, None), (3, None))]
  """
  return zip(*(pairs(seq, i+1, empty=empty) for i in range(distance)))

def pairs_lookahead(seq, behind=0, forward=3, empty=None):
  """Produce the results expected by https://stackoverflow.com/q/54847423/1720199"""
  backward_result = look_backward(seq, behind, empty=empty)
  forward_result = look_forward(seq, forward, empty=empty)
  if behind < 1 and forward > 0:
    return forward_result
  if behind > 0 and forward < 1:
    return backward_result
  return [a+b for a, b in zip(backward_result, forward_result)]

You can call it as you suggested:

seq = [1, 2, 3, 4]
result = pairs_lookahead(seq, behind=2, forward=1, empty="Z")
print(list(result))

result = pairs_lookahead(seq, behind=2, forward=0, empty="Y")
print(list(result))

result = pairs_lookahead(seq, behind=0, forward=1, empty="X")
print(list(result))

This outputs:

[((1, 'Z'), (1, 'Z'), (1, 2)), ((2, 'Z'), (2, 1), (2, 3)), ((3, 1), (3, 2), (3, 4)), ((4, 2), (4, 3), (4, 'Z'))]
[((1, 'Y'), (1, 'Y')), ((2, 'Y'), (2, 1)), ((3, 1), (3, 2)), ((4, 2), (4, 3))]
[((1, 2),), ((2, 3),), ((3, 4),), ((4, 'X'),)]

This is exactly the sort of solution I was looking for. I also appreciate the amount of depth that you have gone into explaining your reasoning. However, I have a question for how I might use such an implementation for sequences of objects: how might I perform processing on individual non-empty objects to produce some kind of view for the objects in the sequence? I apologize if this question add far more complexity to the problem. Your current solution is more than sufficient for the original problem. — martinsarif, Feb 25 '19 at 02:44
I'm not sure to fully understand what you mean here. You want `seq` to be a sequence of any object instead of a sequence of numbers? Also I don't understand what you mean by "view for the objects". — cglacet, Feb 25 '19 at 08:12
I apologize for not explaining what I had in mind very clearly in my previous comment. What I was wondering was how might such an implementation differ if it were desirable to perform computations on elements in the original sequence that would appear in the pairs - regardless of whether they are objects or not. A potential use case is where one wants to construct pairs of meaningful data computed from objects rather than the objects themselves. Does this make more sense? I suspect that imap or map may be helpful for this, but I am unfamiliar with their use. — martinsarif, Feb 25 '19 at 15:43

Blckknght · Answer 3 · 2019-02-24T06:41:44.383

A useful intermediate step in solving your problem is to produce sequences of adjacent values. So if the input was [1, 2, 3, 4, 5, ...], you'd iterate and get (1, 2, 3, 4) then (2, 3, 4, 5) and so on.

There's a nifty way to do this using itertools.tee:

import itertools

def n_wise(iterable, n):
    iterators = itertools.tee(iterable, n)
    for i, iterator in enumerate(iterators):
        next(itertools.islice(iterator, i, i), None)  # discard i values from the iterator
    return zip(*iterators)

Now we can fairly easily do lookahead and lookbehinds (I'm ignoring the empty values for now):

def lookaround(iterable, behind, ahead):
    for values in n_wise(iterable, 1 + behind + ahead):
        behind_values = values[:behind]
        current_value = values[behind]
        ahead_values = values[behind+1:]
        for b in behind_values:
            yield current_value, b
        for a in ahead_values:
            yield current_value, a

The easiest way to adapt this to support the empty values would just be to pad the iterable. You need behind extra empty values at the start, and ahead extra values at the end.

def lookaround_with_empties(iterable, behind, ahead, empty=None):
    padded_iterable = itertools.chain([empty]*behind, iterable, [empty]*ahead)
    return lookaround(padded_iterable, behind, ahead)

Now, this behaves a little odd when it looks back or ahead into several empty values in a row (since it repeats the same output for each missing value), but I'm not sure what you expect in those situations. There's probably a simple way to filter the output to avoid the duplicates, if you want that.

Adding empty at the end is a good idea, it avoids having it all around and also avoid constructing useless stuff :p. By the way you are passing the empty param to `lookaround` and you also have `enumerate(it)` instead of `enumerate(iterators)`. Lookup pairs are not grouped by tuple in the output as it was requested. — cglacet, Feb 24 '19 at 04:15

score 0 · Answer 4 · answered Feb 24 '19 at 14:34

As a first step write a function that from a pivot index returns the list of look-ahead and look-behind pairs centered in the pivot.

Using list comprehensions:

def around_pivot(seq, pivot, behind=2, forward=3):
    return [[seq[pivot], seq[pivot+i] if pivot+i < len(seq) and pivot+i >= 0 else None]
      for i in range(-behind, forward+1) if i != 0]

If now all pairs are desired it a a matter of applying again list comprehensions, varying the pivot:

all_pairs = [around_pivot(seq, i) for i in range(len(seq))]

Or if you prefer a one line solution, but please consider code readability/maintenability:

def all_pairs(seq, behind=2, forward=3):
    return [[[seq[p], seq[p+i] if p+i < len(seq) and p+i >= 0 else None]
      for i in range(-behind, forward+1) if i != 0] for p in range(len(seq))]

If you want optimize memory usage a generator comprehensions may be used instead:

def all_pairs(seq, behind=2, forward=3):
    return (((seq[p], seq[p+i] if p+i < len(seq) and p+i >= 0 else None)
      for i in range(-behind, forward+1) if i != 0) for p in range(len(seq)))

Implementing a Generator for Sequences of Lookahead Pairs

4 Answers4

First Major Update:

Second Major Update:

Explanations

Solution implementation