Get all subsequences of a numpy array

Question

Given an a numpy array of size n and an integer m I want to generate all sequential m length subsequences of the array, preferably as a two dimensional array.

Example:

>>> subsequences(arange(10), 4)

array([[0, 1, 2, 3, 4, 5, 6],
       [1, 2, 3, 4, 5, 6, 7],
       [2, 3, 4, 5, 6, 7, 8],
       [3, 4, 5, 6, 7, 8, 9]])

the best way I can come up with to do this is

def subsequences(arr, m):
    n = arr.size
    # Create array of indices, essentially solution for "arange" input
    indices = cumsum(vstack((arange(n - m + 1), ones((m-1, n - m + 1), int))), 0)
    return arr[indices]

Is there a better, preferably built in, function that I'm missing?

In the question you state you want `m` length subsequences, but in the example `m` is the number of subsequences, not their length. — logc, Apr 09 '14 at 18:17
@logc I want the columns to be the the m length subsequences i.e. look at the transpose — Erik, Apr 09 '14 at 18:22
possible duplicate of [Efficient Numpy 2D array construction from 1D array](http://stackoverflow.com/questions/4923617/efficient-numpy-2d-array-construction-from-1d-array) — , Apr 11 '14 at 19:15

score 5 · Answer 1 · answered Apr 09 '14 at 19:29

5

scipy.linalg.hankel does this.

from scipy.linalg import hankel
def subsequences(v, m):
    return hankel(v[:m], v[m-1:])

answered Apr 09 '14 at 19:29

chthonicdaemon

19,180
2
52
66

score 5 · Accepted Answer · edited May 23 '17 at 11:44

Here's a very fast and memory efficient method, that's just a "view" into the original array:

from numpy.lib.stride_tricks import as_strided

def subsequences(arr, m):
    n = arr.size - m + 1
    s = arr.itemsize
    return as_strided(arr, shape=(m,n), strides=(s,s))

You should make a np.copy first if you need to write to this array, otherwise you would modify the original array and the corresponding entries in the "subsequences" array as well.

More info here: https://stackoverflow.com/a/4924433/2379410

score 4 · Answer 3 · answered Apr 09 '14 at 19:11

You were on the right track.

You can take advantage of the following broadcasting trick, to create a 2dim indices array from two 1dim aranges:

arr = arange(7)[::-1]
arr
=> array([6, 5, 4, 3, 2, 1, 0])
n = arr.size
m = 3

indices = arange(m) + arange(n-m+1).reshape(-1, 1)  # broadcasting rulez
indices
=>
array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4],
       [3, 4, 5],
       [4, 5, 6]])

arr[indices]
=>
array([[6, 5, 4],
       [5, 4, 3],
       [4, 3, 2],
       [3, 2, 1],
       [2, 1, 0]])

This is definitely better than the way I was doing it, but using a built in scipy tool seems like the best option — Erik, Apr 09 '14 at 19:45

Ruggero Turra · Answer 4 · 2014-04-09T18:52:38.673

Iterator based

from itertools import tee, islice
import collections
import numpy as np

# adapted from https://docs.python.org/2/library/itertools.html
def consumed(iterator, n):
    "Advance the iterator n-steps ahead. If n is none, consume entirely."
    # Use functions that consume iterators at C speed.
    if n is None:
        # feed the entire iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(islice(iterator, n, n), None)
    return iterator


def subsequences(iterable, b):
    return np.array([list(consumed(it, i))[:b] for i, it in enumerate(tee(iterable, len(iterable) - b + 1))]).T

print subsequences(np.arange(10), 4)

Slice based

import numpy as np

def subsequences(iterable, b):
    return np.array([iterable[i:i + b] for i in range(len(iterable) - b + 1)]).T

print subsequences(np.arange(10), 4)

I tried the slice based method, and it seemed to significantly underperform the indexing method. In general, I think transition to and from numpy arrays is more expensive than operations that only operate on numpy data types. Thanks for the suggestion though! — Erik, Apr 09 '14 at 19:52

Get all subsequences of a numpy array

4 Answers4

Iterator based

Slice based

Linked