I have a NumPy array like this:

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

I want to reshape it like this:

[[1,2,3],
[2,3,4],
[3,4,5],
[4,5,6],
[5,6,7],
[6,7,8],
[7,8,9],
[8,9,10]]

What is the most efficient way to do that?

I'm currently using a loop with np.append, but this is very slow.

Thanks

user3320142

4 Answers

You can use numpy stride tricks (numpy.lib.stride_tricks.as_strided) to create a new view of the array. This will be faster than any list comprehension because no data are copied. The IPython Cookbook has more examples of using stride tricks.

import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
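bytes_per_item = a.dtype.itemsize
# Both strides are one item: stepping down a row or across a column
# advances by a single element of `a`.
b = np.lib.stride_tricks.as_strided(
    a, shape=(8, 3), strides=(bytes_per_item, bytes_per_item))
b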
array([[ 1,  2,  3],
       [ 2,  3,  4],
       [ 3,  4,  5],
       [ 4,  5,  6],
       [ 5,  6,  7],
       [ 6,  7,  8],
       [ 7,  8,  9],
       [ 8,  9, 10]])
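
On NumPy 1.20 and later, numpy.lib.stride_tricks.sliding_window_view builds the same kind of view without hand-written strides. The following is a minimal sketch, assuming such a NumPy version is installed; the variable name `windows` is only illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # requires NumPy >= 1.20

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Each row is a length-3 window over `a`; the result is a read-only view,
# so no data are copied.
windows = sliding_window_view(a, window_shape=3)

print(windows.shape)         # (8, 3)
print(windows[0].tolist())   # [1, 2, 3]
print(windows[-1].tolist())  # [8, 9, 10]
```

Because the shape and strides are computed by NumPy, this avoids the out-of-bounds reads that a mistyped shape can cause with as_strided.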

Timed tests

This answer is orders of magnitude faster than answers here that use loops. Find the tests below (done in Jupyter Notebook with %timeit magic). Note that one of the functions does not work properly with numpy arrays and requires a Python list.

Setup

import numpy as np

a = np.arange(1, 100001, dtype=np.int64)
a_list = a.tolist()

def jakub(a, shape):
    a = np.asarray(a)
    bytes_per_item = a.dtype.itemsize
    # The docs for this function recommend setting `writeable=False` to
    # prevent modifying the underlying array.
    return np.lib.stride_tricks.as_strided(
        a, shape=shape, strides=(bytes_per_item, bytes_per_item), writeable=False)

# https://stackoverflow.com/a/63426256/5666087
def daveldito(arr):
    return np.array([arr[each:each+2]+[arr[each+2]] for each in range(len(arr)-2)])

# https://stackoverflow.com/a/63426205/5666087
def akshay_sehgal(a):
    return np.array([i for i in zip(a,a[1:],a[2:])])

Results

%timeit jakub(a, shape=(a.shape[0]-2, 3))
8.85 µs ± 425 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit daveldito(a_list)
141 ms ± 8.94 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit akshay_sehgal(a)
168 ms ± 9.43 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
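
Part of the gap in these timings is that the strided function only creates a view, while the other two build a whole new array. The sketch below illustrates that difference under the same setup idea; the names `view` and `rows` are illustrative and not part of the benchmark above.

```python
import numpy as np

a = np.arange(1, 11)
step = a.strides[0]

# Read-only sliding view over `a`; no elements are copied.
view = np.lib.stride_tricks.as_strided(
    a, shape=(a.size - 2, 3), strides=(step, step), writeable=False)

print(np.shares_memory(a, view))  # True

# Changes to the original array are visible through the view.
a[0] = 100
print(view[0].tolist())  # [100, 2, 3]

# An explicit copy is independent of `a`, at the cost of allocating memory.
rows = view.copy()
a[0] = 1
print(rows[0].tolist())  # [100, 2, 3]
print(view[0].tolist())  # [1, 2, 3]
```

If the windows need to be modified or stored independently, a copy is still required, and that copy costs time proportional to the size of the output.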
jkr

One way to iterate over the 3-grams of a list is to use zip:

import numpy as np

a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# zip stops at the shortest input, so this yields len(a) - 2 triples.
np.array(list(zip(a, a[1:], a[2:])))
array([[ 1,  2,  3],
       [ 2,  3,  4],
       [ 3,  4,  5],
       [ 4,  5,  6],
       [ 5,  6,  7],
       [ 6,  7,  8],
       [ 7,  8,  9],
       [ 8,  9, 10]])

A general function for n-grams of any length can be written as follows:

def find_ngrams(input_list, n):
    return np.array(list(zip(*[input_list[i:] for i in range(n)])))

find_ngrams(a, 3)  # try other values of n, such as 2, 4 or 5
array([[ 1,  2,  3],
       [ 2,  3,  4],
       [ 3,  4,  5],
       [ 4,  5,  6],
       [ 5,  6,  7],
       [ 6,  7,  8],
       [ 7,  8,  9],
       [ 8,  9, 10]])
find_ngrams(a, 5)
array([[ 1,  2,  3,  4,  5],
       [ 2,  3,  4,  5,  6],
       [ 3,  4,  5,  6,  7],
       [ 4,  5,  6,  7,  8],
       [ 5,  6,  7,  8,  9],
       [ 6,  7,  8,  9, 10]])
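
For comparison, the same n-grams can be built without a Python-level loop by indexing with a broadcast grid of positions. This is a sketch of a different technique (integer-array indexing), not the zip approach above; the function name `ngrams_by_index` is hypothetical.

```python
import numpy as np

def ngrams_by_index(values, n):
    """Return all length-n sliding windows of a 1-D sequence as a new array."""
    arr = np.asarray(values)
    if not 1 <= n <= arr.shape[0]:
        raise ValueError("n must be between 1 and len(values)")
    # Column vector of window start positions plus a row of offsets
    # broadcasts to an index grid of shape (len(values) - n + 1, n).
    starts = np.arange(arr.shape[0] - n + 1)[:, None]
    offsets = np.arange(n)
    return arr[starts + offsets]  # integer-array indexing copies the data

a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(ngrams_by_index(a, 5).shape)         # (6, 5)
print(ngrams_by_index(a, 5)[-1].tolist())  # [6, 7, 8, 9, 10]
```

Unlike a strided view, this returns an independent copy, so it uses more memory but is safe to modify.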
Akshay Sehgal

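I'd do it as follows (note that I rely on a basic list comprehension alone):

import numpy as np

arr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# `arr` must be a Python list here: `+` concatenates the two-element slice
# with the third element.
np.array([arr[each:each+2]+[arr[each+2]] for each in range(len(arr)-2)])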

Output:

array([[ 1,  2,  3],
       [ 2,  3,  4],
       [ 3,  4,  5],
       [ 4,  5,  6],
       [ 5,  6,  7],
       [ 6,  7,  8],
       [ 7,  8,  9],
       [ 8,  9, 10]])

As for performance, for 100_000 elements in arr, my late 2016 MacBook Pro gives the following timings:

CPU times: user 148 ms, sys: 26.1 ms, total: 174 ms
Wall time: 186 ms

DaveIdito

Another solution with np.lib.stride_tricks.as_strided:

import numpy as np
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

newshape = x.shape[:-1] + (x.shape[-1] - 3 + 1, 3)
a = np.lib.stride_tricks.as_strided(x, shape=newshape, strides=x.strides + (x.strides[-1],))

returns

array([[ 1,  2,  3],
       [ 2,  3,  4],
       [ 3,  4,  5],
       [ 4,  5,  6],
       [ 5,  6,  7],
       [ 6,  7,  8],
       [ 7,  8,  9],
       [ 8,  9, 10]])

You can read more about this topic, for example, here.
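
The shape and stride expressions above also work along the last axis of arrays with more than one dimension. A minimal sketch of a reusable helper follows; the name `rolling_window`, the bounds check, and the `writeable=False` flag are additions for illustration, not part of the snippet above.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def rolling_window(x, window):
    """Read-only sliding windows of length `window` along the last axis of x."""
    x = np.asarray(x)
    if not 1 <= window <= x.shape[-1]:
        raise ValueError("window must be between 1 and x.shape[-1]")
    # Same formulas as above, with the hard-coded 3 replaced by `window`.
    shape = x.shape[:-1] + (x.shape[-1] - window + 1, window)
    strides = x.strides + x.strides[-1:]
    return as_strided(x, shape=shape, strides=strides, writeable=False)

grid = np.arange(10).reshape(2, 5)
out = rolling_window(grid, 3)
print(out.shape)        # (2, 3, 3)
print(out[1].tolist())  # [[5, 6, 7], [6, 7, 8], [7, 8, 9]]
```

The bounds check matters because as_strided does not validate its arguments, and a window longer than the axis would read memory outside the array.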

Péter Leéh
    You can use `strides=x.strides + x.strides[-1:]` so you don't need as many parentheses. – jkr Aug 15 '20 at 14:22