Sequentially take a window of rows from array (python)

Question

I have an array of size n×m and want to take the first 10 rows and perform calculations, then take the next 10 rows and perform calculations, and so on. Right now this is hard-coded; how can I turn it into a loop?

Code Attempted:

import numpy as np
total = []
x = np.random.random((100, 4))
a = np.average(x[:10])
total.append(a)
a = np.average(x[10:20])
total.append(a)
a = np.average(x[20:30])
....

Goal:

for each block of 10 rows in x:
    # do something
    # append value
    # go back and get the next 10 rows
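The goal pseudocode above can be written as a plain loop that steps through the rows in strides of the window length. This is a minimal sketch assuming the same 100×4 random array and a window length `L = 10`; stepping with `range(0, x.shape[0], L)` avoids hard-coding the number of windows and would also yield a shorter final window if the row count were not a multiple of `L`.

```python
import numpy as np

x = np.random.random((100, 4))
L = 10  # window length (rows per window)

total = []
# Step through the starting row indices 0, 10, 20, ... and slice out L rows each time.
for start in range(0, x.shape[0], L):
    window = x[start:start + L]       # the last window may have fewer than L rows
    total.append(np.average(window))  # average over every value in the window
```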

Should that be `x` instead of `data`? – Ben Grossmann Feb 10 '22 at 21:15

Ben Grossmann · Accepted Answer · 2022-02-11T05:17:13.910


It looks like you want the following.

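import numpy as np
x = np.random.random((100,4))
L = 10              # window length (rows per window)
k = x.shape[0]//L   # number of complete windows (100//L for this array)
# element i is the average of rows L*i through L*(i+1)-1
total = [np.average(x[L*i:L*(i+1)]) for i in range(k)]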

If you'd rather use an explicit for loop instead of a list comprehension:

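import numpy as np
x = np.random.random((100,4))
L = 10              # window length (rows per window)
k = x.shape[0]//L   # number of complete windows
total = []
for i in range(k):
    # average of the i-th block of L rows
    total.append(np.average(x[L*i:L*(i+1)]))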
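Another way to avoid computing the slice bounds by hand is to let NumPy cut the array into windows first. This is a sketch using `np.split` with explicit split indices (a swapped-in technique, not part of the answer above), assuming the same 100×4 array:

```python
import numpy as np

x = np.random.random((100, 4))
L = 10  # window length (rows per window)

# Split before rows 10, 20, ..., 90, giving a list of 10 sub-arrays of shape (10, 4).
# If the row count were not a multiple of L, the last piece would simply be shorter.
windows = np.split(x, np.arange(L, x.shape[0], L))
total = [np.average(w) for w in windows]
```

The windows are views into `x`, so splitting does not copy the data.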

As an alternative, here's an approach using a 3-dimensional reshape.

import numpy as np
x = np.random.random((100,4))

L = 10           # window length
n = x.shape[1]   # number of columns
total = x.reshape(-1, L, n).mean(axis=(1, 2))   # one mean per (L, n) window
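`reshape` only works when the number of rows is an exact multiple of the window length. A minimal sketch, assuming a hypothetical 103-row array, trims the leftover rows first (as the integer division `//L` in the loop versions effectively does) and checks that the vectorized result matches the loop:

```python
import numpy as np

x = np.random.random((103, 4))  # hypothetical: 103 rows, not a multiple of the window length
L = 10                          # window length
n = x.shape[1]                  # number of columns

k = x.shape[0] // L             # number of complete windows (the 3 leftover rows are dropped)
trimmed = x[:k * L]             # keep only the rows that fill complete windows

# Vectorized: one mean per (L, n) block.
total_reshape = trimmed.reshape(-1, L, n).mean(axis=(1, 2))

# Loop version, for comparison.
total_loop = [np.average(x[L*i:L*(i+1)]) for i in range(k)]

print(np.allclose(total_reshape, total_loop))  # True
```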

edited Feb 11 '22 at 05:17

answered Feb 10 '22 at 21:17

Ben Grossmann


Is this grabbing the first 10 rows and averaging them, then rows 11-20 and averaging them, etc.? Similar to windowing? – maximus Feb 10 '22 at 21:28
Yes, that's right. – Ben Grossmann Feb 10 '22 at 21:46
Wherever possible, it is best to avoid for loops with numpy. For the question here, it's better to reshape the array and vectorise the averaging, as shown by @Hongleng Fu. – Paul Feb 10 '22 at 23:04

@Paul FWIW I've added an alternative approach – Ben Grossmann Feb 10 '22 at 23:29

Hongleng Fu · Answer 2 · 2022-02-10T22:26:00.337


import numpy as np
x = np.random.random((100,4))
a = 10       # rows per window
b = 100//a   # number of windows
c = 4        # number of columns

You want an array holding the average of the first 10×4 block, the second 10×4 block, and so on, right?

The `reshape` function can be really useful here.

x_splited = x.reshape((-1, a*c))
total = x_splited.mean(axis=1)

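This is the answer you need. `reshape` lays the elements out in row-major (C) order, so the first a*c elements of the original matrix (that is, its first a rows) become the first row of the new matrix, the next a*c elements become the second row, and so on. Then `mean(axis=1)` gives you the average of each row, i.e. of each window.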
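A quick check of the row-major claim, assuming the same `a = 10` and `c = 4`: each row of the reshaped array is exactly one window of `a` rows, flattened row by row.

```python
import numpy as np

x = np.random.random((100, 4))
a, c = 10, 4  # rows per window, number of columns

x_splited = x.reshape((-1, a * c))  # shape (10, 40)

# Row 0 of the reshaped array holds the first a rows of x, read row by row (C order).
print(np.array_equal(x_splited[0], x[:a].ravel()))       # True
print(np.array_equal(x_splited[1], x[a:2*a].ravel()))    # True
```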

Also, you could try something like this:

x_splited = x.reshape((-1, a, c))

This keeps each 10×4 window as its own block, so you can do more complicated per-window calculations than this question needs.
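With the 3-D shape, choosing a different per-window calculation mostly comes down to choosing `axis`. A sketch with a few example calculations (the specific statistics are illustrative only, not something the question asked for):

```python
import numpy as np

x = np.random.random((100, 4))
a, c = 10, 4  # rows per window, number of columns

windows = x.reshape((-1, a, c))  # shape (10, 10, 4): window, row within window, column

overall_mean = windows.mean(axis=(1, 2))  # one value per window, as in the question
column_means = windows.mean(axis=1)       # shape (10, 4): mean of each column inside each window
window_max = windows.max(axis=(1, 2))     # largest value in each window
```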

Just a tip: in Python, it is preferable to avoid explicit loops over NumPy arrays where a vectorized operation exists, because Python-level loops are slow.

Second tip: if you are not yet comfortable with loops in Python, it is still worth spending some time practicing them.

edited Feb 10 '22 at 22:26

answered Feb 10 '22 at 21:56

Hongleng Fu


As far as I know, there's nothing wrong with looping over numpy objects. Perhaps what you have in mind is the fact that [looping over Pandas data frames](https://stackoverflow.com/a/55557758/2476977) is slower than vectorized methods. – Ben Grossmann Feb 10 '22 at 22:08
@BenGrossmann That's true; that's where my impression comes from. I just believe that calling np.mean once on the whole array is faster than calling np.mean many times in a for loop. I tested it in Jupyter with %%timeit: the difference is about 10 times, 5.45 µs vs 51.6 µs. – Hongleng Fu Feb 10 '22 at 22:16
That's good to know, thanks for the information – Ben Grossmann Feb 10 '22 at 23:15
@BenGrossmann It looks like `numpy` does use looping under the hood, but at the C level, so you can't control it from Python. That's why there is no way to break out of a method such as `np.mean` in the middle of its loop the way you can with a Python loop: you are forced to use methods that iterate over the whole array. That's why `numpy` suits only some algorithms, namely those that work along whole array dimensions, and not, for example, graph-theory algorithms. – mathfux Feb 10 '22 at 23:31
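The %%timeit comparison described above can be reproduced outside Jupyter with the standard-library `timeit` module. A sketch, assuming the same 100×4 array; the absolute numbers will differ from those quoted and from machine to machine:

```python
import timeit
import numpy as np

x = np.random.random((100, 4))
L = 10  # window length

def loop_version():
    # one np.average call per window
    return [np.average(x[L*i:L*(i+1)]) for i in range(x.shape[0] // L)]

def reshape_version():
    # a single vectorized mean over all windows
    return x.reshape(-1, L, x.shape[1]).mean(axis=(1, 2))

# Timings vary by machine and NumPy version; only the relative difference is of interest.
t_loop = timeit.timeit(loop_version, number=1000)
t_reshape = timeit.timeit(reshape_version, number=1000)
print(f"loop: {t_loop:.4f} s, reshape: {t_reshape:.4f} s (1000 runs each)")
```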
