
I have an n x m array and want to take the first 10 rows and perform calculations, then take the next 10 rows and perform calculations, and so on. But my code is hard-coded; how can I turn it into a loop?

Code Attempted:

import numpy as np
total = []
x= np.random.random((100,4))
a = np.average(x[:10])
total.append(a)
a = np.average(x[10:20])
total.append(a)
a = np.average(x[20:30])
....

Goal:

for *this array*:
    # do something
    # append the value
    # go back and get the next 10 values
maximus

2 Answers


It looks like you want the following.

import numpy as np
x = np.random.random((100,4))
L = 10             # block length (rows per block)
k = x.shape[0]//L  # number of complete blocks
total = [np.average(x[L*i:L*(i+1)]) for i in range(k)]
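The same pattern works for any per-block calculation, not just the average. A minimal sketch, assuming the calculation can be written as a function of one block; `block_results` is a hypothetical helper name, and leftover rows (when the row count isn't a multiple of L) are ignored, as in the code above.

```python
import numpy as np

def block_results(x, L, func):
    """Apply func to each consecutive block of L rows of x (hypothetical helper)."""
    k = x.shape[0] // L  # number of complete blocks; leftover rows are ignored
    return [func(x[L * i:L * (i + 1)]) for i in range(k)]

x = np.random.random((100, 4))
averages = block_results(x, 10, np.average)  # same as the list comprehension above
maxima = block_results(x, 10, np.max)        # any other per-block calculation
```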

If you'd rather use an explicit loop instead of a list comprehension:

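import numpy as np
x = np.random.random((100,4))
L = 10             # block length (rows per block)
k = x.shape[0]//L  # number of complete blocks
total = []
for i in range(k):
    # average over all elements of rows L*i to L*(i+1)-1
    total.append(np.average(x[L*i:L*(i+1)]))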
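If the number of rows is not a multiple of L, stepping the start index with `range(0, n, L)` also picks up the final, shorter block. A sketch, assuming you want that partial block included (slicing past the end of a NumPy array simply stops at the last row):

```python
import numpy as np

x = np.random.random((25, 4))  # 25 rows: two full blocks of 10 plus 5 leftover rows
L = 10
total = []
for start in range(0, x.shape[0], L):
    block = x[start:start + L]  # the last block has only 5 rows
    total.append(np.average(block))
```

Here `total` has three entries; the last one averages only the remaining 5 rows.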

As an alternative, here's an approach using a 3-dimensional reshape.

import numpy as np
x = np.random.random((100,4))

L = 10          # window length
n = x.shape[1]  # number of columns
total = x.reshape(-1, L, n).mean(axis=(1, 2))
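A quick way to check that the reshape gives the same numbers as the loop (a sketch; note that `reshape(-1, L, n)` only works when the row count is divisible by L, otherwise NumPy raises a `ValueError`):

```python
import numpy as np

x = np.random.random((100, 4))
L = 10
n = x.shape[1]

# Loop version: average of each block of L rows
loop_total = np.array([np.average(x[L * i:L * (i + 1)]) for i in range(x.shape[0] // L)])

# Reshape version: axis 0 indexes blocks, axes 1 and 2 are rows and columns within a block
reshape_total = x.reshape(-1, L, n).mean(axis=(1, 2))

print(np.allclose(loop_total, reshape_total))  # True
```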
Ben Grossmann
import numpy as np
x = np.random.random((100,4))
a = 10      # rows per block
b = 100//a  # number of blocks
c = 4       # number of columns

You want an array of the averages of the first 10 x 4 block, the second 10 x 4 block, and so on, right?

The reshape function is really useful here.

x_splited = x.reshape((-1, a*c))
total = x_splited.mean(axis=1)

This is the answer you need. The reshape function makes the first a*c elements of the original matrix (its first a rows, read in row-major order) the first row of the new matrix, the next a*c elements the second row, and so on. Then mean(axis=1) averages each row, giving one average per block.
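To see what the reshape does, here is a small worked example with a 4 x 2 array and blocks of a = 2 rows, so each new row holds a*c = 4 consecutive elements in row-major order:

```python
import numpy as np

x = np.arange(8).reshape(4, 2)  # rows: [0, 1], [2, 3], [4, 5], [6, 7]
a, c = 2, 2                     # block length and number of columns

x_splited = x.reshape((-1, a * c))
print(x_splited)
# [[0 1 2 3]
#  [4 5 6 7]]

print(x_splited.mean(axis=1))  # [1.5 5.5]
```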

Also, you could try something like this:

x_splited = x.reshape((-1, a, c))

With this 3-D shape, each block keeps its rows and columns separate, so you can do more complicated per-block calculations than this question asks for.
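For example, keeping the 3-D shape lets you average each column separately within each block. A sketch, assuming per-column block means are what you want (this goes a little beyond the original question):

```python
import numpy as np

x = np.random.random((100, 4))
a, c = 10, 4  # block length and number of columns

blocks = x.reshape((-1, a, c))          # shape (10, 10, 4): block, row in block, column
col_means = blocks.mean(axis=1)         # shape (10, 4): mean of each column in each block
block_means = blocks.mean(axis=(1, 2))  # shape (10,): same result as the 2-D version
```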

Just a tip: in Python, it is preferable to avoid explicit loops over NumPy arrays when a vectorized operation exists, because Python-level loops are slow.
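A rough way to check this on your own machine with the standard-library `timeit` module (a sketch; the timings depend on your hardware and array size, so no numbers are shown here):

```python
import timeit
import numpy as np

x = np.random.random((100, 4))
a, c = 10, 4

def with_loop():
    # one np.average call per block of a rows
    return [np.average(x[a * i:a * (i + 1)]) for i in range(x.shape[0] // a)]

def with_reshape():
    # a single vectorized call over all blocks
    return x.reshape((-1, a * c)).mean(axis=1)

loop_time = timeit.timeit(with_loop, number=1000)
reshape_time = timeit.timeit(with_reshape, number=1000)
print(f"loop: {loop_time:.4f} s, reshape: {reshape_time:.4f} s (1000 runs each)")
```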

Second tip: if you are not yet comfortable writing loops in Python, it is worth spending some time practicing them.

  • As far as I know, there's nothing wrong with looping over numpy objects. Perhaps what you have in mind is the fact that [looping over Pandas data frames](https://stackoverflow.com/a/55557758/2476977) is slower than vectorized methods. – Ben Grossmann Feb 10 '22 at 22:08
  • @BenGrossmann That's true; that's where my impression comes from. I'd expect one np.mean call on the whole array to be faster than calling np.mean many times in a for loop. I tested it in Jupyter with %%timeit: the difference is about 10x (5.45 µs vs 51.6 µs). – Hongleng Fu Feb 10 '22 at 22:16
  • That's good to know, thanks for the information – Ben Grossmann Feb 10 '22 at 23:15
  • @BenGrossmann It looks like `numpy` uses looping under the hood, but it happens at the C level, so you can't control it from Python. That's why there is no way to break out of a method such as `np.mean` partway through, as you could with a Python loop; you are forced to use methods that iterate over the whole array. That's why `numpy` is a good fit for only some algorithms: those that work along array dimensions rather than, say, graph-theory algorithms. – mathfux Feb 10 '22 at 23:31
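To illustrate the point about stopping early: a Python loop can break as soon as a condition is met, while the vectorized version computes every block mean first and then searches them. A sketch, assuming a hypothetical task of finding the first block whose mean exceeds a threshold:

```python
import numpy as np

x = np.random.random((100, 4))
a = 10            # block length
threshold = 0.55  # hypothetical cutoff, for illustration only

# Python loop: stops at the first matching block
first_loop = None
for i in range(x.shape[0] // a):
    if np.average(x[a * i:a * (i + 1)]) > threshold:
        first_loop = i
        break

# Vectorized: computes all block means, then looks for the first match
means = x.reshape(-1, a * x.shape[1]).mean(axis=1)
hits = np.flatnonzero(means > threshold)
first_vec = int(hits[0]) if hits.size else None
```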