
I have an array like a=[1, 2, 3, 4, 5, 6, 7]. I want to split this array into 3 chunks of any size.

When I split this into 3 chunks, I get 3 subarrays: [array([1, 2, 3]), array([4, 5]), array([6, 7])].

My goal is to get an array with the average of the elements in each subarray: [2, 4.5, 6.5], since (1+2+3)/3=2 (first element), (4+5)/2=4.5 (second element), and so on.

I tried the following code:

import numpy as np
a=[1, 2, 3, 4, 5, 6, 7]
a_split=np.array_split(a, 3)
a_split_avg=np.mean(a_split, axis=1)

I am getting the following error: tuple index out of range.

ggorlen
Hrihaan

4 Answers


Here's a vectorized solution that skips the splitting step for performance: it computes the per-group sums directly with np.add.reduceat and divides them by the group sizes to get the averages -

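import numpy as np

def average_groups(a, N):  # a is the input array, N is the number of groups
    n = len(a)
    m = n // N             # base group size
    w = np.full(N, m)      # size of each group
    w[:n - m*N] += 1       # the first n % N groups get one extra element
    # Sum each group in one pass; group start offsets come from the cumulative sizes
    sums = np.add.reduceat(a, np.r_[0, w.cumsum()[:-1]])
    return np.true_divide(sums, w)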

Sample run -

In [357]: a=[1, 2, 3, 4, 5, 6, 7]

In [358]: average_groups(a, N=3)
Out[358]: array([2. , 4.5, 6.5])
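
To make the reduceat step easier to follow, here is a minimal sketch that prints out the intermediate values for the sample input. The variable names sizes, starts and means are illustrative and not part of the answer's function; the final assertion cross-checks against the split-then-average approach from the question.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7])
N = 3                                   # number of groups

n = len(a)
m = n // N                              # base group size: 2
sizes = np.full(N, m)
sizes[:n - m * N] += 1                  # first n % N groups get one extra: [3, 2, 2]

starts = np.r_[0, sizes.cumsum()[:-1]]  # start offset of each group: [0, 3, 5]
sums = np.add.reduceat(a, starts)       # per-group sums: [6, 9, 13]
means = sums / sizes                    # per-group averages: [2.0, 4.5, 6.5]

# Cross-check against splitting first and averaging each chunk
expected = [chunk.mean() for chunk in np.array_split(a, N)]
assert np.allclose(means, expected)
```

np.add.reduceat sums the slice between each pair of consecutive offsets (and from the last offset to the end), so no intermediate list of subarrays is built.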
Divakar

You're getting the error because np.array_split returns a Python list of NumPy arrays, not a multidimensional NumPy array. The subarrays have different lengths, so they cannot be treated as the rows of a 2-D array, and axis=1 has nothing to refer to. Replace the last line with this:

a_split_avg = [np.mean(arr) for arr in a_split]
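
As a quick check of the explanation above, this short sketch inspects what np.array_split returns for the question's input and then applies the per-chunk mean. Wrapping the result in np.array is an optional extra step, added here only because the question asks for an array.

```python
import numpy as np

a = [1, 2, 3, 4, 5, 6, 7]
a_split = np.array_split(a, 3)

print(type(a_split).__name__)          # list
print([arr.shape for arr in a_split])  # [(3,), (2,), (2,)]

# Average each chunk separately, then collect the results into an array
a_split_avg = np.array([np.mean(arr) for arr in a_split])
print(a_split_avg.tolist())            # [2.0, 4.5, 6.5]
```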
saga

You can use np.vectorize in your calculation to apply the mean function to each item in the list of arrays:

means = np.vectorize(np.mean)(a_split)

The result is a NumPy array with the mean of each sub-array you created.
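
Newer NumPy releases are stricter about converting a list of unequal-length arrays into a single array, so the one-liner may be rejected there. The sketch below is one possible workaround, not the answer's original code: it assumes the chunks are first packed into a 1-D object array (the name chunks is illustrative) and passes otypes=[float] so the output dtype is fixed up front.

```python
import numpy as np

a = [1, 2, 3, 4, 5, 6, 7]
a_split = np.array_split(a, 3)

# Pack the unequal-length chunks into a 1-D object array, one chunk per slot
chunks = np.empty(len(a_split), dtype=object)
for i, arr in enumerate(a_split):
    chunks[i] = arr

# np.vectorize calls np.mean once per slot; otypes declares a float result
means = np.vectorize(np.mean, otypes=[float])(chunks)

# Same values as an explicit Python loop over the chunks
loop_means = [np.mean(arr) for arr in a_split]
assert np.allclose(means, loop_means)
```

Since np.vectorize still calls np.mean once per chunk, this is a convenience rather than a speed-up over the plain loop.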

rafaelc
vielkind
  • You might want to add this from the docs on `np.vectorize`: "The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop." – Divakar Nov 06 '18 at 18:54
  • I appreciate it. – Hrihaan Nov 06 '18 at 19:06

Try this:

In [1224]: mean_arr = []
In [1225]: for i in a_split:
      ...:     mean_arr.append(np.mean(i))

In [1226]: mean_arr
Out[1226]: [2.0, 4.5, 6.5]
Mayank Porwal