I am having trouble performing a column-wise operation on each column of a 2-D NumPy array. I am trying to adapt this answer to my case, though my setup is different. My actual dataset is quite large and involves multiple resamplings, hence the structure of the example below. If the code and explanation look too long, consider skipping ahead to the header Relevant.

Skippable (Only here to reproduce zs below)

Consider an (x_n, y_n) dataset where n = 0, 1, or 2.

import numpy as np

def get_xy(num, size=10):
    ## (x0, y0), (x1, y1), (x2, y2) where xi, yi are both arrays
    if num == 0:
        x = np.linspace(7, size+6, size)
        y = np.linspace(3, size+2, size)
    elif num == 1:
        x = np.linspace(5, size+4, size)
        y = np.linspace(2, size+1, size)
    elif num == 2:
        x = np.linspace(4, size+3, size)
        y = np.linspace(1, size, size)
    return x, y

Suppose we can calculate some metric z_n given arrays x_n and y_n.

def get_single_z(x, y, constant=2):
    ## scaled element-wise difference; x and y must have the same length
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return constant * (np.asarray(x) - np.asarray(y))

Instead of calculating each z_n individually, we can calculate all z_n's at once.

def get_all_z(constant=2):
    zs = []
    for num in range(3): ## 0, 1, 2
        xs, ys = get_xy(num)
        zs.append(get_single_z(xs, ys, constant))
    zs = np.array(zs)
    return zs

Relevant:

zs = get_all_z()
print(zs)
>> [[ 8.  8.  8.  8.  8.  8.  8.  8.  8.  8.]
    [ 6.  6.  6.  6.  6.  6.  6.  6.  6.  6.]
    [ 6.  6.  6.  6.  6.  6.  6.  6.  6.  6.]]

For my purposes, I'd like to make a new list or array vs in which the value at each index is the average of the values in the corresponding column of zs. In this case, every element of vs would be identical (since each one would be the average of [8, 6, 6]). But had the first element of the first sub-array been a 10 instead of an 8, the first element of vs would be the average of [10, 6, 6].

Unsuccessful Attempt:

def get_avg_per_col(z):
    ## column ?= axis number
    return [np.mean(z, axis=i) for i in range(len(zs[0]))]

print(get_avg_per_col(zs))
Traceback (most recent call last):...
...line 50, in _count_reduce_items ## of numpy code, not my code
    items *= arr.shape[ax]
IndexError: tuple index out of range

1 Answer

You can take np.mean of the transposed zs along axis=1 to get the column-wise mean.

In [49]: import numpy as np

In [53]: zs = np.array([[ 8.,  8.,  8.,  8.,  8.,  8.,  8.,  8.,  8.,  8.],
    ...:  [ 6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.],
    ...:  [ 6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.]])

In [54]: np.mean(zs.T, axis=1)
Out[54]: 
array([ 6.66666667,  6.66666667,  6.66666667,  6.66666667,  6.66666667,
        6.66666667,  6.66666667,  6.66666667,  6.66666667,  6.66666667])
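
As a side note, the attempt in the question most likely fails because a 2-D array only has axes 0 and 1, so looping axis over range(len(zs[0])) asks for an axis that does not exist as soon as i reaches 2. A minimal sketch of an equivalent approach without the transpose is below; the names vs_transposed and zs_mod are only illustrative and do not come from the question.

```python
import numpy as np

# zs as printed in the question: 3 rows (one per dataset), 10 columns
zs = np.array([[8.0] * 10, [6.0] * 10, [6.0] * 10])

# axis=0 averages over the rows, leaving one mean per column
vs = np.mean(zs, axis=0)

# same values as averaging the transposed array along axis=1
vs_transposed = np.mean(zs.T, axis=1)

# the hypothetical from the question: first element of the first row is 10
zs_mod = zs.copy()
zs_mod[0, 0] = 10.0
vs_mod = zs_mod.mean(axis=0)

print(vs.shape)   # one entry per column
print(vs_mod[0])  # average of [10, 6, 6]
```

Whether to use axis=0 directly or transpose first is a matter of taste; both read the same data and should give the same values.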
cs95