Numpy, broadcasting or for loop with a function

Question

I have a NumPy array of shape (550000, 10, 5) and a NormalizeData() function that takes an array of shape (10, 5) and returns a new array with the same dimensions. I need to transform each of the 550000 sub-arrays, and I want this to run as quickly as possible because I will be using much larger arrays in the future.

Is it possible to use broadcasting, or should I employ a for loop?

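import numpy as np

def NormalizeData(ndata):
    # Work on a float copy: the assignments below would otherwise modify the
    # caller's array in place (and truncate if its dtype is an integer type)
    ndata = ndata.astype(np.float64)
    # Scale the last column to 0..255 using its own min and max
    ndata[:, -1] = (
        (ndata[:, -1] - np.min(ndata[:, -1]))
        / (np.max(ndata[:, -1]) - np.min(ndata[:, -1]))
        * 255)
    # Scale all remaining columns to 0..255 using one shared min and max
    ndata[:, :-1] = (
        (ndata[:, :-1] - np.min(ndata[:, :-1]))
        / (np.max(ndata[:, :-1]) - np.min(ndata[:, :-1]))
        * 255)
    ndata = ndata.round()
    return ndata.astype("uint8")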

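newArray = []
for i in data:
    # Keep each normalized (10, 5) result; NormalizeData returns a new array
    newArray.append(NormalizeData(i))
# Stack once after the loop -> shape (550000, 10, 5)
newArray = np.stack(newArray)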
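On the broadcasting question itself: a minimal sketch, assuming `data` is a float array of shape (N, 10, 5) and that each (10, 5) block should keep its own min and max exactly as in the loop. The name `normalize_per_sample` is hypothetical. Reductions with `axis=` and `keepdims=True` produce one min/max per block, shaped (N, 1) for the last column and (N, 1, 1) for the other columns, and broadcasting then applies them to every element of the matching block, with no Python-level loop.

```python
import numpy as np

def normalize_per_sample(data):
    # Hypothetical vectorized replacement for looping over NormalizeData:
    # each (10, 5) block is scaled with its own min and max.
    data = data.astype(np.float64)  # float copy; input left untouched
    out = np.empty_like(data)

    # Last column: reduce over the rows of each block -> shape (N, 1)
    last = data[..., -1]
    lo = last.min(axis=1, keepdims=True)
    hi = last.max(axis=1, keepdims=True)
    out[..., -1] = (last - lo) / (hi - lo) * 255

    # Other columns: one min/max per block over rows AND columns -> (N, 1, 1)
    rest = data[..., :-1]
    lo = rest.min(axis=(1, 2), keepdims=True)
    hi = rest.max(axis=(1, 2), keepdims=True)
    out[..., :-1] = (rest - lo) / (hi - lo) * 255

    return out.round().astype("uint8")

data = np.random.default_rng(0).random((1000, 10, 5)) * 100
result = normalize_per_sample(data)
print(result.shape)  # (1000, 10, 5)
```

As in the original function, a block whose values in a column group are all equal makes the denominator zero.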

Please see the linked duplicate. It might be necessary to do the `.astype` step *after* applying a function along the `550000`-long axis. — Karl Knechtel, Jun 14 '22 at 23:05
@KarlKnechtel, that's not a good `duplicate`. The answers either just iterate or use `apply_along_axis` (which is slower than iteration, and passes a 1d array to the function). — hpaulj, Jun 15 '22 at 03:14
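The point that `apply_along_axis` passes a 1-D array to the function can be checked with a small sketch; the array `a` and the `record` helper are illustrative only. Every call receives one 1-D slice along the chosen axis, so a function that expects a whole (10, 5) block cannot be plugged in this way.

```python
import numpy as np

a = np.arange(2 * 3 * 4).reshape(2, 3, 4)
seen_shapes = []

def record(v):
    # apply_along_axis calls this once per 1-D slice along the chosen axis
    seen_shapes.append(v.shape)
    return v.sum()

sums = np.apply_along_axis(record, 2, a)
print(sums.shape)        # (2, 3): the reduced axis is gone
print(set(seen_shapes))  # {(4,)}: every call received a 1-D array
```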
Can you give a reason to suppose that neither iteration (which OP already has) nor `apply_along_axis` (which is named exactly according to the described task) is the correct tool for the job? — Karl Knechtel, Jun 15 '22 at 03:17
Matthew, when iterating don't use `np.stack` in the loop; stick with list append, e.g. `alist=[]` and `alist.append(NormalizeData(i))`. But you shouldn't have to iterate. It looks to me (without testing) like `NormalizeData` can work with the whole `data` if you just change the `ndata[:, -1]` and `ndata[:, :-1]` expressions to `ndata[..., -1]` and `ndata[..., :-1]`. `:` slices one dimension, `...` slices multiple, in your case two. — hpaulj, Jun 15 '22 at 03:23
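A minimal sketch of the `...` change hpaulj describes, under the hypothetical name `normalize_all`. One thing worth checking: `np.min` and `np.max` are called without `axis`, so on a 3-D input they reduce over the whole array, and all blocks share one scale per column group rather than each (10, 5) block being scaled separately as in the loop. That is fine if a single shared scale is what you want.

```python
import numpy as np

def normalize_all(ndata):
    # NormalizeData with every `:` replaced by `...`: the slices now pick the
    # trailing axis whether ndata is (10, 5) or (550000, 10, 5)
    ndata = ndata.astype(np.float64)  # float copy; input left untouched
    # np.min/np.max have no `axis`, so they reduce over the WHOLE array
    ndata[..., -1] = (
        (ndata[..., -1] - np.min(ndata[..., -1]))
        / (np.max(ndata[..., -1]) - np.min(ndata[..., -1]))
        * 255)
    ndata[..., :-1] = (
        (ndata[..., :-1] - np.min(ndata[..., :-1]))
        / (np.max(ndata[..., :-1]) - np.min(ndata[..., :-1]))
        * 255)
    return ndata.round().astype("uint8")

data = np.random.default_rng(0).random((1000, 10, 5))
out = normalize_all(data)
print(out.shape)  # (1000, 10, 5)
```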
Or call `NormalizeData(data.reshape(-1,5)).reshape(data.shape)`. To `NormalizeData` it doesn't matter whether the array is (10,5) or (5500000,5). — hpaulj, Jun 15 '22 at 03:26
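A sketch of the reshape variant, assuming a C-contiguous `data` (the NumPy default): `reshape(-1, 5)` is then a view, so nothing is copied before the call, and the function simply sees all rows of all blocks as one (rows, 5) array. As with `...`, the min and max are therefore taken over all rows at once, not per (10, 5) block. A compact equivalent of the question's function is repeated so the block runs on its own.

```python
import numpy as np

def NormalizeData(ndata):
    ndata = ndata.astype(np.float64)  # float copy; input left untouched
    # Non-overlapping views: the last column and the remaining columns
    last, rest = ndata[:, -1], ndata[:, :-1]
    ndata[:, -1] = (last - last.min()) / (last.max() - last.min()) * 255
    ndata[:, :-1] = (rest - rest.min()) / (rest.max() - rest.min()) * 255
    return ndata.round().astype("uint8")

data = np.random.default_rng(1).random((1000, 10, 5))
flat = data.reshape(-1, 5)            # (10000, 5): all rows of all blocks
print(np.shares_memory(flat, data))   # True: no copy for a contiguous array
out = NormalizeData(flat).reshape(data.shape)
print(out.shape)                      # (1000, 10, 5)
```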
@hpaulj Thank you so much. The `...` worked better than I could have hoped. I did try `apply_along_axis`, and it didn't work over more than one axis. Also, `apply_over_axes` didn't work any better than a for loop. What is the principle behind the `...`? — matthew altenburg, Jun 21 '22 at 09:16
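On the principle: `...` (the Python `Ellipsis` object) inside an index stands for as many full `:` slices as are needed to cover the axes you did not mention, so the indices after it always refer to the trailing axes. A small sketch with an illustrative array `a`:

```python
import numpy as np

a = np.arange(3 * 10 * 5).reshape(3, 10, 5)

# `...` fills in the leading axes, so -1 always picks the LAST axis
print(a[..., -1].shape)    # (3, 10)    same as a[:, :, -1]
print(a[..., :-1].shape)   # (3, 10, 4) same as a[:, :, :-1]

# A single `:` covers only axis 0, so -1 now indexes axis 1 (the rows)
print(a[:, -1].shape)      # (3, 5)

# On a 2-D block `...` stands for just one `:`
b = a[0]
print(b[..., -1].shape)    # (10,)      same as b[:, -1]
```

This is why the same slicing code works unchanged for a (10, 5) block and for the full (550000, 10, 5) array.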
