
The following example code computes array B from A:

import numpy as np
idx1 = np.array([
 [3, 0, 0],
 [2, 1, 0],
 [2, 0, 1],
 [1, 2, 0],
 [1, 1, 1],
 [1, 0, 2],
 [0, 3, 0],
 [0, 2, 1],
 [0, 1, 2],
 [0, 0, 3]])
idx2 = np.arange(3)
A = np.arange(10*4*3).reshape(10, 4, 3)
B = np.prod(A[:, idx1, idx2], axis=2)

Notice the line

B = np.prod(A[:, idx1, idx2], axis=2)

Is this line memory efficient? Or does numpy generate an internal temporary array for A[:, idx1, idx2]?

One can imagine that if len(A) is very large and numpy generates an internal array for A[:, idx1, idx2], this is not memory efficient. Is there a better way to do this?

Huayi Wei
  • Are you actually running out of memory? If so, how much memory do you have, and what shapes are your actual data? – John Zwinck Nov 26 '17 at 02:48
  • @JohnZwinck In fact, I am not running out of memory. I'm just curious if there is a better way to do this. – Huayi Wei Nov 26 '17 at 08:09

1 Answer

This expression is parsed and evaluated by the Python interpreter:

B = np.prod(A[:, idx1, idx2], axis=2)

First it evaluates the indexing expression:

temp = A[:, idx1, idx2]   # which is equivalent to:
temp = A.__getitem__((slice(None), idx1, idx2))   # the index is passed as a single tuple

Since idx1, idx2 are arrays, this is advanced indexing, and temp is a copy, not a view.
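The copy can be checked directly. The following is a minimal sketch using the arrays from the question; `np.shares_memory` is used here only as a convenient way to confirm that `temp` owns separate memory from `A`.

```python
import numpy as np

idx1 = np.array([[3, 0, 0], [2, 1, 0], [2, 0, 1], [1, 2, 0], [1, 1, 1],
                 [1, 0, 2], [0, 3, 0], [0, 2, 1], [0, 1, 2], [0, 0, 3]])
idx2 = np.arange(3)
A = np.arange(10 * 4 * 3).reshape(10, 4, 3)

# idx1 (10, 3) and idx2 (3,) broadcast together to (10, 3); the leading
# slice keeps the first axis, so the temporary has shape (len(A), 10, 3).
temp = A[:, idx1, idx2]

print(temp.shape)                 # (10, 10, 3)
print(np.shares_memory(A, temp))  # False: advanced indexing made a copy
print(temp.nbytes, A.nbytes)      # the temporary is larger than A itself
```

Here the temporary holds 300 elements while A holds only 120, so for a large len(A) the intermediate array is the dominant memory cost.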

Next the interpreter executes:

np.prod(temp, axis=2)

That is, it passes the temporary array to the prod function, which returns a new array that is then assigned to the variable B.

I don't know how much buffering prod does. I can imagine it setting up an nditer (the C-API version) that takes two operand arrays: temp and an output of the right shape (temp.shape[:-1], assuming the reduction is over the last dimension of temp). See the reduction section of the docs that I cited in The `out` arguments in `numpy.einsum` can not work as expected.

In sum, when Python evaluates a function call, it first evaluates all the arguments and then passes them to the function. Evaluation of lists can be delayed by using generators, but there isn't an equivalent for numpy arrays.
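Since the temporary cannot be made lazy, one possible workaround is to avoid building the full (len(A), 10, 3) array at all. The sketch below is an assumption on my part, not something numpy does for you: it accumulates the product one column of idx1 at a time, so each temporary is only (len(A), 10). The name `B_cols` is introduced purely for illustration.

```python
import numpy as np

idx1 = np.array([[3, 0, 0], [2, 1, 0], [2, 0, 1], [1, 2, 0], [1, 1, 1],
                 [1, 0, 2], [0, 3, 0], [0, 2, 1], [0, 1, 2], [0, 0, 3]])
idx2 = np.arange(3)
A = np.arange(10 * 4 * 3).reshape(10, 4, 3)

# Reference result from the question (builds the full temporary).
B = np.prod(A[:, idx1, idx2], axis=2)

# Column-by-column accumulation (illustrative alternative):
# A[:, idx1[:, k], k] has shape (len(A), 10) and is itself a copy,
# so it is safe to multiply into it in place.
B_cols = A[:, idx1[:, 0], 0]
for k in range(1, idx1.shape[1]):
    np.multiply(B_cols, A[:, idx1[:, k], k], out=B_cols)

print(np.array_equal(B, B_cols))  # True
```

This trades one large temporary for a short Python loop over the last axis, which is cheap here because that axis has only three entries. Whether it is actually faster or worth the extra code depends on the real array sizes, so it is worth measuring before adopting it.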

hpaulj