40

I haven't grokked the key concepts in numpy yet.

I would like to create a 3-dimensional array and populate each cell with the result of a function call - i.e. the function would be called many times with different indices and return different values.

Note: Since writing this question, the documentation has been updated to be clearer.

I could create it with zeros (or empty), and then overwrite every value with a for loop, but it seems cleaner to populate it directly from the function.

fromfunction sounds perfect. Reading the documentation it sounds like the function gets called once per cell.

But when I actually try it...

from numpy import *

def sum_of_indices(x, y, z):
    # What type are X, Y and Z ? Expect int or duck-type equivalent.
    # Getting 3 individual arrays
    print "Value of X is:"
    print x

    print "Type of X is:", type(x)
    return x + y + z

a = fromfunction(sum_of_indices, (2, 2, 2))

I expect to get something like:

Value of X is:
0
Type of X is: int
Value of X is:
1
Type of X is: int

repeated 4 times.

I get:

Value of X is:
[[[ 0.  0.]
  [ 0.  0.]]

 [[ 1.  1.]
  [ 1.  1.]]]
[[[ 0.  0.]
  [ 1.  1.]]

 [[ 0.  0.]
  [ 1.  1.]]]
[[[ 0.  1.]
  [ 0.  1.]]

 [[ 0.  1.]
  [ 0.  1.]]]
Type of X is: <type 'numpy.ndarray'>

The function is only called once, and seems to return the entire array as result.

What is the correct way to populate an array based on multiple calls to a function of the indices?

Oddthinking
  • What is your expected result? fromfunction is being called once per cell - what do you mean by "multiple calls to a function of the indices"? – YXD Sep 09 '13 at 15:58
  • In your first block of code, `a` IS your populated array, where `a[i, j, k] = sum_of_indices(i, j, k)` – YXD Sep 09 '13 at 15:59
  • Sorry, I thought the expected result was clear from the comments. I have expanded. Yes, I know 'a' is the populated array, but (I believed) only because of array addition. When I replace sum_of_indices with a 'real' function (e.g. database lookup) that won't be possible. – Oddthinking Sep 09 '13 at 16:04

7 Answers

56

The documentation is very misleading in that respect. It's just as you note: instead of performing f(0,0), f(0,1), f(1,0), f(1,1), numpy performs

f([[0., 0.], [1., 1.]], [[0., 1.], [0., 1.]])

Using ndarrays rather than the promised integer coordinates is quite frustrating when you try to use something like lambda i: l[i], where l is another array or list (though really, there are probably better ways to do this in numpy).

The numpy vectorize function fixes this. Where you have

m = fromfunction(f, shape)

Try using

g = vectorize(f)
m = fromfunction(g, shape)
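
As a minimal sketch of that pattern (the `lookup` function and the `calls` list are hypothetical, added only to count invocations): passing `otypes` to `vectorize` avoids the extra probing call it otherwise makes on the first element to infer the output type, and `dtype=int` makes `fromfunction` hand over integer indices instead of floats.

```python
import numpy as np

calls = []  # records every (i, j) the function is called with

def lookup(i, j):
    # Hypothetical stand-in for a per-cell function (e.g. a database lookup).
    calls.append((int(i), int(j)))
    return 10 * int(i) + int(j)

# otypes fixes the output dtype up front, so vectorize does not need an
# extra call on the first element to infer it.
g = np.vectorize(lookup, otypes=[int])

# dtype=int makes the index arrays integer-valued rather than float.
m = np.fromfunction(g, (2, 3), dtype=int)

print(m)
print(len(calls))  # one call per cell
```

Note that `vectorize` is a convenience wrapper around a Python-level loop, so it should not be expected to run faster than writing the loop yourself.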
pfabri
Chris Jones
  • Your 'vectorize' fix seems to mostly work, but I think it calls f(0, 0) twice for some reason. Why could that be? – starwarswii Nov 14 '18 at 21:47
  • 2
    The [NumPy documentation for `vectorize`](https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.vectorize.html#numpy.vectorize) says: _The data type of the output of vectorized is determined by calling the function with the first element of the input. This can be avoided by specifying the `otypes` argument._ – Ruben9922 Jan 13 '19 at 13:33
19

I obviously didn't make myself clear. I am getting responses that fromfunction actually works as my test code demonstrates, which I already knew because my test code demonstrated it.

The answer I was looking for seems to be in two parts:


The fromfunction documentation is misleading. fromfunction populates the entire array at once, with a single call to the function.

Note: Since writing this question, the documentation has been updated to be clearer.

In particular, this line in the documentation was incorrect (or at the very minimum, misleading)

For example, if shape were (2, 2), then the parameters in turn be (0, 0), (0, 1), (1, 0), (1, 1).

No. If shape (i.e. from context, the second parameter to fromfunction) were (2,2), the parameters would be (not 'in turn', but in the only call):

(array([[ 0.,  0.], [ 1.,  1.]]), array([[ 0.,  1.], [ 0.,  1.]]))

The documentation has been updated, and currently reads more accurately:

The function is called with N parameters, where N is the rank of shape. Each parameter represents the coordinates of the array varying along a specific axis. For example, if shape were (2, 2), then the parameters would be array([[0, 0], [1, 1]]) and array([[0, 1], [0, 1]])

(My simple example, derived from the examples in the manual, may have been misleading, because + can operate on arrays as well as indices. This ambiguity is another reason why the documentation is unclear. I want to ultimately use a function that isn't array based, but is cell-based - e.g. each value might be fetched from a URL or database based on the indices, or even input from the user.)


Returning to the problem, which is how to populate an array from a function that is called once per element, the answer appears to be:

You cannot do this in a functional style.

You can do it in an imperative/iterative style - i.e. writing nested for-loops, and managing the index lengths yourself.

You could also do it as an iterator, but the iterator still needs to track its own indices.
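
A minimal sketch of the iterative approach, assuming a hypothetical per-cell function `fetch_value` (a stand-in for something like a database or URL lookup). `np.ndindex` yields every index tuple for a shape, which avoids writing the nested loops and tracking the index lengths by hand:

```python
import numpy as np

def fetch_value(i, j, k):
    # Hypothetical per-cell function; it only ever sees plain Python ints.
    return 100 * i + 10 * j + k

shape = (2, 3, 4)
a = np.empty(shape, dtype=int)

# np.ndindex iterates over all index tuples (0, 0, 0), (0, 0, 1), ...
for idx in np.ndindex(*shape):
    a[idx] = fetch_value(*idx)

print(a[1, 2, 3])
```

The array is allocated with `np.empty` because every cell is overwritten by the loop.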

Oddthinking
  • 10
    This is an incredibly misleading piece of documentation. Add that to the fact that `x == y` has the completely ridiculous behavior of doing pointwise comparison and returning an array, and you have documentation examples that actually seem to imply, to someone with experience outside of numpy, that it is doing cell-by-cell calculation – k_g Jun 01 '16 at 15:48
6

I think you are misunderstanding what fromfunction does.

From numpy source code.

def fromfunction(function, shape, **kwargs):
    dtype = kwargs.pop('dtype', float)
    args = indices(shape, dtype=dtype)
    return function(*args,**kwargs)

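Here indices is roughly equivalent to meshgrid, with each input being np.arange(n) for the corresponding axis length n. (An exact match needs indexing='ij'; the default used below orders the first two axes differently, which does not affect the sum.)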

>>> side = np.arange(2)
>>> side
array([0, 1])
>>> x,y,z = np.meshgrid(side,side,side)
>>> x
array([[[0, 0],
        [1, 1]],

       [[0, 0],
        [1, 1]]])
>>> x+y+z #Result of your code.
array([[[0, 1],
        [1, 2]],

       [[1, 2],
        [2, 3]]])
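
As a small check of that equivalence (a sketch, using a non-cubic shape chosen here so that axis order matters): `np.indices` matches `np.meshgrid` when the latter is called with `indexing="ij"`.

```python
import numpy as np

shape = (2, 3, 4)

# One index array per axis, stacked along a new leading axis.
idx = np.indices(shape)

# meshgrid defaults to indexing="xy"; "ij" gives matrix-style ordering.
grids = np.meshgrid(*[np.arange(n) for n in shape], indexing="ij")

same = all(np.array_equal(a, b) for a, b in zip(idx, grids))
print(idx.shape, same)
```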
Daniel
  • Understood (now), but see my answer. The documentation doesn't match this code, and this doesn't address the question of what the correct way to do this is. – Oddthinking Sep 10 '13 at 01:32
1

Does this give you an incorrect result? a should be as expected (and is when I tested it) and seems like a fine way to do what you want.

>>> a
array([[[ 0.,  1.],    # 0+0+0, 0+0+1
        [ 1.,  2.]],   # 0+1+0, 0+1+1

       [[ 1.,  2.],    # 1+0+0, 1+0+1
        [ 2.,  3.]]])  # 1+1+0, 1+1+1

Since fromfunction works on array indices for input, you can see that it only needs to be called once. The documentation does not make this clear, but you can see that the function is being called on arrays of indices in the source code (from numeric.py):

def fromfunction(function, shape, **kwargs):
    . . .
    args = indices(shape, dtype=dtype)
    return function(*args,**kwargs)

sum_of_indices is called on array inputs where each array holds the index values for that dimension.

array([[[ 0.,  0.],
        [ 0.,  0.]],

       [[ 1.,  1.],
        [ 1.,  1.]]])

+

array([[[ 0.,  0.],
        [ 1.,  1.]],

       [[ 0.,  0.],
        [ 1.,  1.]]])

+
array([[[ 0.,  1.],
        [ 0.,  1.]],

       [[ 0.,  1.],
        [ 0.,  1.]]])

=

array([[[ 0.,  1.],
        [ 1.,  2.]],

       [[ 1.,  2.],
        [ 2.,  3.]]])
A.E. Drew
  • Sorry - my example led you down the wrong path. Yes, the example worked as described, but I want to replace sum_of_indices with a real function that can't work at the array level. See my answer. – Oddthinking Sep 10 '13 at 01:33
1

Here's my take on your problem:

As mentioned by Chris Jones, the core of the solution is to use np.vectorize.

import numpy as np

# Define your function just like you would
def sum_indices(x, y, z):
    return x + y + z

# Then wrap it with np.vectorize so it is applied element by element
f = sum_indices
fv = np.vectorize(f)

If you now do np.fromfunction(fv, (3, 3, 3)) you get this:

array([[[0., 1., 2.],
        [1., 2., 3.],
        [2., 3., 4.]],

       [[1., 2., 3.],
        [2., 3., 4.],
        [3., 4., 5.]],

       [[2., 3., 4.],
        [3., 4., 5.],
        [4., 5., 6.]]])

Is this what you wanted?

pfabri
  • "f = lambda i, j, k: sum_indices(i, j, k)" This seems to transform a function of three parameters into an identical function of three parameters. Why not just say "f = sum_indices"? (Or factor it out entirely?) – Oddthinking May 03 '20 at 17:22
  • @Oddthinking You're right. At first I tried doing that but the code failed. I assumed something was wrong with parameter unpacking without using a `lambda`. I do now realise the problem was a typo. Answer now edited to reflect your suggestion. – pfabri May 03 '20 at 17:34
1

I think it is a little confusing that most examples of fromfunction use square arrays.

Perhaps looking at a non-square array could be helpful?

import numpy as np

def f(x,y):
    print(f'x=\n{x}')
    print(f'y=\n{y}')
    return x+y

# dtype=int gives integer index arrays (the default is float)
z = np.fromfunction(f,(4,3), dtype=int)
print(f'z=\n{z}')

Results in:

x=
[[0 0 0]
 [1 1 1]
 [2 2 2]
 [3 3 3]]
y=
[[0 1 2]
 [0 1 2]
 [0 1 2]
 [0 1 2]]
z=
[[0 1 2]
 [1 2 3]
 [2 3 4]
 [3 4 5]]
Tony H
  • This question wasn't a confusion about the dimensions of the array. It was that the documentation, 7 years ago, was misleading about the number of times the `fromfunction` was called (once, not once per cell). That has been corrected in the documentation, and other answers have explained the way to implement the missing piece with `vectorize`. – Oddthinking May 13 '20 at 05:01
  • @Oddthinking - thanks! Just like Archer, sometimes I miss the core concept :) – Tony H May 13 '20 at 06:24
0

If you set the dtype parameter to int, you can get the desired output:

a = fromfunction(sum_of_indices, (2, 2, 2), dtype=int)

https://numpy.org/doc/stable/reference/generated/numpy.fromfunction.html
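
To illustrate what `dtype` changes, here is a small sketch (the `record` helper and the `seen` list are hypothetical). The function is still called only once per `fromfunction` call, with whole index arrays; `dtype=int` only changes the element type of those arrays from float to integer.

```python
import numpy as np

seen = []  # dtype of the first argument on each call

def record(x, y, z):
    seen.append(x.dtype)
    return x + y + z

a_float = np.fromfunction(record, (2, 2, 2))            # default dtype=float
a_int = np.fromfunction(record, (2, 2, 2), dtype=int)   # integer indices

print(len(seen))  # number of times record was called
print(a_float.dtype, a_int.dtype)
```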

Enrique Pérez Herrero