
I have a function that takes an array, performs an arbitrary calculation, and returns a new shape to which the array can be reshaped. I would like to use this function in a numba.njit environment:

import numpy as np
import numba as nb

@nb.njit
def generate_target_shape(my_array):
    ### some functionality that calculates the desired target shape ###
    return tuple([2,2])
    
@nb.njit
def test():
    my_array = np.array([1,2,3,4])
    target_shape = generate_target_shape(my_array)
    reshaped = my_array.reshape(target_shape)
    print(reshaped)
test()

However, calling tuple() on a list is not supported in Numba, and I get the following error message when trying to cast the result of generate_target_shape to a tuple with tuple():

No implementation of function Function(<class 'tuple'>) found for signature:
 
 >>> tuple(list(int64)<iv=None>)
 
There are 2 candidate implementations:
   - Of which 2 did not match due to:
   Overload of function 'tuple': File: numba/core/typing/builtins.py: Line 572.
     With argument(s): '(list(int64)<iv=None>)':
    No match.

During: resolving callee type: Function(<class 'tuple'>

If I try to change the return type of generate_target_shape from tuple to list or np.array, I receive the following error message:

Invalid use of BoundFunction(array.reshape for array(float64, 1d, C)) with parameters (array(int64, 1d, C))

Is there a way for me to create an iterable object inside a nb.njit function that can be passed to np.reshape?

EDIT: I worked around this problem as suggested in the accepted solution by using the objmode context manager.

Yes
  • Changing the return value to a tuple `return (2,2)` works in this case. But I guess this isn't the question. The size of the tuple must be known at compile time. You can use https://numba.pydata.org/numba-doc/dev/user/generated-jit.html for this purpose. But this won't help if the shape is only known at runtime. – max9111 Dec 19 '22 at 13:45

1 Answer


It seems that the standard Python function tuple() is not supported by Numba. You can easily work around this issue by rewriting your code a little bit:

import numpy as np
import numba as nb

@nb.njit
def generate_target_shape(my_array):
    ### some functionality that calculates the desired target shape ###
    a, b = [2, 2] # (this will also work if the list is a numpy array)
    return a, b

The general case, however, is a lot trickier. I am going to backtrack on what I said in the comments: it is not possible or advisable to write a Numba-compiled function that works with tuples of many different sizes. Doing so would require recompiling the function for every distinct tuple size. @Jérôme Richard explains the problem very well in this Stack Overflow answer.

What I would recommend is to simply take the array containing the shape, together with your data, and compute my_array.reshape(tuple(target_shape)) outside of your Numba-compiled function. It is not pretty, but it will allow you to continue with your project.

Rafnus
  • I see, thanks a lot. Do you also have an idea what to do if `generate_target_shape` returns a tuple of arbitrary length, or a length depending on the size of the input array? – Yes Dec 16 '22 at 15:17
  • @Yes is there a range in which you expect the length of the tuple to be? For instance, between 2 and 8? Given how tuples work within Numba, it is going to be very hard to write code for the true general case, where the length is arbitrary, but a length within a fixed range should be doable. – Rafnus Dec 17 '22 at 11:56
  • If I wanted to have between 1 and 8 possible tuple elements, I would still have to return all eight right? And since numba doesn't support tuples that are not of strictly one type, how would I be able to distinguish between "empty" and nonempty values? – Yes Dec 19 '22 at 10:19
  • @Yes I improved my answer. It really is a shame that `reshape()` only accepts a tuple and not an array as well, which would make things a lot simpler. – Rafnus Dec 19 '22 at 14:10
  • `reshape` does not accept lists because the size of a list is not part of its type (otherwise, it would be insane), while the number of dimensions of an array is part of its type. `reshape` needs to be typed to return an array with a fixed number of dimensions when it is compiled, so the input shape needs to have a fixed size known at compile time, that is, typically a tuple. The fact that the number of dimensions is part of the Numba array type is critical for generating fast code (otherwise the required conditionals and computations would introduce a significant overhead). – Jérôme Richard Dec 20 '22 at 17:30
  • In fact, since `reshaped` needs to have a number of dimensions known at compile time, `generate_target_shape` must return a tuple with a fixed number of items (which is always the case in Numba), which forces `generate_target_shape` to be compiled for each variant. For each variant of the called function, the parent function must also generate different code, since you cannot operate on 2D and 3D arrays the same way in general (for the sake of performance, and also because Numba does not provide fancy features like templates, which also have some downsides). – Jérôme Richard Dec 20 '22 at 17:35
  • In the end, the problem is operating on an array whose number of dimensions is data-driven. This is generally not efficient. The only efficient way to deal with it is to pre-compile all the variants and do a dynamic dispatch. This is cumbersome whatever the target (native) language. While this can be done in Numba, the resulting code is generally not great: big and barely maintainable. This is the price to pay for efficient code doing dynamic stuff like this. – Jérôme Richard Dec 20 '22 at 17:38
  • @JérômeRichard thanks for the elaborate explanation! That makes a lot of sense! Also, could I send you a private message? There is something that I want to ask you. – Rafnus Dec 20 '22 at 17:43
  • @Rafnus Ok. Do you know how to communicate in private on StackOverflow? I haven't seen such a thing yet. – Jérôme Richard Dec 21 '22 at 10:15