When I run a function that has been vectorized with `np.vectorize`, it is always executed once more than I would expect: before the actual calls start, there seems to be a dry run. I recently ran into trouble because of this. See the following minimal example:

import numpy as np

class PERSON:
    def __init__(self, age):
        self.age = age

class TIME:
    def __init__(self):
        self.ages = np.array([0,0])

    def init_persons(self):
        vec_init_persons = np.vectorize(self.__scalar_init_person)
        self.persons = vec_init_persons(self.ages)

    def __scalar_init_person(self, age):
        return PERSON(age)

    def let_time_pass(self):
        vec_let_time_pass = np.vectorize(self.__scalar_let_time_pass)
        vec_let_time_pass(self.persons)

    def __scalar_let_time_pass(self, person):
        person.age += 1

time = TIME()
time.init_persons()
time.let_time_pass()

print("Age of person 1: {}".format(time.persons[0].age)) # output is 2 not 1!
print("Age of person 2: {}".format(time.persons[1].age)) # output is 1

I would have expected the age of both persons to be 1. So my questions are:

  1. Does anybody know the purpose of this dry run? To me it just seems to be a source of potential trouble.

  2. What is the Pythonic way to deal with the problem illustrated by the example?

  • `np.vectorize` makes one calculation to determine the `dtype` of the result. I think that's what you are calling a dry run. To avoid that, use `otypes`. There may also be a `cache` option. In any case, `vectorize` is meant to be a convenience in cases where you can't use "real" numpy vectorization. It is not a performance tool. Read the docs in full. – hpaulj Mar 19 '21 at 21:37
  • Sometimes it's a good idea to have the 'vectorized' function print its argument, so you have a clearer idea of how `np.vectorize` ends up calling it. `np.vectorize` does not compile your function. It's just another way of iterating through the inputs. – hpaulj Mar 19 '21 at 21:42
  • Probably the best approach is just not to use `np.vectorize` – juanpa.arrivillaga Mar 20 '21 at 01:10
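Following the suggestion in the comments to make the function report its arguments, here is a minimal sketch that counts the calls instead of printing them. The names `calls` and `double` are hypothetical and not part of the question; the sketch only illustrates the extra trial call and how `otypes` or `cache=True` avoids it.

```python
import numpy as np

calls = []  # hypothetical log of every argument the function receives


def double(x):
    calls.append(x)
    return 2 * x


values = np.array([10, 20, 30])

# Default: one trial call on the first element to infer the output dtype,
# followed by one call per element.
calls.clear()
np.vectorize(double)(values)
calls_default = len(calls)

# With otypes the output dtype is known up front, so no trial call is needed.
calls.clear()
np.vectorize(double, otypes=[int])(values)
calls_otypes = len(calls)

# With cache=True the result of the trial call is reused for the first element.
calls.clear()
np.vectorize(double, cache=True)(values)
calls_cached = len(calls)

print(calls_default, calls_otypes, calls_cached)
```

For three input elements, the default variant should record four calls and the other two variants three.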

1 Answer

From the docs:

The data type of the output of vectorized is determined by calling the function with the first element of the input. This can be avoided by specifying the otypes argument.

The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.
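Applied to the example in the question, this could look like the sketch below. It is a simplified, assumed variant (a plain function `birthday` and a `Person` class instead of the original `TIME` methods), not the asker's exact code. Because the function returns `None`, `otypes=[object]` is used as the output type.

```python
import numpy as np


class Person:
    def __init__(self, age):
        self.age = age


def birthday(person):
    # Side effect only; the return value is None.
    person.age += 1


persons = np.array([Person(0), Person(0)], dtype=object)

# Declaring the output type up front means the function is not called
# an extra time on the first element.
vec_birthday = np.vectorize(birthday, otypes=[object])
vec_birthday(persons)

ages = [p.age for p in persons]
print(ages)
```

Both persons should now have age 1, since each one is visited exactly once.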

hpaulj
  • Thanks, hpaulj, for the first comment! Using the option `cache=True` is exactly what I want. As a matter of fact, I do not see any drawbacks to always using `cache=True`. Do you know the reason for not making `cache=True` the default? – johnny1357 Mar 21 '21 at 21:07
  • It says the `cache` slows the code, doesn't it? `vectorize` is already slow. I think `otypes` is better. People have gotten bit by the auto dtype (e.g. expecting floats but getting int because the trial run returns 0). Here simple iteration on `ages` is better. – hpaulj Mar 21 '21 at 21:57
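The two points in the last comment can be sketched as follows. The function `half_or_zero` is a hypothetical example chosen so that the trial call on the first element returns the integer 0; the second part shows plain iteration as an alternative to `np.vectorize` for the question's use case, again with an assumed `Person` class.

```python
import numpy as np


def half_or_zero(x):
    # Returns the int 0 for x == 0, otherwise a float.
    return 0 if x == 0 else x / 2


values = np.array([0, 1, 3])

# The trial call sees the first element (0) and gets an int back,
# so the whole output is cast to an integer dtype.
auto = np.vectorize(half_or_zero)(values)

# Stating the output type explicitly keeps the fractional parts.
explicit = np.vectorize(half_or_zero, otypes=[float])(values)

print(auto.dtype, auto)
print(explicit.dtype, explicit)


class Person:
    def __init__(self, age):
        self.age = age


# Plain iteration: no trial call and no dtype inference involved.
ages = np.array([0, 0])
persons = [Person(int(age)) for age in ages]
for person in persons:
    person.age += 1

print([p.age for p in persons])
```

Since `np.vectorize` is essentially a loop anyway, the plain list comprehension and `for` loop are not expected to cost anything in performance here.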