46

What is the difference between vectorize and frompyfunc in numpy?

Both seem very similar. What is a typical use case for each of them?

Edit: As JoshAdel indicates, the class vectorize seems to be built upon frompyfunc (see the source). It is still unclear to me whether frompyfunc has any use case that is not covered by vectorize...

Saullo G. P. Castro
Olivier Verdier
  • Any numpy developers out there who can clear this up? Numpy has many of these situations where there were higher and lower level implementations without a pointer between them in the docs. – dtlussier Nov 24 '11 at 18:07
  • For some secret reason, `frompyfunc` produces functions that deliberately disregard the `dtype` argument and return an array of `object`s. As the documentation explains, "The returned ufunc always returns PyObject arrays". There is an easy and ingenious workaround: pass an array of the desired type as the `out` argument. The `vectorize` function, by contrast, lets you specify the output type of the ufunc with the `otypes` argument, but it is supposed to be slow and hence fairly useless compared to using nested lists. – Alexey Mar 14 '16 at 16:32
  • If anyone wants to take the speed arguments further, there's [this](https://stackoverflow.com/questions/57253839/why-vectorize-is-outperformed-by-frompyfunc/74596788#74596788) Q/A, that offers 1000x speedup. – Neil_UK Nov 28 '22 at 06:39

3 Answers

28

As JoshAdel points out, vectorize wraps frompyfunc. vectorize adds extra features:

  • Copies the docstring from the original function.
  • Allows you to exclude an argument from broadcasting rules (the excluded parameter).
  • Returns an array of the correct dtype instead of dtype=object.

Edit: After some brief benchmarking, I find that vectorize is significantly slower (~50%) than frompyfunc for large arrays. If performance is critical in your application, benchmark your use-case first.
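If you want to check the speed difference on your own machine, something like the following rough timing sketch should do. The array size and repeat count are arbitrary choices of mine, not the ones used for the figure above, and the numbers will vary with NumPy version and hardware.

```python
import timeit
import numpy as np

def f(x, y):
    """Returns 2 times x plus y"""
    return 2 * x + y

f_vectorize = np.vectorize(f)
f_frompyfunc = np.frompyfunc(f, 2, 1)

# Arbitrary test size; adjust to match your own use case.
a = np.arange(100_000)

# Both wrappers apply f element by element, so the values should agree
# once the object array from frompyfunc is converted to a numeric dtype.
assert np.array_equal(f_vectorize(a, 2), f_frompyfunc(a, 2).astype(a.dtype))

t_vec = timeit.timeit(lambda: f_vectorize(a, 2), number=5)
t_fpf = timeit.timeit(lambda: f_frompyfunc(a, 2), number=5)
print(f"vectorize:  {t_vec:.3f} s")
print(f"frompyfunc: {t_fpf:.3f} s")
```

Note that the frompyfunc timing does not include the cost of converting the object array afterwards, which you may need to add if you want a numeric result.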

```python
>>> import numpy
>>> a = numpy.indices((3,3)).sum(0)

>>> print(a, a.dtype)
[[0 1 2]
 [1 2 3]
 [2 3 4]] int32

>>> def f(x,y):
...     """Returns 2 times x plus y"""
...     return 2*x+y
...
>>> f_vectorize = numpy.vectorize(f)

>>> f_frompyfunc = numpy.frompyfunc(f, 2, 1)
>>> f_vectorize.__doc__
'Returns 2 times x plus y'

>>> f_frompyfunc.__doc__
'f (vectorized)(x1, x2[, out])\n\ndynamic ufunc based on a python function'

>>> f_vectorize(a,2)
array([[ 2,  4,  6],
       [ 4,  6,  8],
       [ 6,  8, 10]])

>>> f_frompyfunc(a,2)
array([[2, 4, 6],
       [4, 6, 8],
       [6, 8, 10]], dtype=object)
```
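The session above covers the docstring and dtype points. For the remaining one, excluding an argument from broadcasting, here is a minimal sketch. The function `poly` and its arguments are made up for illustration; only the `excluded` parameter comes from `numpy.vectorize` itself.

```python
import numpy as np

def poly(coeffs, x):
    """Evaluate a polynomial with the given coefficients at a scalar x."""
    return sum(c * x**i for i, c in enumerate(coeffs))

# Exclude positional argument 0 so the coefficient list is passed through
# whole instead of being broadcast against x.
poly_vec = np.vectorize(poly, excluded=[0])

xs = np.array([0, 1, 2])
result = poly_vec([1, 2, 3], xs)   # evaluates 1 + 2x + 3x^2 at each x
print(result)
```

With frompyfunc there is no equivalent switch, so you would have to bind the fixed argument yourself (for example with a lambda or functools.partial) before wrapping.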

Stuart Berg
10

I'm not sure what the different use cases for each are, but if you look at the source code (/numpy/lib/function_base.py), you'll see that vectorize wraps frompyfunc. My reading of the code is that vectorize mostly adds proper handling of the input arguments. There might be particular instances where you would prefer one over the other, but it would seem that frompyfunc is just the lower-level building block underneath vectorize.
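One way to see the "lower level" relationship without reading the source is to look at what each call hands back. A small sketch, with `f` as a throwaway example function:

```python
import numpy as np

def f(x, y):
    return 2 * x + y

u = np.frompyfunc(f, 2, 1)
v = np.vectorize(f)

# frompyfunc returns a genuine ufunc object ...
print(isinstance(u, np.ufunc))
print(u.nin, u.nout)

# ... while vectorize returns an instance of the np.vectorize wrapper class.
print(isinstance(v, np.ufunc))
print(type(v).__name__)
```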

JoshAdel
  • 3
    I agree with you that `frompyfunc` seems to be lower level than `vectorize`. The question remains, though, of whether there are cases where you would prefer to use `frompyfunc` instead of `vectorize`? – Olivier Verdier Jul 21 '11 at 05:37
  • 1
    I think you cannot use `.accumulate` in `vectorize` like https://stackoverflow.com/a/27912352/3226167 – user3226167 Jun 19 '18 at 10:05
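To illustrate the `.accumulate` point: because frompyfunc returns a real ufunc, the ufunc methods are available on it, whereas a vectorize object does not have them. A minimal sketch, where `py_mul` is just an illustrative name and the object-dtype input is my choice to keep the type handling explicit:

```python
import numpy as np

# A binary Python function wrapped as a ufunc (2 inputs, 1 output).
py_mul = np.frompyfunc(lambda x, y: x * y, 2, 1)

values = np.arange(1, 6).astype(object)      # 1, 2, 3, 4, 5 as Python objects
running = py_mul.accumulate(values, dtype=object)
print(running.astype(int))                   # running product

# The vectorize wrapper exposes no such method.
print(hasattr(np.vectorize(lambda x, y: x * y), "accumulate"))
```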
6

Although both functions give you a way to build your own ufunc-like callable, numpy.frompyfunc always returns an array of Python objects (dtype=object), while numpy.vectorize lets you specify the return type.
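A short sketch of that difference, using a throwaway function `f`. The `astype` call at the end is one way I know of to get a typed array out of a frompyfunc result after the fact:

```python
import numpy as np

def f(x, y):
    return 2 * x + y

a = np.arange(3)

obj_result = np.frompyfunc(f, 2, 1)(a, 2)
typed_result = np.vectorize(f, otypes=[float])(a, 2)

print(obj_result.dtype)     # object array from frompyfunc
print(typed_result.dtype)   # dtype requested through otypes

# Converting the object array explicitly gives the same values.
converted = obj_result.astype(float)
print(np.array_equal(converted, typed_result))
```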

utada219