Calling numpy.vectorize on a function with many input and output arguments raises an error:
import pandas as pd
import numpy as np

df = pd.DataFrame([[0] * 20], columns=
    ['a01', 'b02', 'c03', 'd04', 'e05', 'f06', 'g07', 'h08', 'i09', 'j10',
     'k11', 'l12', 'n13', 'n14', 'o15', 'p16', 'q17', 'r18', 's19', 't20'])


def func(a01, b02, c03, d04, e05, f06, g07, h08, i09, j10,
         k11, l12, n13, n14, o15, p16, q17, r18, s19, t20):
    # ... some complex logic here, if, for loops and so on
    return (a01, b02, c03, d04, e05, f06, g07, h08, i09, j10,
            k11, l12, n13, n14, o15, p16, q17, r18, s19, t20)


df['a21'], df['b22'], df['c23'], df['d24'], df['e25'], df['f26'], df['g27'], df['h28'], df['i29'], df['j30'], \
df['k31'], df['l32'], df['n33'], df['n34'], df['o35'], df['p36'], df['q37'], df['r38'], df['s39'], df['t40'], \
    = np.vectorize(func)(
        df['a01'], df['b02'], df['c03'], df['d04'], df['e05'], df['f06'], df['g07'], df['h08'], df['i09'], df['j10'],
        df['k11'], df['l12'], df['n13'], df['n14'], df['o15'], df['p16'], df['q17'], df['r18'], df['s19'], df['t20'])
Traceback (most recent call last):
  File "ufunc.py", line 18, in <module>
    = np.vectorize(func)(
  File "C:\Python\3.8.3\lib\site-packages\numpy\lib\function_base.py", line 2108, in __call__
    return self._vectorize_call(func=func, args=vargs)
  File "C:\Python\3.8.3\lib\site-packages\numpy\lib\function_base.py", line 2186, in _vectorize_call
    ufunc, otypes = self._get_ufunc_and_otypes(func=func, args=args)
  File "C:\Python\3.8.3\lib\site-packages\numpy\lib\function_base.py", line 2175, in _get_ufunc_and_otypes
    ufunc = frompyfunc(_func, len(args), nout)
ValueError: Cannot construct a ufunc with more than 32 operands (requested number were: inputs = 20 and outputs = 20)
Note: the code above is a simplified version of generated code. The actual number of rows would be in the millions. The column names do not follow any regular structure; I chose these names only to make counting easier.
Any suggestions on how to restructure the code while keeping the performance benefits of numpy.vectorize? I found that np.vectorize is much faster than "apply" or passing Series as input and output.
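For reference, here is a minimal sketch of one possible restructuring that sidesteps the ufunc operand limit by looping over the rows in plain Python instead of going through np.vectorize. The NumPy documentation describes np.vectorize as essentially a for loop provided for convenience, so this may perform comparably, but I have not benchmarked it on millions of rows. The out_ column names, the three rows of sample data, and the body of func are placeholders invented for this sketch, not the real generated code.

```python
import pandas as pd

in_cols = ['a01', 'b02', 'c03', 'd04', 'e05', 'f06', 'g07', 'h08', 'i09', 'j10',
           'k11', 'l12', 'n13', 'n14', 'o15', 'p16', 'q17', 'r18', 's19', 't20']
# Hypothetical output names, chosen only for this sketch.
out_cols = ['out_' + name for name in in_cols]

# Sample data (an assumption): row r holds the values r*20 .. r*20 + 19.
df = pd.DataFrame([[r * 20 + c for c in range(20)] for r in range(3)],
                  columns=in_cols)


def func(*values):
    # Placeholder for the real per-row logic; returns one value per output column.
    return tuple(v + 1 for v in values)


# Convert each input column to a list of Python scalars, then walk the rows.
columns = [df[name].tolist() for name in in_cols]
results = [func(*row) for row in zip(*columns)]

# Transpose the per-row tuples back into per-column sequences and assign them.
for name, column in zip(out_cols, zip(*results)):
    df[name] = list(column)
```

Because nothing here constructs a ufunc, the combined count of inputs and outputs is not capped at 32; the cost is that every row is still processed by a Python-level call to func.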
Thank you.