
I know there is a similar question here: Python numpy.vectorize: ValueError: Cannot construct a ufunc with more than 32 operands

But my case is different.

I have a DataFrame with 32 columns; you can reproduce it by running the following code:

import numpy as np
import pandas as pd
from io import StringIO
dfs = """
    M0  M1  M2  M3 M4  M5 M6 M7 M8 M9 M10 M11 M12 M13 M14 M15 M16 M17 M18 M19 M20 M21 M22 M23 M24 M25 M26 M27 M28 M29 M30  age 
1   1   2   3    4  5   6  1  2 3    4  5  6   1   2    3  4  5    6   7   8    9 1    2  3    4  5    6  1    2   3    4   3.2        
2   7   5   4    5  8   3  1  2 3    4  5  6   1   2    3  4  5    6   7   8    9 1    2  3    4  5    6  1    2   3    4   4.5
3   4   8   9    3  5   2  1  2 3    4  5  6   1   2    3  4  5    6   7   8    9 1    2  3    4  5    6  1    2   3    4   6.7
"""
df = pd.read_csv(StringIO(dfs.strip()), sep=r'\s+')
df

Based on my business logic I built a vectorized function. It works fine as long as the total number of function parameters is less than 32:

M=["M0","M1","M2","M3","M4","M5","M6","M7","M8","M9","M10","M11","M12","M13","M14","M15","M16","M17","M18","M19",
   "M20","M21","M22","M23","M24","M25","M26","M27","M28","M29"]

def func2(df, M):
    return [df[i].values for i in M]

def func(age,*Ms):
    newcol=np.prod(Ms[0:age])
    return newcol

vfunc = np.frompyfunc(func, len(M)+1, 1)

df['newcol']=vfunc(df['age'].values.astype(int), *func2(df,M))

For clarity: func2 only keeps the code tidy by generating all the arguments for func. Without func2 the code would look like this:

def func(age,M0,M1,M2,...,M29):
    Ms=(M0,M1,M2,...,M29)
    newcol=np.prod(Ms[0:age])
    return newcol

vfunc = np.frompyfunc(func, 31, 1)

df['newcol']=vfunc(df['age'].values.astype(int), df['M0'].values,...,df['M29'].values)

The real problem appears once the number of parameters is 32 or more, like this:

M=["M0","M1","M2","M3","M4","M5","M6","M7","M8","M9","M10","M11","M12","M13","M14","M15","M16","M17","M18","M19",
   "M20","M21","M22","M23","M24","M25","M26","M27","M28","M29","M30"] # M30 is the only difference from the code above

def func2(df, M):
    return [df[i].values for i in M]

def func(age,*Ms):
    newcol=np.prod(Ms[0:age])
    return newcol

vfunc = np.frompyfunc(func, len(M)+1, 1)

df['newcol']=vfunc(df['age'].values.astype(int), *func2(df,M))

I received this error:

ValueError                                Traceback (most recent call last)
<ipython-input-66-9a042ad44f9b> in <module>()
     76     return newcol
     77 
---> 78 vfunc = np.frompyfunc(func, len(M)+1, 1)
     79 
     80 df['newcol']=vfunc(df['age'].values.astype(int), *func2(df,M))

ValueError: Cannot construct a ufunc with more than 32 operands (requested number were: inputs = 32 and outputs = 1)

In my real business logic I have more than 100 columns that need to go through np.prod, so I am really stuck. Can anyone help?

William
  • [This](https://stackoverflow.com/a/63093878/12664040) may answer your question? – jezza_99 Dec 02 '21 at 01:48
  • Full traceback please! – hpaulj Dec 02 '21 at 01:49
  • Updated my question and pasted the full traceback. – William Dec 02 '21 at 01:56
  • @jezza_99 No, but thank you for your reply. – William Dec 02 '21 at 01:56
  • Sorry, what is your end goal? Do you mind posting the expected output dataframe? – sammywemmy Dec 02 '21 at 02:04
  • I'm not sure I understand your code. But why `np.vectorize`? Can't you just do `df[M].prod(axis=1)`? Or even something like a `for` loop? – Quang Hoang Dec 02 '21 at 02:52
  • Note that `np.vectorize` does not make your function "vectorized" in the sense you're probably referring to. `np.vectorize` is a convenience function that essentially runs a for loop over the array elements. The vectorized solution you seem to be trying to achieve should use pandas operations such as `df.prod`, as suggested by @QuangHoang – Michael Delgado Dec 02 '21 at 02:57
  • @Ben.T Thank you for your reply. Can you post it as an answer? – William Dec 02 '21 at 03:16
  • @QuangHoang I need to apply the logic to each row, and pd.apply() would be very slow, so I need to use numpy. Can you post your reply as an answer? Thank you so much – William Dec 02 '21 at 03:17
  • @William my point is `np.vectorize` is not faster than either `pd.apply` or a `for` loop. As for solution, @Ben.T's comment is perfect for your case (sans some corner cases). I did not see his comment when I put down my first one. – Quang Hoang Dec 02 '21 at 03:20
  • As I explained in the linked SO, you cannot use `frompyfunc` (or `np.vectorize`) with a large number of arguments. This 32 limit is the max number of dimensions of an array. Most likely you are using it wrong, for something it wasn't designed for, and where it has **no** speed benefits. – hpaulj Dec 02 '21 at 03:22
  • @QuangHoang, apparently last year I found that `np.vectorize` is faster than pandas `apply` or row iteration. But that's because pandas is so slow with all of its indexing baggage. I didn't test `apply` with its `raw` mode, which bypasses a lot of that. – hpaulj Dec 02 '21 at 03:39
  • How is it different from what Ben proposed in his answer here? – Quang Hoang Dec 03 '21 at 17:19
  • @QuangHoang It did not work; I received errors. My new question hopefully just makes the problem more straightforward. Thank you very much for your reply – William Dec 03 '21 at 17:21
  • I suggest you be more specific than *it not work*. What is the error? Does it give different output than what you expect (where, and why the difference)? – Quang Hoang Dec 03 '21 at 17:24
  • @QuangHoang I updated my pandas version and it works now! Thank you all! – William Dec 03 '21 at 17:29
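
To illustrate the point made in the comments that `np.frompyfunc` and `np.vectorize` still iterate at the Python level, here is a minimal sketch of the same row-wise product written as a plain loop, which has no operand limit. The small frame and the names `m_cols`, `values` and `ages` are made up for illustration; only the `M0, M1, ...` naming and the integer truncation of `age` come from the question.

```python
import numpy as np
import pandas as pd

# Small made-up frame; column names follow the question's M0, M1, ... pattern.
df = pd.DataFrame({
    "M0": [1, 7, 4],
    "M1": [2, 5, 8],
    "M2": [3, 4, 9],
    "M3": [4, 5, 3],
    "age": [3.2, 4.5, 1.7],
})

m_cols = [c for c in df.columns if c.startswith("M")]
values = df[m_cols].to_numpy()            # one 2-D array instead of many operands
ages = df["age"].to_numpy().astype(int)   # truncates, as in the question

# One Python-level iteration per row: product of the first `age` columns.
df["newcol"] = [np.prod(row[:age]) for row, age in zip(values, ages)]
print(df)
```

Passing a single 2-D array rather than one argument per column is what sidesteps the operand limit here; it is not expected to be faster than a truly vectorized solution.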

1 Answer


Here is a way to achieve your result. Select all the M columns with filter, use where to replace with NaN every value whose (1-based) column position is not below the row's age, then take prod along the columns.

df['newcol'] = (
     # keep only Mx columns
    df.filter(like='M')
      # keep only the values when the position of the column
      # is less than the age
      .where(lambda x: (np.arange(x.shape[1])+1)<df['age'].to_numpy()[:, None])
      # multiply all the non-nan values per row
      .prod(axis=1)
)
print(df)
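
The same masking idea can also be sketched in plain NumPy. The version below is an assumption-laden variant, not the code above: it truncates `age` to an integer as the question's `func` does, uses 40 randomly generated columns to show that nothing depends on the 32-operand limit, and cross-checks against a row-by-row loop. The names `rng`, `mask` and `expected` are made up for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_rows, n_cols = 5, 40  # more than 32 columns on purpose
m_cols = [f"M{i}" for i in range(n_cols)]
df = pd.DataFrame(rng.integers(1, 4, size=(n_rows, n_cols)), columns=m_cols)
df["age"] = [3.2, 4.5, 6.7, 3.0, 0.4]

values = df[m_cols].to_numpy()
ages = df["age"].to_numpy().astype(int)  # truncation, as in the question

# mask[r, c] is True when column c is among the first ages[r] columns of row r
mask = np.arange(n_cols) < ages[:, None]

# Replace masked-out entries with 1 (the multiplicative identity), then multiply
df["newcol"] = np.where(mask, values, 1).prod(axis=1)

# Cross-check against the row-by-row definition from the question
expected = [int(np.prod(row[:age])) for row, age in zip(values, ages)]
assert df["newcol"].tolist() == expected
```

One difference worth noting: with truncation, an age of exactly 3.0 keeps three columns, whereas a strict `<` comparison of 1-based positions against the float age keeps only two. Which behaviour is wanted depends on the business logic.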
Ben.T
  • @QuangHoang after updating my pandas version it works now, thanks! – William Dec 03 '21 at 17:30
  • What if, instead of using M0...M30, I need to use 1-M0...1-M30? I tried `1-(np.arange(x.shape[1])+1)` and `(np.arange(1-x.shape[1])+1)`, but received errors – William Dec 03 '21 at 18:22
  • @William if you mean the name of the columns starts with 1-M? then in the filter the parameter `like='1-M'` should select the columns – Ben.T Dec 03 '21 at 18:28
  • Thank you for your reply. Not the names; the values should be 1-df['M0']...1-df['M30']. Sorry for the confusion – William Dec 03 '21 at 18:30
  • So in your answer we are using the values of df['M0']...df['M30'] for the calculation. What if we need to use the values of 1-df['M0']...1-df['M30']? – William Dec 03 '21 at 18:32
  • @William maybe you can do `(1 - df.filter(like='M')).where(...`, with the rest the same – Ben.T Dec 03 '21 at 18:38
  • It works, really appreciate it – William Dec 03 '21 at 21:01
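
For reference, the `1 - M` variant discussed in these comments can be sketched as follows. The three-column frame and the names `m` and `keep` are made up for illustration; the mask is the same position-versus-age comparison used in the answer, computed once up front instead of inside a lambda.

```python
import numpy as np
import pandas as pd

# Made-up data with values in [0, 1] so that 1 - M is easy to check by hand.
df = pd.DataFrame({
    "M0": [0.1, 0.5],
    "M1": [0.2, 0.5],
    "M2": [0.3, 0.5],
    "age": [2.5, 3.5],
})

m = df.filter(like="M")

# 1-based column position compared with each row's age, shape (n_rows, n_cols)
keep = (np.arange(m.shape[1]) + 1) < df["age"].to_numpy()[:, None]

# Subtract first, then mask and multiply; NaN entries are skipped by prod
df["newcol"] = (1 - m).where(keep).prod(axis=1)
print(df)
```

The subtraction applies to the values, not to the positions, which is why changing the `np.arange(...)` expression does not give the intended result.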