python numpy vectorize an array of object instances

Question

I'd like to encapsulate my calc function and all its parameters inside an object, but vectorize the execution for millions of objects much like how numpy would do it. Any suggestions?

the calculation is still basic arithmetic which numpy should be able to vectorize.

Example code:

import numpy as np
myarray = np.random.rand(3, 10000000)

############################# This works fine: FAST ###################################

def calc(a,b,c):
    return (a+b/c)**b/a


res1 = calc(*myarray)  #0.7 seconds

############################# What I'd like to do (unsuccessfully): SLOW ###################################

class MyClass():
    __slots__ = ['a','b','c']

    def __init__(self, a,b,c):
        self.a, self.b, self.c = a,b,c

    def calc(self):
        return (self.a + self.b / self.c) ** self.b / self.a 

def classCalc(myClass:MyClass):
    return myClass.calc()

vectorizedClassCalc = np.vectorize(classCalc)
myobjects = np.array([MyClass(*args) for args in myarray.transpose()])


res2 = vectorizedClassCalc(myobjects) #8 seconds no different from a list comprehension
res3 = [obj.calc() for obj in myobjects] #7.5 seconds

perhaps pandas has additional features?

This creates a lot of overhead because now Python has to deal with millions of different instances. You really should redesign you solution as I don't think, this will get much faster — user8408080, Nov 13 '18 at 14:29
Your array will consist of pointers to the objects elsewhere in memory, much like a list. Iteration over such an array is like a list comprehension,and no where as fast as the compiled numpy code for numeric dtypes. Search for `np.frompyfunc` for past discussions on this topic. — hpaulj, Nov 13 '18 at 14:46
hpaulj is pretty spot on as to why you're seeing a slow down: good access patterns over data and contiguous memory are a very large part of why numpy is so fast. Another thing to consider is *why* you want to use OOP. The function that you have looks perfectly reasonable. A lot of scientific code (especially `numpy`/`scipy` heavy code) simply doesn't use OOP, or uses it sparingly when it's convenient or the right tool for the job. OOP is a tool: nothing more. But as an aside, consider just leaving your `calc` function as is and just pass it in your array: it looks fine as-is. — Matt Messersmith, Nov 13 '18 at 15:13
Are you not making a single object that contains the big arrays because you're not sure how many instances you have? — anishtain4, Nov 13 '18 at 15:20
An example from a couple of weeks ago: https://stackoverflow.com/q/53034280/901925 — hpaulj, Nov 13 '18 at 15:32
I'm a big advocate of not forcing oop onto everything especially for reporting purposes. However in my usecase: The objects are used in a wider simulation environment e.g. financial model backtesting and scenario stress testing , where modelling a universe with each asset and component as objects is meaningful. I merely wish to create a reporting layer on top of this environment and don't wish to reimplement these calc() functions which are encapsulated in these objects — user1441053, Nov 14 '18 at 17:47

python numpy vectorize an array of object instances

0 Answers0