I've read other posts on how Python speed/performance should be relatively unaffected by whether the code being run is in main, in a function, or defined as a class attribute, but these do not explain the very large differences in performance that I see when using class vs. local variables, especially with the numpy library. To make this concrete, I wrote the example script below.
import numpy as np
import copy

class Test:
    def __init__(self, n, m):
        self.X = np.random.rand(n,n,m)
        self.Y = np.random.rand(n,n,m)
        self.Z = np.random.rand(n,n,m)

    def matmul1(self):
        self.A = np.zeros(self.X.shape)
        for i in range(self.X.shape[2]):
            self.A[:,:,i] = self.X[:,:,i] @ self.Y[:,:,i] @ self.Z[:,:,i]
        return

    def matmul2(self):
        self.A = np.zeros(self.X.shape)
        for i in range(self.X.shape[2]):
            x = copy.deepcopy(self.X[:,:,i])
            y = copy.deepcopy(self.Y[:,:,i])
            z = copy.deepcopy(self.Z[:,:,i])
            self.A[:,:,i] = x @ y @ z
        return

t1 = Test(300,100)
%%timeit
t1.matmul1()
#OUTPUT: 20.9 s ± 1.37 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
t1.matmul2()
#OUTPUT: 516 ms ± 6.49 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In this script I define a class with attributes X, Y and Z, each a 3-way array. It also has two methods (matmul1 and matmul2) that loop over the third index of the arrays and matrix multiply the three corresponding slices to populate an array, A. matmul1 works directly on slices of the class variables, whereas matmul2 creates a local deep copy of each slice before every matrix multiplication within the loop. matmul1 is ~40x slower than matmul2 (20.9 s vs. 516 ms). Can someone explain why this is happening? Maybe I am thinking about how to use classes incorrectly, but I also wouldn't assume that variables should be deep copied all the time. Basically, what is it about deep copying that affects my performance so significantly, and is this unavoidable when using class attributes/variables? It seems like it's more than just the overhead of accessing class attributes as discussed here. Any input is appreciated, thanks!
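To make the difference between the two loop bodies concrete, here is a minimal sketch that inspects what each one actually hands to `@`. The sizes are small and arbitrary, chosen only for illustration, and the check uses only standard NumPy facilities (`flags`, `strides`, `np.shares_memory`). It shows how the operands differ in memory layout; it does not by itself prove where the time goes.

```python
import copy
import numpy as np

n, m = 4, 3  # illustrative sizes, far smaller than the 300 x 300 x 100 case
X = np.random.rand(n, n, m)

view = X[:, :, 0]             # the kind of operand matmul1 passes to @
copied = copy.deepcopy(view)  # the kind of operand matmul2 passes to @

# Basic slicing returns a view onto X's buffer, not a new array.
print(np.shares_memory(view, X))     # True
print(view.flags['C_CONTIGUOUS'])    # False: neighbouring elements are m items apart
print(view.strides)                  # (96, 24) bytes for float64 with n=4, m=3

# The deep copy owns a fresh, densely packed buffer.
print(np.shares_memory(copied, X))   # False
print(copied.flags['C_CONTIGUOUS'])  # True
print(copied.strides)                # (32, 8)
```

So the two methods differ in more than where the variable lives: one multiplies strided views, the other multiplies contiguous arrays.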
Edit: My real question is why working on copies of subarrays of class instance variables, instead of views of them, results in much better performance for these types of methods.
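One way to separate the "class attribute" hypothesis from the "view vs. copy" hypothesis is to run the same two loops as plain functions on local arrays. The sketch below is an assumption-laden experiment, not a result: the function names `matmul_views` and `matmul_copies` are hypothetical, `.copy()` stands in for `copy.deepcopy`, and the sizes are reduced so it runs quickly. Any timing gap it shows will depend on the machine and the BLAS build.

```python
import timeit
import numpy as np

def matmul_views(X, Y, Z):
    # Same loop as matmul1, but on local arrays instead of self.X etc.
    A = np.zeros(X.shape)
    for i in range(X.shape[2]):
        A[:, :, i] = X[:, :, i] @ Y[:, :, i] @ Z[:, :, i]
    return A

def matmul_copies(X, Y, Z):
    # Same loop as matmul2, with .copy() in place of copy.deepcopy.
    A = np.zeros(X.shape)
    for i in range(X.shape[2]):
        x = X[:, :, i].copy()
        y = Y[:, :, i].copy()
        z = Z[:, :, i].copy()
        A[:, :, i] = x @ y @ z
    return A

n, m = 50, 10  # reduced sizes; increase toward 300, 100 to mirror the post
rng = np.random.default_rng(0)
X, Y, Z = (rng.random((n, n, m)) for _ in range(3))

# Both variants should compute the same values.
same = np.allclose(matmul_views(X, Y, Z), matmul_copies(X, Y, Z))

t_views = timeit.timeit(lambda: matmul_views(X, Y, Z), number=3)
t_copies = timeit.timeit(lambda: matmul_copies(X, Y, Z), number=3)
print(f"results match: {same}")
print(f"views: {t_views:.4f} s, copies: {t_copies:.4f} s")
```

If the gap persists here with no class involved, that would point at the view vs. copy distinction rather than at attribute access.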