I would like to efficiently apply a complicated function to rows of a matrix in Python (EDIT: Python 3). In R this is the apply function and its cousins, and it works quickly.
In Python I understand this can be done in a few ways: a list comprehension, numpy.apply_along_axis, or pandas.DataFrame.apply.
In my code these Python approaches are very slow. Is there another approach I should use? Or perhaps my implementation of these Python approaches is incorrect?
Here's an example. The math is taken from a probit regression model. To be clear, my goal is not to run a probit regression; I am interested in an efficient way to apply a function to each row.
In R:
> n = 100000
> p = 7
> x = matrix(rnorm(n * p, 0, 2), ncol = p)
> beta = rep(1, p)
> start <- Sys.time()
> test <- apply(x, 1, function(t)(dnorm(sum(t*beta))*sum(t*beta)/pnorm(sum(t*beta))) )
> end <- Sys.time()
> print(end - start)
Time difference of 0.6112201 secs
In Python via comprehension:
import numpy as np
from scipy.stats import norm
import time
n = 100000
p = 7
x = norm.rvs(0, 2, n * p)
x = x.reshape( (n , p) )
beta = np.ones(p)
start = time.time()
test = [
    norm.pdf(sum(x[i,]*beta))*sum(x[i,]*beta)/norm.cdf(sum(x[i,]*beta))
    for i in range(n) ]
end = time.time()
print(end - start)
23.316735982894897
In Python via pandas.DataFrame.apply:
from pandas import DataFrame
frame = DataFrame(x)
# beta is all ones here, so sum(t) equals sum(t*beta)
f = lambda t: norm.pdf(sum(t))*sum(t)/norm.cdf(sum(t))
start = time.time()
test = frame.apply(f, axis = 1)
end = time.time()
print(end - start)
34.39404106140137
In a related question, the most upvoted answer points out that apply_along_axis is not intended for speed, so I am not including that approach.
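For comparison, here is a minimal sketch of a whole-array version of the same calculation. It assumes the function can be written entirely in terms of NumPy array operations, which holds for this probit example but may not hold for a more complicated function. The name row_ratio and the seeded random generator are illustrative choices, not part of the code above.

```python
import numpy as np
from scipy.stats import norm

def row_ratio(x, beta):
    # One matrix-vector product gives sum(t*beta) for every row t at once.
    s = x @ beta
    # norm.pdf and norm.cdf accept whole arrays, so no Python-level loop is needed.
    return norm.pdf(s) * s / norm.cdf(s)

n, p = 100000, 7
rng = np.random.default_rng(0)       # seeded only to make the sketch repeatable
x = rng.normal(0, 2, size=(n, p))    # same distribution as norm.rvs(0, 2, n * p)
beta = np.ones(p)
test = row_ratio(x, beta)            # array of length n, one value per row
```

Here x @ beta is computed once and the resulting length-n array is passed to norm.pdf and norm.cdf, so the per-row work happens inside NumPy and SciPy rather than in a Python loop.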
Again, I am interested in performing these calculations quickly. I truly appreciate your help!