I wrote a Python program for calculating certain polynomials. It is reasonably fast, but I ran cProfile on it and the results are disturbing. The specific run takes 296 seconds, which is fine, but the cumulative time spent in abc.py __instancecheck__ is 43 seconds. This points toward rewriting it in something other than Python, especially since there's a calculation I want to run that would take 50 days with the current code.
           ncalls  tottime  percall  cumtime  percall filename:lineno(function)
            201/1    0.001    0.000  296.104  296.104 {built-in method builtins.exec}
                1    0.000    0.000  296.104  296.104 double_samuel_schubmult7.py:1(<module>)
                1   85.132   85.132  295.717  295.717 double_samuel_schubmult7.py:205(schubmult)
            72618   34.608    0.000   94.825    0.001 double_samuel_schubmult7.py:247(<listcomp>)
         46620756   19.717    0.000   60.217    0.000 double_samuel_schubmult7.py:171(elem_sym_func)
                1    0.002    0.002   54.962   54.962 parallel.py:1000(__call__)
                1    0.584    0.584   54.886   54.886 parallel.py:960(retrieve)
            12039    0.013    0.000   54.181    0.005 pool.py:767(get)
            12039    0.008    0.000   54.160    0.004 pool.py:764(wait)
            12054    0.018    0.000   54.156    0.004 threading.py:589(wait)
              768    0.010    0.000   54.119    0.070 threading.py:288(wait)
             3126   54.105    0.017   54.105    0.017 {method 'acquire' of '_thread.lock' objects}
         58143528    7.986    0.000   43.820    0.000 abc.py:117(__instancecheck__)
         58143528   12.915    0.000   35.834    0.000 {built-in method _abc._abc_instancecheck}
  8188300/3054264   30.769    0.000   30.769    0.000 double_samuel_schubmult7.py:148(elem_sym_poly)
58143557/58143542    7.913    0.000   22.918    0.000 abc.py:121(__subclasscheck__)
58143557/58143542   15.006    0.000   15.006    0.000 {built-in method _abc._abc_subclasscheck}
The threading portion comes from a small part of the code at the end that is unproblematic; most of the code is single-threaded.
Does this 43 seconds spent in abc __instancecheck__ really mean that if I write it in, say, C, it will be at least 43 seconds faster? Is there a way to suppress it?
I should note that for polynomial calculations I'm using symengine, which could be where this is happening; it could also be numpy. Below is the main function (schubmult).
from symengine import *
import numpy as np
# ..200 lines of omitted code..
def schubmult(perm_dict,v):
    vn1 = inverse(v)
    th = theta(vn1)
    if th[0]==0:
        return perm_dict
    mu = permtrim(uncode(th))
    vmu = permtrim(mulperm(list(v),mu))
    inv_vmu = inv(vmu)
    inv_mu = inv(mu)
    ret_dict = {}
    vpaths = [([(vmu,0)],1)]
    while th[-1] == 0:
        th.pop()
    for i in range(len(th)):
        k = i+1
        vpaths2 = []
        for path,s in vpaths:
            last_perm = path[-1][0]
            newperms = kdown_perms(last_perm,th[i],k)
            for new_perm,s2,vdiff in newperms:
                new_perm2 = permtrim(new_perm)
                if i == len(th)-1 and (len(new_perm2) != 2 or new_perm2[0]!=1):
                    continue
                path2 = [*path,(new_perm2,vdiff)]
                vpaths2 += [(path2,s*s2)]
        vpaths = vpaths2
    arr0 = [0 for vpath in vpaths]
    for u,val in perm_dict.items():
        inv_u = inv(u)
        vpathsums = {u: val*np.array([vpath[1] for vpath in vpaths])}
        for index in range(len(th)):
            newpathsums = {}
            for up, arr in vpathsums.items():
                inv_up = inv(up)
                newperms = elem_sym_perms(up,min(th[index],(inv_mu-(inv_up-inv_u))-inv_vmu),th[index])
                for up2, udiff in newperms:
                    newpathsums[up2] = newpathsums.get(up2,np.array(arr0))+arr*[
                        elem_sym_func(th[index],index+1,up,up2,vpaths[i][0][index][0],vpaths[i][0][index+1][0],udiff,vpaths[i][0][index+1][1],var2,var3)
                        for i in range(len(vpaths))
                    ]
            vpathsums = newpathsums
        if len(vpaths)<300:
            ret_dict = add_perm_dict({ep: np.sum(arr) for ep,arr in vpathsums.items()},ret_dict)
        else:
            ret_dict = add_perm_dict(dict(Parallel(n_jobs=-1,require='sharedmem')(delayed(pairsum)(ep,arr) for ep,arr in vpathsums.items())),ret_dict)
    return ret_dict
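For context on where abc.py entries come from in general: an isinstance check against an abstract base class is dispatched to ABCMeta.__instancecheck__, which is defined in abc.py, while a check against the object's exact concrete type is answered in C without that call. The sketch below times both with timeit. The choice of numbers.Number and int is purely illustrative, and nothing here shows which library issues the checks in the profile above.

```python
import timeit
from numbers import Number

x = 5

# isinstance against an ABC goes through ABCMeta.__instancecheck__ (abc.py).
abc_time = timeit.timeit(lambda: isinstance(x, Number), number=200_000)

# isinstance against the exact concrete type is resolved in C
# and never reaches abc.py.
concrete_time = timeit.timeit(lambda: isinstance(x, int), number=200_000)

print(f"ABC check:      {abc_time:.4f} s")
print(f"concrete check: {concrete_time:.4f} s")
```

If the checks turn out to come from a library rather than from the script itself, they cannot simply be switched off from user code. The realistic options are then to reduce the number of calls into that library or to keep intermediate values in a form that avoids the check.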