0

I am new to community and please pardon me if I didn't provide information as intended.

I am trying to learn python, coming from Matlab. I have a very simple code for starting purposes:

from numpy import vectorize
from scipy import integrate
from scipy.special import j1
from math import sqrt, exp, pi, log
import matplotlib.pyplot as plt
import numpy as np



def plot_hc(radius, pd):
    q = np.linspace(0.008, 1.0, num=500)
    y = hc(q, radius, pd)
    plt.loglog(q, y)
    plt.show()

def hc_formfactor(q, radius):
    y = (1.0 / q) * (radius * j1(q * radius))
    y = y ** 2
    return y

def g_distribution(z, radius, pd):
    return (1 / (sqrt(2 * pi) * pd)) * exp(
        -((z - radius) / (sqrt(
            2) * pd)) ** 2)


def ln_distribution(z, radius, pd):
    return (1 / (sqrt(2 * pi) * pd * z / radius)) * exp(
        -(log(z / radius) / (sqrt(2) * pd)) ** 2)

# Dist=1(for G_Distribution)
# Dist=2(for LN Distribution)
Dist = 1


@vectorize
def hc(x, radius, pd):
    global d
    if Dist == 1:
        nmpts = 4
        va = radius - nmpts * pd
        vb = radius + nmpts * pd
        if va < 0:
             va = 0
        d = integrate.quad(lambda z: g_distribution(z, radius, pd), va, vb)
    elif Dist == 2:
        nmpts = 4
        va = radius - nmpts * pd
        vb = radius + nmpts * pd
        if va < 0:
            va = 0
        d = integrate.quad(lambda z: ln_distribution(z, radius, pd), va, vb)
    else:
       d = 1

   def fun(z, x, radius, pd):
        if Dist == 1:
            return hc_formfactor(x, z) * g_distribution(z, radius, pd)
        elif Dist == 2:
            return hc_formfactor(x, z) * ln_distribution(z, radius, pd)
        else:
            return hc_formfactor(x, z)

    y = integrate.quad(lambda z: fun(z, x, radius, pd), va, vb)[0]
    return y/d[0]

if __name__ == '__main__':
plot_hc(radius=40, pd=0.5)

As suggested by some, I should use for loop, but that reduced the speed even more. The code using for loop is as follows:

from numpy import vectorize
from scipy import integrate
from scipy.special import j1
from math import sqrt, exp, pi, log
import matplotlib.pyplot as plt
import numpy as np



def plot_hc(radius, pd):
    q = np.linspace(0.008, 1.0, num=500)
    y = hc(q, radius, pd)
    plt.loglog(q, y)
    plt.show()

def hc_formfactor(q, radius):
    y = (1.0 / q) * (radius * j1(q * radius))
    y = y ** 2
    return y

def g_distribution(z, radius, pd):
    return (1 / (sqrt(2 * pi) * pd)) * exp(
        -((z - radius) / (sqrt(
            2) * pd)) ** 2)


def ln_distribution(z, radius, pd):
    return (1 / (sqrt(2 * pi) * pd * z / radius)) * exp(
        -(log(z / radius) / (sqrt(2) * pd)) ** 2)

# Dist=1(for G_Distribution)
# Dist=2(for LN Distribution)
Dist = 1


def hc(q, radius, pd):
    if Dist == 1:
        nmpts = 4
        va = radius - nmpts * pd
        vb = radius + nmpts * pd
        if va < 0:
            va = 0
        d = integrate.quad(lambda z: g_distribution(z, radius, pd), va,vb)
    elif Dist == 2:
        nmpts = 4
        va = radius - nmpts * pd
        vb = radius + nmpts * pd
        if va < 0:
            va = 0
        d = integrate.quad(lambda z: ln_distribution(z, radius, pd), va, vb)
    else:
        d = 1

    def fun(z, q, radius, pd):
        if Dist == 1:
            return hc_formfactor(q, z) * g_distribution(z, radius, pd)
        elif Dist == 2:
            return hc_formfactor(q, z) * ln_distribution(z, radius, pd)
        else:
            return hc_formfactor(q, z)

    y = empty([len(q)])
    for n in range(len(q)):
        y[n] = integrate.quad(lambda z: fun(z, q[n], radius, pd), va, vb)[0]

    return y / d[0]
if __name__ == '__main__':
plot_hc(radius=40, pd=0.5)

If I run the same program for the same values in Matlab it is very very fast. I don't know what mistake I did here. Please help :)

Shankar_Dutt
  • 115
  • 9
  • What do you mean slow? – Edeki Okoh Feb 25 '20 at 22:32
  • Did you not read the [documentation](https://docs.scipy.org/doc/numpy/reference/generated/numpy.vectorize.html)? It clearly states "The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop." – juanpa.arrivillaga Feb 25 '20 at 22:44
  • Thanks a lot, @juanpa.arrivillaga for your reply. I did read the documentation earlier, but the thing is that It became even slower when I used for loop. That's the only reason why I used Vectorization as shown in the code. I have edited my question to post the code using for loop. I may have made a mistake. – Shankar_Dutt Feb 26 '20 at 02:46
  • @EdekiOkoh, by slow I meant the time taken to evaluate the function is more compared to other programs :) – Shankar_Dutt Feb 26 '20 at 02:54
  • You can use Numba to speed up your function (easy to do, but calling overhead stays the same) or use a cfunc (can be written in C, Cython, Numba, ...) eg. https://stackoverflow.com/a/49732825/4045774 – max9111 Feb 26 '20 at 12:46
  • @juanpa, his loop is around a `scipy.quad` – hpaulj Feb 27 '20 at 16:28

1 Answers1

3

Notes

The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.

MATLAB has jit compilation abilities, numpy does not (numba provides some of that). To get the best numpy speed you need to think in terms of whole-array operations, just like we used to in the old MATLAB days.

hpaulj
  • 221,503
  • 14
  • 230
  • 353
  • Thanks a lot for your reply. I used for loop as indicated above, it reduced the speed even further. Maybe I implemented it wrongly. Please see if that is the case. Also thanks a lot for pointing me towards jit. I will learn more about it :) – Shankar_Dutt Feb 26 '20 at 02:56
  • In my experience `vectorize` is always slower than the equivalent loop. Here you loop is deep inside `hc`, evaluating a `quad` (which only evaluates a scalar function). But you `vectorize` all of `hc`, effectively putting a loop around a much bigger calculation. Are you sure they are equivalent? – hpaulj Feb 27 '20 at 16:26