import requests
from lxml import html

returns = []
for x in range(40, 80):
    url = f'https://www.mutualfundindia.com/MF/Performance/Details?id={x}'
    r = requests.get(url)
    tree = html.fromstring(r.content)
    inception = tree.xpath('//*[@id="collPerformanceAnalysis"]/div/div[3]/div[7]')
    for i in inception:
        if i.text != ' ':
            returns.append(i.text.strip())

This currently takes ~60 seconds for 40 results. I saw online that I can make it faster with multiprocessing, and I watched many videos, but I couldn't get it to work. Please help.

1 Answer


Here is a solution using multiprocessing.Pool. As written, it creates one worker process per available CPU core on your machine; you can tune the number passed to Pool to find the sweet spot.

import multiprocessing

import requests
from lxml import html

returns = []


def callback(result):
    # Runs in the parent process each time a worker finishes,
    # so appending to the shared list here is safe.
    returns.extend(result)


def f(x):
    # Runs in a worker process: fetch one page and extract the values.
    url = f'https://www.mutualfundindia.com/MF/Performance/Details?id={x}'
    r = requests.get(url)
    tree = html.fromstring(r.content)
    inception = tree.xpath('//*[@id="collPerformanceAnalysis"]/div/div[3]/div[7]')
    result = []
    for i in inception:
        if i.text != ' ':
            result.append(i.text.strip())
    return result


if __name__ == "__main__":
    # One worker process per CPU core; adjust to taste.
    pool = multiprocessing.Pool(multiprocessing.cpu_count())
    for i in range(40, 80):
        pool.apply_async(f, args=(i,), callback=callback)
    pool.close()  # No more tasks will be submitted.
    pool.join()   # Wait for all workers to finish.

    print(returns)

bugra
  • Hm, the problem appears to be I/O-bound, so wouldn't a `ThreadPool` or, probably better, an `asyncio` version be a better solution? – Timus Sep 09 '21 at 10:48
  • @Timus Yes, indeed. I provided the solution with multiprocessing since it was requested. But the API of the `threading` library is very similar, so it should be straightforward to apply the same logic with multithreading (see the sketches below). – bugra Sep 09 '21 at 10:53
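
Picking up on the comments: here is a minimal sketch of the same logic with a thread pool, which usually fits an I/O-bound job like this better. multiprocessing.pool.ThreadPool exposes the same API as Pool, so it is a near drop-in swap; the pool size of 16 is an arbitrary starting point, not a tuned value.

from multiprocessing.pool import ThreadPool

import requests
from lxml import html


def f(x):
    # Same per-id scraping logic as in the answer above.
    url = f'https://www.mutualfundindia.com/MF/Performance/Details?id={x}'
    tree = html.fromstring(requests.get(url).content)
    inception = tree.xpath('//*[@id="collPerformanceAnalysis"]/div/div[3]/div[7]')
    return [i.text.strip() for i in inception if i.text != ' ']


if __name__ == "__main__":
    returns = []
    # Threads share memory, so extending a list in the parent is safe here;
    # 16 workers is an arbitrary starting point for an I/O-bound job.
    with ThreadPool(16) as pool:
        for chunk in pool.map(f, range(40, 80)):
            returns.extend(chunk)
    print(returns)

And a sketch of the asyncio route the first comment suggests, assuming the third-party aiohttp package is installed (pip install aiohttp). All 40 requests are started concurrently over one shared session:

import asyncio

import aiohttp
from lxml import html


async def fetch_one(session, x):
    # Non-blocking HTTP request; the parsing is unchanged.
    url = f'https://www.mutualfundindia.com/MF/Performance/Details?id={x}'
    async with session.get(url) as resp:
        content = await resp.read()
    tree = html.fromstring(content)
    inception = tree.xpath('//*[@id="collPerformanceAnalysis"]/div/div[3]/div[7]')
    return [i.text.strip() for i in inception if i.text != ' ']


async def main():
    async with aiohttp.ClientSession() as session:
        chunks = await asyncio.gather(*(fetch_one(session, x) for x in range(40, 80)))
    return [item for chunk in chunks for item in chunk]


if __name__ == "__main__":
    print(asyncio.run(main()))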