import requests
from lxml import html

returns = []
for x in range(40, 80):
    url = f'https://www.mutualfundindia.com/MF/Performance/Details?id={x}'
    r = requests.get(url)
    tree = html.fromstring(r.content)
    inception = tree.xpath('//*[@id="collPerformanceAnalysis"]/div/div[3]/div[7]')
    for i in inception:
        if i.text != ' ':
            returns.append(i.text.strip())

This currently takes ~60 seconds for 40 results. I saw online that I can make it faster with multiprocessing, and I watched many videos, but I couldn't get it to work. Please help.

1 Answer


Here is a solution using multiprocessing.Pool. As written, it creates one worker process per available CPU core on your machine; you can tune the number passed to Pool to find the sweet spot.

import multiprocessing

import requests
from lxml import html

returns = []


def callback(result):
    # Runs in the parent process each time a worker finishes,
    # so appending to the shared list here is safe.
    returns.extend(result)


def f(x):
    # Runs in a worker process: fetch one page and extract the values.
    url = f'https://www.mutualfundindia.com/MF/Performance/Details?id={x}'
    r = requests.get(url)
    tree = html.fromstring(r.content)
    inception = tree.xpath('//*[@id="collPerformanceAnalysis"]/div/div[3]/div[7]')
    result = []
    for i in inception:
        if i.text != ' ':
            result.append(i.text.strip())
    return result


if __name__ == "__main__":
    # One worker process per CPU core; adjust to taste.
    pool = multiprocessing.Pool(multiprocessing.cpu_count())
    for i in range(40, 80):
        pool.apply_async(f, args=(i,), callback=callback)
    pool.close()  # No more tasks will be submitted.
    pool.join()   # Wait for all workers to finish.

    print(returns)

bugra
  • Hm, the problem appears to be I/O-bound, so wouldn't a `ThreadPool` or, probably better, an `asyncio` version be a better solution? – Timus Sep 09 '21 at 10:48
  • @Timus Yes, indeed. I provided the solution with multiprocessing since it was requested. But the API of the `threading` library is very similar, so it should be straightforward to apply the same logic with multithreading (see the sketches below). – bugra Sep 09 '21 at 10:53
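
Picking up on the comments: here is a minimal sketch of the same logic with a thread pool, which usually fits an I/O-bound job like this better. multiprocessing.pool.ThreadPool exposes the same API as Pool, so it is a near drop-in swap; the pool size of 16 is an arbitrary starting point, not a tuned value.

from multiprocessing.pool import ThreadPool

import requests
from lxml import html


def f(x):
    # Same per-id scraping logic as in the answer above.
    url = f'https://www.mutualfundindia.com/MF/Performance/Details?id={x}'
    tree = html.fromstring(requests.get(url).content)
    inception = tree.xpath('//*[@id="collPerformanceAnalysis"]/div/div[3]/div[7]')
    return [i.text.strip() for i in inception if i.text != ' ']


if __name__ == "__main__":
    returns = []
    # Threads share memory, so extending a list in the parent is safe here;
    # 16 workers is an arbitrary starting point for an I/O-bound job.
    with ThreadPool(16) as pool:
        for chunk in pool.map(f, range(40, 80)):
            returns.extend(chunk)
    print(returns)

And a sketch of the asyncio route the first comment suggests, assuming the third-party aiohttp package is installed (pip install aiohttp). All 40 requests are started concurrently over one shared session:

import asyncio

import aiohttp
from lxml import html


async def fetch_one(session, x):
    # Non-blocking HTTP request; the parsing is unchanged.
    url = f'https://www.mutualfundindia.com/MF/Performance/Details?id={x}'
    async with session.get(url) as resp:
        content = await resp.read()
    tree = html.fromstring(content)
    inception = tree.xpath('//*[@id="collPerformanceAnalysis"]/div/div[3]/div[7]')
    return [i.text.strip() for i in inception if i.text != ' ']


async def main():
    async with aiohttp.ClientSession() as session:
        chunks = await asyncio.gather(*(fetch_one(session, x) for x in range(40, 80)))
    return [item for chunk in chunks for item in chunk]


if __name__ == "__main__":
    print(asyncio.run(main()))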