
I have another simple question that I really need some help with. I have a list of websites and I want to go through them all with requests. However, to make this faster, I want to use multiprocessing. How can I do this?

Example:

import requests

list_ex = ["https://www.google.com", "https://www.bing.com", "https://www.yahoo.com"]

def list_request():
    # request each site one at a time and print the response body
    for item in list_ex:
        ex = requests.get(item)
        print(ex.text)

How can I do this with multiprocessing, since I have 100+ sites? :)

xnarf
  • Multithreading in Python won't give you better performance... it could even be worse, because Python doesn't truly run threads in parallel; it pauses one thread to run another and then comes back to the first. – jossefaz May 15 '20 at 02:58
  • Well, it would here: if I could run this on multiple threads, it would perform better than on one thread, since it would iterate over more list values at a time. – xnarf May 15 '20 at 02:59
  • Did you check this out? https://stackoverflow.com/questions/38280094/python-requests-with-multithreading – JenilDave May 15 '20 at 02:59
  • Do you mean multiprocessing? – TYZ May 15 '20 at 02:59
  • Either multiprocessing or threading. – xnarf May 15 '20 at 03:00
  • @xnarf: I suggest you read up again on multithreading in Python. It's really not like Java, where you open another independent thread. Multithreading in Python has only one goal: making an action non-blocking. I think what you mean here is multiprocessing. – jossefaz May 15 '20 at 03:01
  • Oh, sorry for the mistake then. How would I use multiprocessing here? – xnarf May 15 '20 at 03:01
  • Have a look at this code, it does exactly what you need: https://stackoverflow.com/questions/54858979/how-to-use-multiprocessing-with-requests-module. Look at the master() function that @Susam Pal posted. – jossefaz May 15 '20 at 03:04
  • @xnarf: when you're done, post your own answer here... ;-) – jossefaz May 15 '20 at 03:11
  • Multithreading actually works here, because `requests.get` is I/O-bound, not CPU-bound. – Niklas Mertsch Oct 26 '20 at 11:42

2 Answers

You can use Dask for multiprocessing.

It will use all of your system's cores, whereas a plain Python process uses only one core.

Links: Dask Official, Dask Wikipedia
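
As a rough sketch, the idea could look like this with dask.delayed (the answer itself shows no code, so the scheduler choice and URLs here are illustrative assumptions):

import requests
import dask

@dask.delayed
def read_url(url):
    # each call builds a lazy task; nothing runs until dask.compute()
    return requests.get(url).text

urls = ["https://www.google.com", "https://www.bing.com", "https://www.yahoo.com"]
tasks = [read_url(url) for url in urls]
# scheduler="processes" runs the tasks in a local process pool
texts = dask.compute(*tasks, scheduler="processes")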

M_x
  • Using dask is probably overkill in this case, though. For simple parallelization, the standard library provides multithreading and multiprocessing facilities. – Niklas Mertsch Oct 26 '20 at 11:58

Multithreading is an option here, because fetching a URL is I/O-bound, not CPU-bound. Thus, a single process can have multiple threads running requests.get in parallel.

import requests
from multiprocessing.pool import ThreadPool

def read_url(url):
    return requests.get(url).text


# requests needs absolute URLs, including the scheme
urls = ["https://www.google.com", "https://www.bing.com", "https://www.yahoo.com"]
with ThreadPool() as pool:
    texts = pool.map(read_url, urls)

You can use multiprocessing by replacing ThreadPool with Pool. For the three provided URLs, I get similar runtimes with Pool and ThreadPool, and both are faster than running read_url sequentially in a loop.
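
For reference, the process-based variant might look like the sketch below; the if __name__ == "__main__" guard is an addition of mine, needed because process pools start fresh interpreters on some platforms and re-import the module:

import requests
from multiprocessing import Pool

def read_url(url):
    return requests.get(url).text

# the guard prevents workers from re-executing the pool setup on
# platforms that start processes via spawn (e.g. Windows)
if __name__ == "__main__":
    urls = ["https://www.google.com", "https://www.bing.com", "https://www.yahoo.com"]
    with Pool() as pool:
        texts = pool.map(read_url, urls)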

Niklas Mertsch