
I have a function that I want to run for thousands of objects in parallel.

EDIT:

Using `Pool` from: https://docs.python.org/3/library/multiprocessing.html
Using the `Client` object from: https://github.com/ping/instagram_private_api

from multiprocessing import Pool
from instagram_private_api import Client
from database.models import Account  # SQLAlchemy model

# LOGIN FUNCTION
def login(acc: Account):
    # One client per account, routed through that account's proxy
    client = Client(username=acc.username, password=acc.password, proxy=acc.proxy)
    client.login()  # request is sent here

# MAIN FUNCTION
accounts = get_all_accounts()  # get accounts from the DB
p = Pool(len(accounts))  # len(accounts) is ~100, so ~100 worker processes
results = p.map(login, accounts)
p.close()
p.join()

I am using the `Pool` object from the multiprocessing library, and inside the function I send two GET requests to an external API (note: I'm not using `requests` or `urllib` directly; I'm using a third-party library to send the requests).
However, for a lot of the requests I get the URLError `<urlopen error timed out>` when I use `Pool(100)` (100 parallel processes); if I use fewer (e.g. 10), I don't get any errors.
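
For reference, the variant that does work simply caps the pool size; a minimal sketch of it, reusing `get_all_accounts` and `login` from the snippet above (10 is just a size that did not produce timeouts):

from multiprocessing import Pool

accounts = get_all_accounts()
with Pool(10) as p:  # cap the worker count instead of one process per account
    results = p.map(login, accounts)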

Any idea how to overcome this issue, if that is possible at all?

  • Can you post a [minimal, reproducible example](https://stackoverflow.com/help/minimal-reproducible-example)? – Booboo Jun 07 '21 at 16:54
  • I've edited my initial question with a code sample from my script – lcadc17 Jun 07 '21 at 17:23
  • This is still not quite *minimal*: I don't see the `import` statements, and there are two different APIs, each of which defines `Client`, and neither of which seems to have a `login` method. Anyway, I don't think either uses the `requests` or `urllib` packages (which is a Python 2 package) for which you have tagged this question. Also, I would think that perhaps multithreading would be the better way to go if this program is basically just retrieving data from a remote site over a socket. Try `from multiprocessing.dummy import Pool` to multithread instead. But I look forward to your update. – Booboo Jun 07 '21 at 18:06
  • I've added the imports for `Account` (DB model), `Pool` (from multiprocessing), and `Client` from the Instagram library I've posted. I've tested with the dummy Pool before; it gives the same result. Also, I'm pretty sure it uses `urllib` (check the code for the `Client` object: https://github.com/ping/instagram_private_api/blob/master/instagram_private_api/client.py) – lcadc17 Jun 08 '21 at 07:19
  • Also, the main things that happen in my function are networking and database I/O. I assume `dummy.Pool` would be a better choice, but I still get the error if I use `Pool(100)`. Is there any limit on that parameter, and why are the requests timing out? – lcadc17 Jun 08 '21 at 08:32
  • I've tried using a different library for the IG requests (https://github.com/adw0rd/instagrapi) in the same scenario, but a similar error occurs: `ProxyError HTTPSConnectionPool(host='i.instagram.com', port=443): Max retries exceeded with url: /api/v1/accounts/contact_point_prefill/ (Caused by ProxyError('Cannot connect to proxy.', NewConnectionError(': Failed to establish a new connection: [Errno 110] Connection timed out',)))` – lcadc17 Jun 08 '21 at 11:06
  • Is it possible that Instagram itself is limiting the number of requests per second or minute from the same IP address? It does impose hourly limits; see [Did Instagram change API rate limits on Mar 30, 2018?](https://stackoverflow.com/questions/49583489/did-instagram-change-api-rate-limits-on-mar-30-2018). – Booboo Jun 08 '21 at 11:30
  • I'm familiar with IG and their rate limits (I'd missed one line where I set the proxy, `proxy=acc.proxy`; I've added it in the code above), which means I'm sending all of the requests through different proxies. I'm not sure why, when my Pool size is, let's say, 10, it works correctly, but otherwise it fails with those errors. Do you think it is Python related or IG related? – lcadc17 Jun 09 '21 at 07:32
  • I can't really say. But there is a 3rd possibility: the proxy. – Booboo Jun 09 '21 at 10:11
  • I think I found a solution (I'm not sure whether it is IG or requests related): I've updated the Instagram library that I'm using by adding `self.private = requests.Session(); retries = Retry(total=3, backoff_factor=0.1); self.private.mount('https://', HTTPAdapter(pool_maxsize=130, max_retries=retries))`. – lcadc17 Jun 09 '21 at 10:57
  • Basically, I've added an `HTTPAdapter` to the requests `Session` and a `Retry` object with a count of 3 (retry the request up to 3 times if it fails). Now I don't get the error and the accounts authenticate correctly. I will test it more and, if it keeps working, write up the solution as an answer, since I can't format it properly in comments; see the reformatted snippet below this thread. – lcadc17 Jun 09 '21 at 11:00
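
For readability, here is the patch from the last two comments as a standalone block. This is a sketch of the change applied inside the library (where the session is stored as `self.private`); the imports shown are the usual locations of `HTTPAdapter` and `Retry`:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# In the patched library this session is stored as `self.private`
session = requests.Session()
# Retry a failed request up to 3 times, with a small backoff between attempts
retries = Retry(total=3, backoff_factor=0.1)
# Mount an adapter with a larger connection pool, so ~100 parallel logins
# don't exhaust it, plus the retry policy above
session.mount('https://', HTTPAdapter(pool_maxsize=130, max_retries=retries))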

0 Answers