
When trying to send a large number of requests (>500), I get a connection error like so:

    ConnectionError: ('Connection aborted.', gaierror(8, 'nodename nor servname provided, or not known'))

The code:

    def compare_and_fix(mac, reqSession):
        url = hosturl + <api as a string> + mac
        try:
            resp = json.loads(reqSession.get(url, headers=headers, cert=(…), timeout=10, verify=False).text)
            response_code = resp['status']
            if response_code == 200:
                result = resp['data']
            else:
                print "oo-ooh!"
        except Exception as ex:
            print ex.message


    def worker(reqSession):
        while True:
            mac = q.get()
            if validate_mac(mac):
                compare_and_fix(mac, reqSession)
            q.task_done()

    num_worker_threads = 500
    q = Queue()

    for i in range(num_worker_threads):
        session = requests.Session()
        t = Thread(target=worker, args=(session,))
        t.daemon = True
        t.start()

    for mac in mac_list:
        q.put(mac)
    print "Waiting for threads to join"
    q.join()

This works well when num_worker_threads is low (~100-200). Anything more and I start to see the above error; past 500 it happens quite a bit (~15% of all calls fail with this error at that point).
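For reference, the same fan-out can be expressed with a bounded pool instead of hand-rolled worker threads. The sketch below is Python 3 and uses a stand-in `fetch` function in place of the real HTTP request; all names here are illustrative, not from the code above:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(mac):
    # Stand-in for compare_and_fix(mac, session); a real version would
    # issue the HTTP GET here and parse the JSON response.
    return mac.upper()

mac_list = ["aa:bb:cc:dd:ee:%02x" % i for i in range(256)]

# Cap concurrency well below the per-user thread/process limit; a few
# dozen workers is usually enough to keep a single endpoint saturated.
with ThreadPoolExecutor(max_workers=50) as pool:
    results = list(pool.map(fetch, mac_list))

print(len(results))  # prints 256
```

The pool reuses 50 threads for all 256 tasks, so the thread count stays constant regardless of how many MACs are queued.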

Trace:

    exception : Traceback (most recent call last):   
        File "xyz.py", line 62, in abc
                resp = json.loads(reqSession.get(url, headers=headers, cert=(…), timeout=10, verify=False).text)   
        File "/Library/Python/2.7/site-packages/requests/sessions.py", line 476, in get
                return self.request('GET', url, **kwargs)   
        File "/Library/Python/2.7/site-packages/requests/sessions.py", line 464, in request
                resp = self.send(prep, **send_kwargs)   
        File "/Library/Python/2.7/site-packages/requests/sessions.py", line 576, in send
                r = adapter.send(request, **kwargs)   
        File "/Library/Python/2.7/site-packages/requests/adapters.py", line 415, in send 
    ConnectionError: ('Connection aborted.', gaierror(8, 'nodename nor servname provided, or not known'))
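The inner `gaierror` in that trace is raised by `socket.getaddrinfo()`, i.e. the failure happens at name resolution, before any connection is made. The same exception class can be reproduced offline by calling `getaddrinfo` with neither a host nor a service (the exact errno and message differ between macOS and Linux):

```python
import socket

# Calling getaddrinfo() with neither a host nor a service triggers the
# same exception class as in the traceback, without touching the network.
try:
    socket.getaddrinfo(None, None)
    err_name = None
except socket.gaierror as exc:
    err_name = type(exc).__name__

print(err_name)  # prints gaierror
```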

Any insight into this issue is appreciated. Thanks!

nimblerex
  • Why isn't the error message related to the code? (At least the relevant part could be pasted and replace the URL etc if it's sensitive) – Torxed Oct 25 '17 at 06:39
  • Sorry, but without the actual code it is not possible to replicate the error. – Klaus D. Oct 25 '17 at 06:40
  • Seeing as you're bordering the value of what's normal thread management in Windows, it might be a thread related issue where you're hitting the maximum allowed stack space in Windows. On linux it's something like `ram/cores/10`, roughly around ~500. Read more here: https://stackoverflow.com/a/481919/929999 – Torxed Oct 25 '17 at 06:42
  • That error sounds like a hostname resolution error. Can you double check the IPs or hostnames that you are giving your client? One of them might be dodgy and pointing to nowhere – Will Oct 25 '17 at 06:56
  • @Torxed I just updated the code. – nimblerex Oct 25 '17 at 06:58
  • I am using my mac to run this (OS X). @Will, I would have guessed the same but it is a static hostname, never changes and 85% of the times it works. – nimblerex Oct 25 '17 at 06:59
  • OK, so 'hosturl' is a machine other than your localhost? Just for giggles, try using IP instead of hostname (and remove DNS/bind from the picture). If that works then we know it's got something to do with name resolution backlog. Failing that, increase timeout... – Will Oct 25 '17 at 07:05
  • @Will Actually, the hosturl is a load balancer benchmarked for ~5K rps, so I was guessing that's not an issue. Unfortunately, I cannot reach an individual IP/host under the load balancer. Not sure if the timeout would help. The 99 percentile duration for calls is about 1.5 secs. – nimblerex Oct 25 '17 at 07:10
  • Then I'm barking up the wrong tree and the problem is more likely to do with thread count. Have you tried doing this with async calls? – Will Oct 25 '17 at 07:22
  • 1
    Yea best guess is it's a thread count issue. My link was regarding a Windows machine, but all machines have the same limitations essentially, just handled in different ways. What does `ulimit -u` show (run as user you're executing under). Or it could actually be what @Will was pointing towards. Might be the DNS server that's throttling you because the same source IP is hammering the loadbalancer/DNS server. – Torxed Oct 25 '17 at 08:36
  • @Torxed You nailed it, ulimit was the issue. It took me a while to bump it up (709 to 2048). It is not that straightforward in OS X. I went way over 500 threads and zero issues now. I cannot mark a comment as an answer, can I? – nimblerex Oct 25 '17 at 09:34
  • @nimblerex No unfortunately you can't, seeing as I marked it as a duplicate before even knowing the OS and people jumped on the bandwagon.. I gotta reopen it to write an answer as well (can't write answers on closed topics, heh). Not sure if I'm allowed as a moderator to do so :P – Torxed Oct 25 '17 at 15:03
  • @Torxed As long as people can get this info when they search for it, I'm good (I think it is easier to look at an answer rather than sift through the comments to know what the answer is). The other downside to it is, you not getting credit for this. As this is not a duplicate, I think you should reopen it. But I'm new to all this, so your call. – nimblerex Oct 26 '17 at 06:51
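Since the thread resolved in the comments (the per-user process limit reported by `ulimit -u` was the culprit), the same limit can be inspected from Python on Unix via the stdlib `resource` module. A minimal sketch, assuming a Unix-like OS:

```python
import resource

# RLIMIT_NPROC is the per-user process/thread cap that `ulimit -u`
# reports; on Linux each thread counts against it, so 500 worker
# threads plus the user's other processes can exceed a limit like 709.
soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
print(soft, hard)
```

If `soft` is close to (or below) the intended thread count, the limit must be raised before the script can scale past it, which is what the asker did by bumping it from 709 to 2048.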

0 Answers