
I am working on a Python 2.7 script that runs multiple threads. One of the threads listens for JSON data in long-polling mode: it parses the data once it arrives, or times out after a set period. It works as expected only in debug mode (I use Wing IDE). On a normal run, this thread appears to hang after the first GET request, before entering the `for` loop. The loop condition doesn't affect the result. Meanwhile, the other threads continue to work normally.

I believe this is related to multithreading. How can I properly troubleshoot and fix this issue?

Below is the code of the class responsible for the long-polling job.

import base64
import threading

class Listener(threading.Thread):

    def __init__(self, router, *args, **kwargs):
        self.stop = False

        self._cid = kwargs.pop("cid", None)
        self._auth = kwargs.pop("auth", None)
        self._router = router
        self._c = webclient.AAHWebClient()

        threading.Thread.__init__(self, *args, **kwargs)

    def run(self):
        while True:
            try:
                # Data items that should be routed to the device are retrieved by doing a
                # long polling GET request on the "/tunnel" resource. This will block until
                # there are data items available, or the request times out
                log.info("LISTENER: Waiting for data...")

                response = self._c.send_request("GET", self._cid, auth=self._auth)

                # A timed out request will not contain any data
                if len(response) == 0:
                    log.info("LISTENER: No data this time")
                else:
                    items = response["resources"]["tunnel"]
                    undeliverable = []

                    #print items # - reaching this point, able to return output

                    for item in items:

                        # The data items contain the data as a base64 encoded string and the
                        # external reference ID for the device that should receive it
                        extId = item["extId"]
                        data = base64.b64decode(item["data"])

                        # Try to deliver the data to the device identified by "extId"
                        if not self._router.route(extId, data):
                            item["message"] = "Could not be routed"
                            undeliverable.append(item)

                    # Data items that for some reason could not be delivered to the device should
                    # be POST:ed back to the "/tunnel" resource as "undeliverable"
                    if len(undeliverable) > 0:
                        log.warning("LISTENER: Sending error report...")
                        response = self._c.send_request(
                            "POST", "/tunnel",
                            body={"undeliverable": undeliverable},
                            auth=self._auth)

            except webclient.RequestError as e:
                log.error("LISTENER: ERROR %d - %s", e.status, e.response)

UPD:

class Router:

    def route(self, extId, data):
        log.info("ROUTER: Received data for %s: %s", extId, repr(data))
        # nothing special
        return True
pahanela
  • [Dump stacktraces of all active Threads](http://stackoverflow.com/questions/1032813/dump-stacktraces-of-all-active-threads) has some great tips for figuring out where your thread is stuck. I don't see how you could know from your code whether it got stuck between the first GET and the for loop. You could log right after the get and then add a `continue` to short-circuit the loop and return to the GET. If the loop works, you could do the trick further down the function to see if one of the calls you make has a deadlock. – tdelaney Mar 06 '16 at 19:36
  • I determined the "stuck" position with a simple print statement (commented out in the source above). I tried your approach and found that the thread hangs on the `self._router.route(extId, data)` statement. The `Router` class doesn't contain anything special except SQL queries via pyodbc. However, it doesn't have an `__init__` method, which may be the cause of the hang. But then I don't understand why it works in debug mode. – pahanela Mar 07 '16 at 17:06
  • I'm not an expert with `pyodbc` but I vaguely remember that access to its connections and cursors has to be serialized. You could keep a `self._my_connection_lock = threading.Lock()` with the connection and do a `with self._my_connection_lock:` when accessing it. It is notoriously difficult to debug race conditions with a debugger because the debugger changes the execution environment. – tdelaney Mar 07 '16 at 17:27
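
The locking idea from the last comment can be sketched as follows. This is only an illustration under stated assumptions: `FakeConnection` is a hypothetical stand-in for the real database connection (it is not the pyodbc API), and the SQL text, table name, and `_connection_lock` attribute are invented for the example. The point is that every access to the shared connection happens inside `with self._connection_lock:`.

```python
import threading


class FakeConnection(object):
    """Hypothetical stand-in for a database connection (not the pyodbc API)."""

    def __init__(self):
        self.executed = []

    def execute(self, sql, *params):
        # Record the call instead of talking to a real database.
        self.executed.append((sql, params))


class Router(object):

    def __init__(self, connection):
        self._connection = connection
        # One lock per shared connection: only one thread may use it at a time.
        self._connection_lock = threading.Lock()

    def route(self, ext_id, data):
        # All access to the shared connection is serialized by the lock.
        with self._connection_lock:
            self._connection.execute(
                "INSERT INTO routed (ext_id, payload) VALUES (?, ?)",  # illustrative SQL
                ext_id, data)
        return True


conn = FakeConnection()
router = Router(conn)

# Several threads route data through the same Router concurrently.
workers = [threading.Thread(target=router.route, args=("dev-%d" % i, b"payload"))
           for i in range(8)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

Giving each thread its own connection instead of sharing one is another possible design; whether that suits the real `Router` depends on code not shown in the question.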

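The stack-dump suggestion from the first comment can be tried with only the standard library. The sketch below assumes nothing about the asker's code: `dump_all_stacks` is a hypothetical helper name, and the `listener-demo` thread merely imitates a worker stuck in a blocking call. It relies on `sys._current_frames()`, which returns the current frame of every thread.

```python
import sys
import threading
import traceback


def dump_all_stacks():
    """Return a text report with the current stack of every live thread."""
    names = dict((t.ident, t.name) for t in threading.enumerate())
    lines = []
    for thread_id, frame in sys._current_frames().items():
        lines.append("Thread %s (%s):" % (names.get(thread_id, "unknown"), thread_id))
        lines.extend(entry.rstrip() for entry in traceback.format_stack(frame))
    return "\n".join(lines)


# Demo: a thread that blocks until released, imitating a hung worker.
release = threading.Event()
stuck = threading.Thread(target=release.wait, name="listener-demo")
stuck.start()

report = dump_all_stacks()
print(report)

# Let the demo thread finish so the script exits cleanly.
release.set()
stuck.join()
```

In a real script the report could be written to the log from a signal handler or a watchdog thread, so the hang can be inspected on a normal run rather than under the debugger.
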
1 Answer

If you're using the CPython interpreter, your threads do not actually execute Python code in parallel:

CPython implementation detail: In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation). If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing. However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously.

So your process is probably locking up while listening on the first request because you are long polling.

Multiprocessing might be a better choice. I haven't tried it with long polling, but the Twisted framework might also work in your situation.

Matt S
  • Unlikely. The GIL is released during blocking operations letting other threads run. Python threading is a perfectly good option when performing I/O bound tasks. – tdelaney Mar 06 '16 at 19:18
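
The point about the GIL being released during blocking operations can be checked with a small timing experiment. This is a rough sketch, not a benchmark: `time.sleep` stands in for a blocking long-poll request, and `blocking_call` and `timed_run` are hypothetical helper names. If blocked threads held the GIL, four half-second waits would take about two seconds; since they overlap, the total should stay close to half a second.

```python
import threading
import time


def blocking_call(seconds):
    # Stand-in for blocking I/O; the GIL is released while the thread sleeps.
    time.sleep(seconds)


def timed_run(thread_count, seconds):
    """Run thread_count blocking calls in parallel and return the wall time."""
    threads = [threading.Thread(target=blocking_call, args=(seconds,))
               for _ in range(thread_count)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start


elapsed = timed_run(4, 0.5)
print("4 threads x 0.5 s of blocking took %.2f s in total" % elapsed)
```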