7

After running for some time, our Twisted server application accepts connections but will not read any data and eventually hangs.

We saw the following errors in our logs:

Unhandled Error
    Traceback (most recent call last):
      File "/usr/lib64/python2.7/lib/python2.7/site-packages/Twisted-12.1.0-py2.7-linux-x86_64.egg/twisted/python/log.py", line 69, in callWithContext
        return context.call({ILogContext: newCtx}, func, *args, **kw)
      File "/usr/lib64/python2.7/lib/python2.7/site-packages/Twisted-12.1.0-py2.7-linux-x86_64.egg/twisted/python/context.py", line 118, in callWithContext
        return self.currentContext().callWithContext(ctx, func, *args, **kw)
      File "/usr/lib64/python2.7/lib/python2.7/site-packages/Twisted-12.1.0-py2.7-linux-x86_64.egg/twisted/python/context.py", line 81, in callWithContext
        return func(*args,**kw)
      File "/usr/lib64/python2.7/lib/python2.7/site-packages/Twisted-12.1.0-py2.7-linux-x86_64.egg/twisted/internet/posixbase.py", line 614, in _doReadOrWrite
        why = selectable.doRead()
        --- <exception caught here> ---
      File "/usr/lib64/python2.7/lib/python2.7/site-packages/Twisted-12.1.0-py2.7-linux-x86_64.egg/twisted/internet/tcp.py", line 1016, in doRead
        transport = self.transport(skt, protocol, addr, self, s, self.reactor)
      File "/usr/lib64/python2.7/lib/python2.7/site-packages/Twisted-12.1.0-py2.7-linux-x86_64.egg/twisted/internet/tcp.py", line 773, in __init__
        self.startReading()
      File "/usr/lib64/python2.7/lib/python2.7/site-packages/Twisted-12.1.0-py2.7-linux-x86_64.egg/twisted/internet/abstract.py", line 416, in startReading
        self.reactor.addReader(self)
      File "/usr/lib64/python2.7/lib/python2.7/site-packages/Twisted-12.1.0-py2.7-linux-x86_64.egg/twisted/internet/epollreactor.py", line 254, in addReader
        _epoll.EPOLLIN, _epoll.EPOLLOUT)
      File "/usr/lib64/python2.7/lib/python2.7/site-packages/Twisted-12.1.0-py2.7-linux-x86_64.egg/twisted/internet/epollreactor.py", line 238, in _add
        self._poller.modify(fd, flags)
    exceptions.IOError: [Errno 2] No such file or directory

We also sometimes see KeyError: 13 or Bad file descriptor raised from deep in the reactor, often in reactor.addReader, reactor.addWriter or the corresponding calls that remove readers or writers.

What might be causing this?

Chris Withers
allen

2 Answers

3

Although this is a very old question, it still comes up as the top result when searching Google for this problem, so I thought I would add a more complete answer.

Check whether you are using threads, particularly deferToThread. If you are, make sure that code running in a thread never touches the reactor directly: use reactor.callFromThread or, especially if the code you call in the reactor thread may raise an exception or return a result that you need, blockingCallFromThread.

You may find you have ended up doing this when writing data to a transport from a thread, such as sending a websocket message or an HTTP response.

For full details, see the Twisted documentation on using threads.

Getting this wrong means that you might get correct behavior… or you might get the errors described in the question, or you may get hangs, crashes, or corrupted data. So don’t do it.

We encountered this issue intermittently in our codebase because we were sending data via Twisted from threads, which is not allowed. Specifically, we occasionally found two threads calling EPollReactor._add and EPollReactor._remove simultaneously, which left EPollReactor's internal _reads, _writes and _selectables structures in an inconsistent state. Symptoms included:

  • "Reactor was unclean" during shutdown in trial-based test suites

  • ENOENT errors when calling self._poller.modify()

  • KeyError: {int} for some file descriptors from deep within reactor code.
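Because these symptoms show up far from the offending call, it can help to make off-thread calls fail loudly at the call site while you hunt for them. The following is only a sketch: reactor_thread_only and send_message are hypothetical helpers, not part of Twisted, and it assumes the reactor runs in the thread that first imports the module (normally the main thread).

```python
import functools
import threading

# Assumption: the reactor runs in the thread that imports this module
# (normally the main thread), so that thread is remembered here.
_REACTOR_THREAD = threading.current_thread()


def reactor_thread_only(func):
    """Hypothetical decorator: raise if func is called off the reactor thread."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        current = threading.current_thread()
        if current is not _REACTOR_THREAD:
            raise RuntimeError(
                "%s called from thread %r; use reactor.callFromThread instead"
                % (func.__name__, current.name)
            )
        return func(*args, **kwargs)

    return wrapper


@reactor_thread_only
def send_message(transport, data):
    # Anything that touches a transport belongs on the reactor thread.
    transport.write(data)
```

Wrapping the functions that write to transports this way turns an intermittent reactor corruption into an immediate RuntimeError in the thread that made the mistake.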

Chris Withers
-1

Looks like a Twisted bug to me: we just experienced this after upgrading a rock-stable server that had been running on 10.0.0 for over a year to 12.3.0. It happened only two days after the upgrade.

mercador