1

I'm working on a Flask app that will eventually become a simple REST API for named entity recognition on a given text string, using spaCy. My prototype is as follows:

from flask import Flask, render_template, request, json
import spacy
from spacy import displacy


def to_json(doc):
    # Serialize each named entity as character offsets, label, and text.
    return [
        {
            'start': ent.start_char,
            'end': ent.end_char,
            'type': ent.label_,
            'text': ent.text,
        }
        for ent in doc.ents
    ]


# Load the English model once, when the module is imported.
nlp = spacy.load('en')

app = Flask(__name__)


@app.route('/')
def index():
    return render_template('index.html')


@app.route('/demo', methods=['GET', 'POST'])
def demo():
    # Fall back to an empty string so nlp() never receives None.
    q = request.values.get('text', '')
    doc = nlp(q)

    if request.values.get('type') == 'html':
        return displacy.render(doc, style='ent', page=True)
    else:
        return app.response_class(
            response=json.dumps(to_json(doc), indent=4),
            status=200,
            mimetype='application/json'  # was 'text/string', which is not a registered MIME type
        )


if __name__ == '__main__':
    app.run(host='0.0.0.0')

The Flask app is served using an Apache webserver on Ubuntu. I submit text to the app using a simple web form and it returns results as either HTML or JSON text.

The problem I am having is that the app hangs intermittently, and I can't find a pattern in what causes it. Nothing shows up in the Apache error log, and the request that hangs does not appear in the Apache access log. If I kill the server while the browser is spinning, the browser reports that the server provided an empty response. If I restart the server, the error log reports that 1 or 2 child processes don't exit after a SIGTERM, and a SIGKILL has to be sent.

One possible clue is that the error log reports the following when the server starts up:

[Wed Dec 06 20:19:33.753041 2017] [wsgi:warn] [pid 1822:tid 140029812619136] mod_wsgi: Compiled for Python/2.7.11.
[Wed Dec 06 20:19:33.753055 2017] [wsgi:warn] [pid 1822:tid 140029812619136] mod_wsgi: Runtime using Python/2.7.12.

Another possible clue is that the "index" route (/) never seems to hang, whereas the "/demo" route can hang in either branch of the `request.values.get('type') == 'html'` check.

EDIT: I've taken Apache and mod_wsgi out of the loop, and am now running the app using the standalone Flask server. The app still hangs occasionally. When it does, I can press control-c, and it consistently prints the following traceback:

Exception happened during processing of request from ('xxx.xxx.xxx.xxx', 55608)
Traceback (most recent call last):
  File "/usr/lib/python2.7/SocketServer.py", line 290, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/lib/python2.7/SocketServer.py", line 318, in process_request
    self.finish_request(request, client_address)
  File "/usr/lib/python2.7/SocketServer.py", line 331, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib/python2.7/SocketServer.py", line 652, in __init__
    self.handle()
  File "/usr/local/lib/python2.7/dist-packages/werkzeug/serving.py", line 232, in handle
    rv = BaseHTTPRequestHandler.handle(self)
  File "/usr/lib/python2.7/BaseHTTPServer.py", line 340, in handle
    self.handle_one_request()
  File "/usr/local/lib/python2.7/dist-packages/werkzeug/serving.py", line 263, in handle_one_request
    self.raw_requestline = self.rfile.readline()
  File "/usr/lib/python2.7/socket.py", line 451, in readline
    data = self._sock.recv(self._rbufsize)
KeyboardInterrupt
----------------------------------------

After pressing control-c, Flask gets "released" and then returns the result I expect. The server continues on as normal and will accept more requests until it hangs again. Sometimes a hung request will come back on its own if I wait long enough.
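
One possible reading of the traceback above, offered as a sketch rather than a confirmed diagnosis: the server is blocked in `rfile.readline()`, waiting for the request line on a connection that was accepted but has not sent anything yet (some browsers open such connections speculatively). A single-threaded server cannot serve anyone else while it waits, which would also fit the result arriving once control-c interrupts the read. The example below uses Python 3's standard `http.server` as a stand-in for the Werkzeug development server, which builds on the same base handler class; `HelloHandler` and `request_behind_idle_connection` are hypothetical names.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer, ThreadingHTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a tiny body."""

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def request_behind_idle_connection(server_class, timeout=1.0):
    """Return True if a real request is answered while an idle connection is open."""
    server = server_class(("127.0.0.1", 0), HelloHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()

    # First connection: connects but never sends a request line.
    idle = socket.create_connection(("127.0.0.1", port))
    try:
        # Second connection: a normal request queued behind the idle one.
        with socket.create_connection(("127.0.0.1", port), timeout=timeout) as client:
            client.sendall(b"GET / HTTP/1.0\r\n\r\n")
            try:
                return client.recv(1024).startswith(b"HTTP/1.0 200")
            except socket.timeout:
                return False  # the server never got around to this request
    finally:
        idle.close()
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print("single-threaded answered:", request_behind_idle_connection(HTTPServer))
    print("threaded answered:", request_behind_idle_connection(ThreadingHTTPServer))
```

If this is what is happening, running the development server with `app.run(host='0.0.0.0', threaded=True)` would be one thing to try; `threaded` is an existing option that Flask passes through to Werkzeug.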

This seems more and more like it's a problem with Flask (or how I'm using it). If anyone can provide advice on how to track down the problem, I would appreciate it!
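
To see where a hung request is stuck without waiting to press control-c, one option is a watchdog that dumps every thread's stack when a call runs too long. This is a minimal sketch, assuming Python 3, where `faulthandler` is part of the standard library (the Python 2.7 setup in the question would need a backport). `run_with_watchdog`, `capture_dump`, and `slow_request` are hypothetical names; in the app, the wrapped call would be something like the `nlp(q)` step.

```python
import faulthandler
import tempfile
import time


def run_with_watchdog(func, timeout, log_file):
    """Call func(); if it is still running after `timeout` seconds,
    write the stack of every thread to log_file (a real file with a fileno)."""
    faulthandler.dump_traceback_later(timeout, exit=False, file=log_file)
    try:
        return func()
    finally:
        # Disarm the watchdog once the call has returned.
        faulthandler.cancel_dump_traceback_later()


def capture_dump(func, timeout):
    """Run func under the watchdog and return (result, dumped text)."""
    with tempfile.TemporaryFile() as log:
        result = run_with_watchdog(func, timeout, log)
        log.seek(0)
        return result, log.read().decode("utf-8", "replace")


def slow_request():
    """Stand-in for a request handler that takes too long."""
    time.sleep(0.5)
    return "done"


if __name__ == "__main__":
    result, dump = capture_dump(slow_request, timeout=0.1)
    print(result)
    print(dump)
```

The dump names the function each thread was executing when the timeout fired, so a hang inside the model call would look different from a hang while reading from the socket.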

BriWill
  • I guess the task is too heavy for a web app. How long does it take if you run the NLP stuff step by step in an interactive Flask session? – Klaus D. Dec 06 '17 at 20:42
  • When the app responds successfully, it is quite fast: less than a second. I haven't tried an interactive Flask session, but in a regular interactive Python session, spacy.load('en') (which loads the spaCy model for English) takes about 1 second. The "doc = nlp(q)" step is very fast. It's not clear to me whether Flask loads the spaCy model on every request. If it does, is it possible that the "route" is being processed before the model has finished loading? – BriWill Dec 06 '17 at 21:38
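
On the question raised in the last comment: a module-level statement such as `nlp = spacy.load('en')` runs when the module is imported, not on each request, and the routes cannot be called until that import has finished. The sketch below illustrates this with a hypothetical `load_model` stand-in (it is not spaCy) that counts how often it runs and times each call, in the spirit of running the steps one by one. Under a multi-process server, each process would perform its own single load.

```python
import time

LOAD_COUNT = 0


def load_model():
    """Hypothetical stand-in for spacy.load('en'): slow, and counts its calls."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    time.sleep(0.05)  # pretend that loading is expensive
    return lambda text: text.split()


# Module level: executed once, when this module is imported.
nlp = load_model()


def demo(text):
    """Stand-in for the /demo view: reuses the already loaded model."""
    start = time.perf_counter()
    tokens = nlp(text)
    elapsed = time.perf_counter() - start
    return tokens, elapsed


if __name__ == "__main__":
    for _ in range(3):
        tokens, elapsed = demo("Apple is based in Cupertino")
        print(tokens, "took %.6f s" % elapsed)
    print("model loads:", LOAD_COUNT)
```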

3 Answers

1

Try forcing the use of the main Python interpreter context as explained in:

Some third party C extension modules in Python don't work properly in sub interpreters and can hang or crash the process.

Graham Dumpleton
  • Thanks @graham, I will investigate that. Would this same issue affect the stand-alone Flask server? I'm seeing similar behavior without using Apache or mod-wsgi. – BriWill Dec 06 '17 at 22:45
  • The `WSGIApplicationGroup %{GLOBAL}` directive was already included in my Apache config file. – BriWill Dec 06 '17 at 23:02
  • If you are seeing it with the standalone Flask server, then it is an issue with your application code or the packages it uses. It has nothing to do with sub interpreters, which are only an issue when running some things under mod_wsgi. – Graham Dumpleton Dec 06 '17 at 23:13
1

This appears to be a known issue in spaCy v2.0. The issue went away after I downgraded to spaCy v1.9.

For more details, see:

https://github.com/explosion/spaCy/issues/1571

and

https://github.com/explosion/spaCy/issues/1572

BriWill
0

I had the same problem with Django; downgrading to 1.10.0 solved the issue.

Anton M.