Is it possible to keep spacy in memory to reduce the load time?

Question

I want to use spacy for NLP in an online service. Each time a user makes a request, I call the script "my_script.py", which starts with:

from spacy.en import English
nlp = English()

The problem is that those two lines take over 10 seconds. Is it possible to keep English() in RAM, or is there some other way to reduce this load time to less than a second?

You don't provide enough context. This question is more about the design of your online service than about spacy, so please elaborate on the former. – Leon, Apr 24 '17 at 11:37
Show the code behind _"calling the script with the text I want to process as parameter"_; even better, make an **MCVE**. Read "How to create a Minimal, Complete, and Verifiable example": https://stackoverflow.com/help/mcve – stovfl, Apr 25 '17 at 07:36

score 10 · Answer 1 · edited May 23 '17 at 12:10

You said that you want to launch a freestanding script (my_script.py) whenever a request comes in, and that this script should use capabilities from spacy.en without the overhead of loading spacy.en. With this approach, the operating system will always create a new process when you launch your script. So there is only one way to avoid loading spacy.en each time: have a separate process that is already running, with spacy.en loaded, and have your script communicate with that process. The code below shows a way to do that. However, as others have said, you will probably benefit by changing your server architecture so spacy.en is loaded within your web server (e.g., using a Python-based web server).

The most common form of inter-process communication is via TCP/IP sockets. The code below implements a small server which keeps spacy.en loaded and processes requests from the client. It also has a client which transmits requests to that server and receives results back. It's up to you to decide what to put into those transmissions.

There is also a third script. Since both client and server need send and receive functions, those functions are in a shared script called comm.py. (Note that the client and server each load a separate copy of comm.py; they do not communicate through a single module loaded into shared memory.)

I assume both scripts are run on the same machine. If not, you will need to put a copy of comm.py on both machines and change comm.server_host to the machine name or IP address for the server.

Run nlp_server.py as a background process (or just in a different terminal window for testing). This waits for requests, processes them and sends the results back:

import comm
import socket
from spacy.en import English
nlp = English()

def process_connection(sock):
    print("processing transmission from client...")
    # receive data from the client
    data = comm.receive_data(sock)
    # do something with the data
    result = {"data received": data}
    # send the result back to the client
    comm.send_data(result, sock)
    # close the socket with this particular client
    sock.close()
    print("finished processing transmission from client...")

server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# open socket even if it was used recently (e.g., server restart)
server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server_sock.bind((comm.server_host, comm.server_port))
# queue up to 5 connections
server_sock.listen(5)
print("listening on port {}...".format(comm.server_port))
try:
    while True:
        # accept connections from clients
        (client_sock, address) = server_sock.accept()
        # process this connection 
        # (this could be launched in a separate thread or process)
        process_connection(client_sock)
except KeyboardInterrupt:
    print("Server process terminated.")
finally:
    server_sock.close()
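
The comment in the accept loop says each connection could be handled in a separate thread. A minimal Python 3 sketch of that idea follows. It is not part of the answer's code: handle_client is a made-up stand-in for process_connection (it just upper-cases the request), and serve stops after a fixed number of connections so the demo terminates. Whether one shared nlp object can safely be used from several threads is not something this sketch checks.

```python
import socket
import threading


def handle_client(client_sock):
    """Stand-in for process_connection(): reply with the request upper-cased."""
    with client_sock:
        request = client_sock.recv(4096)
        client_sock.sendall(request.upper())


def serve(server_sock, max_connections):
    """Accept connections and hand each one to its own thread."""
    workers = []
    for _ in range(max_connections):
        client_sock, _address = server_sock.accept()
        worker = threading.Thread(target=handle_client, args=(client_sock,))
        worker.start()
        workers.append(worker)
    for worker in workers:
        worker.join()


def demo():
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server_sock.listen(5)
    port = server_sock.getsockname()[1]
    acceptor = threading.Thread(target=serve, args=(server_sock, 2))
    acceptor.start()

    replies = []
    for text in (b"first request", b"second request"):
        with socket.create_connection(("127.0.0.1", port)) as client:
            client.sendall(text)
            replies.append(client.recv(4096))
    acceptor.join()
    server_sock.close()
    return replies
```

The accept loop returns to accept() as soon as a thread is started, so one slow request no longer holds up the next client.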

Run my_script.py as a quick-running script to request a result from the nlp server (e.g., python my_script.py here are some arguments):

import socket, sys
import comm

# data can be whatever you want (even just sys.argv)
data = sys.argv

print("sending to server:")
print(data)

# send data to the server and receive a result
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# disable Nagle algorithm (probably only needed over a network) 
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, True)
sock.connect((comm.server_host, comm.server_port))
comm.send_data(data, sock)
result = comm.receive_data(sock)
sock.close()

# do something with the result...
print("result from server:")
print(result)

comm.py contains code that is used by both the client and server:

import sys, struct
try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3

# pick a port that is not used by any other process
server_port = 17001
server_host = '127.0.0.1' # localhost
message_size = 8192
# code to use with struct.pack to convert transmission size (int) 
# to a byte string
header_pack_code = '>I'
# number of bytes used to represent size of each transmission
# (corresponds to header_pack_code)
header_size = 4  

def send_data(data_object, sock):
    # serialize the data so it can be sent through a socket
    data_string = pickle.dumps(data_object, -1)
    data_len = len(data_string)
    # send a header showing the length, packed into 4 bytes
    sock.sendall(struct.pack(header_pack_code, data_len))
    # send the data
    sock.sendall(data_string)

def receive_data(sock):
    """ Receive a transmission via a socket, and convert it back into a binary object. """
    # This runs as a loop because the message may be broken into arbitrary-size chunks.
    # This assumes each transmission starts with a 4-byte binary header showing the size of the transmission.
    # See https://docs.python.org/3/howto/sockets.html
    # and http://code.activestate.com/recipes/408859-socketrecv-three-ways-to-turn-it-into-recvall/

    header_data = b''
    messages = []
    received_len = 0
    # unknown until the full header has arrived
    transmission_size = None

    while transmission_size is None or received_len < transmission_size:
        sock_data = sock.recv(message_size)
        if not sock_data:
            # an empty read means the other side closed the connection;
            # without this check the loop would spin forever
            raise EOFError("socket closed before the full transmission arrived")
        if transmission_size is None:
            # still receiving header info
            header_data += sock_data
            if len(header_data) >= header_size:
                # split the already-received data between header and body
                messages.append(header_data[header_size:])
                received_len = len(messages[0])
                # find actual size of transmission
                transmission_size = struct.unpack(header_pack_code, header_data[:header_size])[0]
        else:
            # already receiving data
            received_len += len(sock_data)
            messages.append(sock_data)

    # combine messages into a single byte string
    data_string = b''.join(messages)
    # convert to an object
    data_object = pickle.loads(data_string)
    return data_object
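
A quick way to check the length-prefixed framing used by comm.py is a round trip over a connected socket pair. This is a minimal Python 3 sketch, not the answer's code: send_framed, recv_exactly, recv_framed and round_trip are names made up for illustration, and the receiver asks for exactly the number of bytes it still expects instead of reading fixed-size chunks.

```python
import pickle
import socket
import struct

# 4-byte big-endian unsigned length, the same layout as '>I' in comm.py
HEADER = struct.Struct(">I")


def send_framed(sock, obj):
    """Send one object as a length header followed by the pickled payload."""
    payload = pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exactly(sock, count):
    """Read exactly `count` bytes, or raise if the peer closes early."""
    chunks = []
    remaining = count
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise EOFError("socket closed with %d bytes still expected" % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_framed(sock):
    """Read the header, then read and unpickle a payload of that size."""
    (size,) = HEADER.unpack(recv_exactly(sock, HEADER.size))
    return pickle.loads(recv_exactly(sock, size))


def round_trip(obj):
    """Send a small object through a local socket pair and read it back."""
    left, right = socket.socketpair()
    try:
        send_framed(left, obj)
        return recv_framed(right)
    finally:
        left.close()
        right.close()
```

Calling round_trip({"text": "hello"}) should hand back an equal dict. Unpickling can run arbitrary code, so this kind of channel is only appropriate between processes you trust.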

Note: you should make sure the result sent from the server only uses native data structures (dicts, lists, strings, etc.). If the result includes an object defined in spacy.en, then the client will automatically import spacy.en when it unpacks the result, in order to provide the object's methods.
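
One way to follow that advice is to copy the fields you need into plain dicts before sending. The sketch below assumes the parsed document is iterable and that each token exposes text, lemma_ and pos_ attributes, as spaCy tokens do. doc_to_native and FakeToken are made-up names, and FakeToken only exists so the example runs without spaCy installed.

```python
from collections import namedtuple


def doc_to_native(doc):
    """Copy the fields of interest out of each token into plain dicts."""
    return [
        {"text": token.text, "lemma": token.lemma_, "pos": token.pos_}
        for token in doc
    ]


# Stand-in for a parsed document (hypothetical, for illustration only).
FakeToken = namedtuple("FakeToken", ["text", "lemma_", "pos_"])
fake_doc = [FakeToken("Cats", "cat", "NOUN"), FakeToken("sleep", "sleep", "VERB")]

# A list of dicts of strings: safe to pickle without pulling in spacy.en.
native = doc_to_native(fake_doc)
```

On the server this would be called on the real parse result, and only the returned list would go through comm.send_data.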

This setup is very similar to the HTTP protocol (server waits for connections, client connects, client sends a request, server sends a response, both sides disconnect). So you might do better to use a standard HTTP server and client instead of this custom code. That would be a "RESTful API", which is a popular term these days (with good reason). Using standard HTTP packages would save you the trouble of managing your own client/server code, and you might even be able to call your data-processing server directly from your existing web server instead of launching my_script.py. However, you will have to translate your request into something compatible with HTTP, e.g., a GET or POST request, or maybe just a specially formatted URL.
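
As a rough illustration of the HTTP variant, here is a minimal Python 3 sketch that uses only the standard library. It makes several assumptions: load_model is a hypothetical stand-in for the slow English() call, the request body is plain text sent by POST, and the reply is JSON. A production service would more likely sit behind a proper web framework or WSGI server.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def load_model():
    """Hypothetical stand-in for the slow English() call."""
    return lambda text: {"tokens": text.split()}


nlp = load_model()  # runs once, when the server process starts


class NlpHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        text = self.rfile.read(length).decode("utf-8")
        body = json.dumps(nlp(text)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def query(port, text):
    """Client side: what my_script.py (or the web server) would do."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("POST", "/", body=text.encode("utf-8"))
        return json.loads(conn.getresponse().read().decode("utf-8"))
    finally:
        conn.close()


def demo():
    server = HTTPServer(("127.0.0.1", 0), NlpHandler)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever)
    thread.start()
    try:
        return query(server.server_address[1], "keep the model loaded")
    finally:
        server.shutdown()
        thread.join()
        server.server_close()
```

The framing, connection handling and error reporting that comm.py does by hand are all handled by the HTTP libraries here.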

Another option would be to use a standard interprocess communication package such as PyZMQ, redis, mpi4py or maybe zmq_object_exchanger. See this question for some ideas: Efficient Python IPC
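
If you would rather not install anything, the standard library's multiprocessing.connection module does a similar job: it frames and pickles messages for you. The sketch below swaps that module in for the packages named above, so it is not a PyZMQ or redis example. It is written for Python 3; load_model, AUTHKEY, serve and ask are made-up names, and the server runs in a thread only so the demo fits in one file.

```python
import threading
from multiprocessing.connection import Client, Listener

AUTHKEY = b"change-me"  # hypothetical shared secret for this sketch


def load_model():
    """Hypothetical stand-in for the slow English() call."""
    return lambda text: {"tokens": text.split()}


def serve(listener, nlp, max_requests):
    """Answer a fixed number of requests, one connection each."""
    for _ in range(max_requests):
        with listener.accept() as conn:
            conn.send(nlp(conn.recv()))


def ask(address, text):
    """What the short-lived client script would do."""
    with Client(address, authkey=AUTHKEY) as conn:
        conn.send(text)
        return conn.recv()


def demo():
    nlp = load_model()  # paid once, in the long-lived process
    with Listener(("127.0.0.1", 0), authkey=AUTHKEY) as listener:
        server = threading.Thread(target=serve, args=(listener, nlp, 1))
        server.start()
        result = ask(listener.address, "no manual framing needed")
        server.join()
    return result
```

In real use the Listener would live in the long-running process and ask would be the whole body of my_script.py.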

Or you may be able to save a copy of the spacy.en object on disk using the dill package (https://pypi.python.org/pypi/dill) and then restore it at the start of my_script.py. That may be faster than importing/reconstructing it each time and simpler than using interprocess communication.
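
The save-and-restore idea looks roughly like this. It is only a sketch: the dict is a hypothetical stand-in for the expensive object, the code falls back to pickle when dill is not installed, and nothing here shows that a real spacy.en object can be serialized this way or that restoring it would be faster than rebuilding it. You would have to time that yourself.

```python
import os
import pickle
import tempfile

try:
    import dill as serializer  # handles more object types than pickle
except ImportError:
    serializer = pickle  # fallback so the sketch runs without dill


def save_object(obj, path):
    """Write the object to disk once, e.g. from a setup script."""
    with open(path, "wb") as handle:
        serializer.dump(obj, handle)


def load_object(path):
    """Restore the object at the start of each short-lived script."""
    with open(path, "rb") as handle:
        return serializer.load(handle)


# Hypothetical stand-in for the expensive object.
model = {"vocab": ["cat", "sleep"], "vectors": [[0.1, 0.2], [0.3, 0.4]]}
cache_path = os.path.join(tempfile.mkdtemp(), "model.pkl")
save_object(model, cache_path)
restored = load_object(cache_path)
```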

Nice response, matthias. This is the way to do it. RAM is inherently volatile and process-centric, so a single long-running process can act as a proxy for your requests and remove the load-time overhead. – Nathan McCoy, May 02 '17 at 08:41

score 4 · Answer 2 · answered Apr 23 '17 at 18:22

Your goal should be to initialize the spacy models only once. Use a class, and make spacy a class attribute. Whenever you use it, you get the same instance.

from spacy.en import English

class Spacy():
    nlp = English()
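
To see the effect of loading once per process, here is a small Python 3 variation that loads lazily and counts the loads. load_model is a hypothetical stand-in for English(), and get, load_count and handle_request are names made up for this sketch.

```python
def load_model():
    """Hypothetical stand-in for English()."""
    return lambda text: text.split()


class Spacy(object):
    _nlp = None      # shared by every caller in this process
    load_count = 0   # how many times the model was actually loaded

    @classmethod
    def get(cls):
        # Load on first use, then keep returning the same object.
        if cls._nlp is None:
            cls._nlp = load_model()
            cls.load_count += 1
        return cls._nlp


def handle_request(text):
    return Spacy.get()(text)


results = [handle_request("first request"), handle_request("second request")]
```

This only helps when both requests are served by the same long-running process. A script that is launched from scratch for every request still pays the load cost each time.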

– DhruvPathak

I'm calling the script with the text I want to process as a parameter. What can I do to keep it in the background waiting for input? I think I'll have the same issue here. – Luis Ramon Ramirez Rodriguez Apr 23 '17 at 19:33

@LuisRamonRamirezRodriguez That is not an ideal way to do it. A suggested alternative would be to have spacy running behind a WSGI server such as gunicorn or uWSGI and talk to it over REST APIs. Or you can have the spacy Python process running as a Celery worker, to which you can push sync tasks and get sync responses. – DhruvPathak May 01 '17 at 06:14

score 2 · Answer 3 · answered Apr 26 '17 at 11:12

So here is a hack to do this. (I personally would refactor my code and not do this, but since your requirements are not spelled out in much detail, I am going to suggest it.)

You must have a daemon which runs the online service. Import spacy in the daemon and pass it as a parameter to the file that does the NLP work.

I would refactor my code to use a class, as mentioned in the solution by @dhruv, which is much cleaner.

The following example is a rough sketch of how to go about things. (Very bad programming principle though.)

File1.py

def caller(a, np):
    return np.array(a)

File2.py

import numpy as np 
from File1 import caller

# np was imported once above and is passed in as an ordinary argument
z = caller(10, np)
print(z)

With the above method, the load time is paid only once, when the daemon starts; after that, it's just a function call. Hope this helps!

score 2 · Answer 4 · answered Apr 26 '17 at 22:22

Your fundamental problem here is launching a new script for every request. Instead of running a script for every request, run a function from within the script on every request.

There are a variety of ways to handle user requests. The simplest is to periodically poll for requests and add them to a queue. An async framework such as asyncio is also useful for this kind of work.
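
A minimal asyncio sketch of "run a function on every request" might look like this. It assumes Python 3.7 or later and a made-up line-based protocol (one line of text in, one line of JSON out). load_model is a hypothetical stand-in for the slow English() call.

```python
import asyncio
import json


def load_model():
    """Hypothetical stand-in for the slow English() call."""
    return lambda text: {"tokens": text.split()}


nlp = load_model()  # loaded once, before any request arrives


async def handle_request(reader, writer):
    """One request per connection: a line of text in, a line of JSON out."""
    line = await reader.readline()
    result = nlp(line.decode("utf-8").strip())
    writer.write(json.dumps(result).encode("utf-8") + b"\n")
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def ask(port, text):
    """Client side of the made-up protocol."""
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(text.encode("utf-8") + b"\n")
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()
    return json.loads(reply.decode("utf-8"))


async def demo():
    server = await asyncio.start_server(handle_request, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        return await ask(port, "run a function per request")
```

Note that the call to nlp runs inside the event loop, so a slow parse blocks other connections. For heavy processing you would probably hand that call to an executor.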

This talk by Raymond Hettinger is an excellent introduction to concurrency in Python.

score 2 · Answer 5 · answered Apr 27 '17 at 16:14

Since you are using Python, you can program some sort of workers (I think at some point you will also need to scale your application) where this initialisation is done only once! We have tried Gearman for a similar use case and it works well.

Cheers

– ML_TN

The principle is easy: your "my_script.py" will fit inside the worker, and you will have to program a server that distributes the workload (client queries) over the workers and collects the job results. A classic master-slave architecture. – ML_TN Apr 27 '17 at 16:21
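
The shape of that architecture can be sketched with the standard library alone. This is not Gearman: threads and in-process queues stand in for separate worker processes and a job server, and load_model, worker and run_jobs are made-up names. The point is only that each worker initialises once and then handles many jobs.

```python
import queue
import threading


def load_model(worker_id):
    """Hypothetical stand-in for the slow per-worker initialisation."""
    return lambda text: {"worker": worker_id, "tokens": text.split()}


def worker(worker_id, jobs, results, load_log):
    nlp = load_model(worker_id)  # done once, when the worker starts
    load_log.append(worker_id)
    while True:
        job = jobs.get()
        if job is None:  # sentinel: no more work
            break
        job_id, text = job
        results.put((job_id, nlp(text)))


def run_jobs(texts, worker_count=2):
    """Dispatcher: hand out jobs, then collect the results by job id."""
    jobs, results, load_log = queue.Queue(), queue.Queue(), []
    threads = [
        threading.Thread(target=worker, args=(i, jobs, results, load_log))
        for i in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    for job_id, text in enumerate(texts):
        jobs.put((job_id, text))
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    for thread in threads:
        thread.join()
    collected = dict(results.get() for _ in texts)
    return [collected[i]["tokens"] for i in range(len(texts))], sorted(load_log)
```

Here three jobs shared between two workers cause exactly two loads, however many jobs follow.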
