
I am developing a chatbot (for Kik messenger) with Python and recently moved my app to Heroku, pretty much as described in this question. Additionally, I have included NLTK (a Python module) and some of its resources, as described in the Heroku documentation. Up to this point, things work nicely and the chatbot responds in Kik messenger.
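For reference, the Heroku documentation referred to here covers the Python buildpack's support for an nltk.txt file in the repository root, which lists NLTK resources to download at build time. A minimal sketch (the resource names are examples, not necessarily the ones this app uses):

punkt
averaged_perceptron_tagger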

As a next step, I want to include tools from Stanford NLP via their NLTK API. The Stanford NLP tools are provided as a Java repository, together with several model files. Locally, I got this working after setting up the API according to this answer. I don't know how to do this on Heroku, though. Heroku has documentation on deploying executable JAR files, but I don't see how to apply it to my problem.

The specific tool I want to use is the Stanford parser, which I invoke locally with:

from nltk.parse.stanford import StanfordParser
parser = StanfordParser(model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz")
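For completeness, the local setup from the linked answer boils down to telling NLTK where the parser JARs live before constructing the parser, via the STANFORD_PARSER and STANFORD_MODELS environment variables. A minimal sketch (the paths are placeholders for wherever the JARs were unpacked):

import os

# Placeholder path: the directory containing stanford-parser.jar and the models JAR
os.environ['STANFORD_PARSER'] = '/path/to/stanford-parser-full'
os.environ['STANFORD_MODELS'] = '/path/to/stanford-parser-full'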

This is my first question on SO, so please let me know if and how I can edit this question so that it becomes easier to answer.

Edit: On a more general level, I have a Python application that I run on the Heroku cloud service (with ephemeral file system) and want to include a Java repository.

2 Answers


You'll need to include the JAR files in your app by downloading them at build time. From the answer you linked to, it sounds like you can do this with something like:

import nltk
nltk.download()

You'll also need to add the JVM buildpack to your app:

$ heroku buildpacks:add heroku/jvm
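Since this app needs both the Python and the JVM buildpack, it may also help to check their order; the last buildpack determines how Heroku detects the app's primary language, so the Python buildpack should usually come last. A sketch (instead of the plain add above, the JVM buildpack can be inserted at a specific position):

$ heroku buildpacks:add --index 1 heroku/jvm
$ heroku buildpacks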
codefinger
  • As pointed out in [this answer](https://stackoverflow.com/a/34112695/561423), downloading the Stanford NLP tools with nltk.download() is no longer possible. Even if it still were, the parser's zipped source directory alone is about 390 MB, exceeding Heroku's maximum slug size. But I could create a repo that contains both my app's Python source code and the Stanford NLP tools' Java source code (with a reduced model file, so that it fits into a slug), and run both the JVM and the Python buildpack on it. Do I have to include the NLP tools in my Procfile (maybe as a worker?), and if so, how? – Florian Hollandt Jul 01 '17 at 05:03
  • Update: I created a repo with both the Python app's and the NLP parser's source code, and reduced the size of the model file to a mere 150 MB by deleting all non-English model files. Now I can build a slug using both the Python and the JVM buildpack. Upon booting the app, I get an error message from NLTK that the models can't be found, even though I set the CLASSPATH config var to "/app/stanford-parser.jar:/app/stanford-parser-3.7.0-models.jar" (see the sketch below) and checked in the dyno terminal that Java can be called (with "java -version"). Any ideas, please? :) – Florian Hollandt Jul 01 '17 at 10:57
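For reference, a config var like the one described in the comment above can be set from the Heroku CLI; a sketch using the paths from that comment:

$ heroku config:set CLASSPATH="/app/stanford-parser.jar:/app/stanford-parser-3.7.0-models.jar"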

In my case, it worked after deleting unnecessary class files from the models JAR. Use this command in the stanford-parser directory to list the JAR's contents, so you can get the file below 100 MB, which is GitHub's per-file limit for pushes:

jar tf stanford-parser-3.6.0-models.jar

Then delete the unnecessary class files with this command (the path is a placeholder for the entry you want to remove):

zip -d stanford-parser-3.6.0-models.jar edu/stanford/path/to/file

Finally, push your files to GitHub and deploy your app.
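Putting the steps together, a sketch that strips several non-English models in one go; the language names in the grep pattern are examples, so verify the actual entry names against the jar tf listing first:

# collect the entries to remove (pattern is an example; check it against the listing)
entries=$(jar tf stanford-parser-3.6.0-models.jar | grep -iE 'german|chinese|french|arabic')

# delete each entry from the JAR (JARs are zip archives, so zip -d works on them)
for entry in $entries; do
  zip -d stanford-parser-3.6.0-models.jar "$entry"
done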

tera_mx