
I have created my own NER model with Stanford's "Stanford-NER" software and by following these directions.

I am aware that CoreNLP loads three NER models out of the box in the following order:

  1. edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz
  2. edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz
  3. edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz

I now want to include my NER model in the list above and have the text tagged by my NER model first.

I have found two previous StackOverflow questions on this topic: 'Stanford OpenIE using customized NER model' and 'Why does Stanford CoreNLP NER-annotator load 3 models by default?'

Both of these posts have good answers. The general message of the answers is that you have to edit code within a file.

Stanford OpenIE using customized NER model

This post says to edit corenlpserver.sh, but I cannot find that file in the downloaded Stanford CoreNLP software. Can anyone point me to its location?

Why does Stanford CoreNLP NER-annotator load 3 models by default?

This post says that I can use the -ner.model argument to specify which NER models to load. I added this argument to the server start command (`java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000 -ner.model *modelfilepathhere*`). This did not work; the server still loaded all three models.

It also states that you have to change some Java code, though it does not specify where to make the change.

Do I need to add or modify the line `props.put("ner.model", "model_path1,model_path2");` in a specific class file in the CoreNLP software?

QUESTION: From my research it seems that I need to add or modify some code to load my custom NER model. These 'edits' are outlined above, pulled from the other StackOverflow questions. Which files specifically do I need to edit, and where exactly are they located (i.e. edu/stanford/nlp/...etc)?

EDIT: My system is running on a local server and I'm using the pycorenlp API to open a pipeline to my local server and make requests against it. The two critical lines of Python/pycorenlp code are:

  1. nlp = StanfordCoreNLP('http://localhost:9000')
  2. output = nlp.annotate(evalList[line], properties={'annotators': 'ner, openie','outputFormat': 'json', 'openie.triple.strict':'True', 'openie.max_entailments_per_clause':'1'})

I do NOT think this will affect my ability to call my unique NER model, but I wanted to present all the situational data I can in order to obtain the best possible answer.
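For reference, the two pycorenlp lines above can be collected into a small self-contained sketch. The helper below only builds the per-request properties dict (values taken from the question); the actual server call is left under the main guard since it assumes a running server and the pycorenlp package:

```python
# Sketch of the per-request properties from the question above.
# CoreNLP expects every property value to be a string, including
# booleans and numbers, so all values are quoted here.
def build_request_properties():
    return {
        "annotators": "ner,openie",
        "outputFormat": "json",
        "openie.triple.strict": "true",
        "openie.max_entailments_per_clause": "1",
    }

if __name__ == "__main__":
    # Assumes a running server and the pycorenlp package:
    #   from pycorenlp import StanfordCoreNLP
    #   nlp = StanfordCoreNLP("http://localhost:9000")
    #   output = nlp.annotate("Some text.", properties=build_request_properties())
    print(build_request_properties())
```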

Fraizier Reiland

1 Answer


If you want to customize the pipeline the server uses, create a file called server.properties (or you can call it whatever you want).

Then add the option `-serverProperties server.properties` to the java command when you start the server.

In that .properties file you should include `ner.model = /path/to/custom_model.ser.gz`

In general you can customize the pipeline the server will use in that .properties file. For instance, you can also set the list of annotators in it with the line `annotators = tokenize,ssplit,pos,lemma,ner,parse` etc.

UPDATE to address comments:

  1. In your java command you don't need `-ner.model /path/to/custom_model.ser.gz`

  2. A .properties file can have an unlimited number of property settings in it, one setting per line (blank lines are ignored, as are lines commented out with #)

  3. When you run a Java command, it looks by default for files in the directory you are running the command from. So if your command includes `-serverProperties server.properties`, it will assume that the file server.properties is in the same directory the command is run from. If you supply an absolute path instead, `-serverProperties /path/to/server.properties`, you can run the command from anywhere.

  4. So just to be clear you could start the server with this command (run in the folder with all the jars):

java -Xmx8g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000 -serverProperties server.properties

and server.properties should be a file containing at least:

ner.model = /path/to/custom_model.ser.gz

A fuller server.properties could look like this:

annotators = tokenize,ssplit,pos,lemma,ner,depparse
ner.model = /path/to/custom_model.ser.gz
parse.maxlen = 100

That is just an example; you should put all of your pipeline settings into server.properties.

  5. I made some comments about accessing the StanfordCoreNLP server from Python in a previous answer:

cannot use pycorenlp for python3.5 through terminal

You appear to be using the pycorenlp library, which I don't really know much about. Two other options are some code I show in that answer or the stanza package we make. Details are in that answer above.
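For completeness, the server can also be called with nothing but the Python standard library; this is a hedged sketch, not the exact code from the linked answer. The CoreNLP server accepts the raw text as the POST body and the properties as a URL-encoded JSON query parameter. It assumes the server from this answer is running on localhost:9000 with the pipeline fixed in server.properties, so only the output format needs to be sent per request:

```python
# Minimal stdlib-only client sketch for the CoreNLP server.
import json
import urllib.parse
import urllib.request

SERVER = "http://localhost:9000"

def build_url(server, properties):
    """Encode the properties dict as the JSON query parameter CoreNLP expects."""
    query = urllib.parse.urlencode({"properties": json.dumps(properties)})
    return f"{server}/?{query}"

def annotate(text, properties):
    # POST the raw text; the pipeline itself is configured server-side.
    req = urllib.request.Request(
        build_url(SERVER, properties),
        data=text.encode("utf-8"),
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

if __name__ == "__main__":
    # Requires a running server started with -serverProperties server.properties.
    print(annotate("Barack Obama was born in Hawaii.", {"outputFormat": "json"}))
```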

StanfordNLPHelp
  • I am running my own server. Do these instructions still hold true? The java command would look something like this `java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000 -serverProperties server.properties -ner.model /path/to/custom_model.ser.gz`, is that correct? Can the properties file I create contain ONLY the `ner.model = /path/to/custom_model.ser.gz` line, and does it matter where I save that file? @StanfordNLPHelp – Fraizier Reiland May 15 '17 at 15:32
  • Also, besides having my own instantiation of CoreNLP, I'm using Python to communicate with the server. I use the line `nlp = StanfordCoreNLP('http://localhost:80')` to open a pipeline to the server and `output = nlp.annotate('string', properties={'annotators': 'ner, openie', 'outputFormat': 'json'})` to make calls to the server. Can I edit the second line here in order to specify which 'ner' I want to use? @StanfordNLPHelp – Fraizier Reiland May 15 '17 at 15:52
  • I would recommend just setting the pipeline properties once when you start up your server and not sending pipeline properties over the request. That functionality is still a bit in flux. But if you just start the server with a list of properties and then send requests with just the text you'll get back json with the responses. – StanfordNLPHelp May 15 '17 at 20:46
  • **Thank you for adding more to your answer! You have helped a lot.** I am able to get the server to attempt to load my custom model, but I am running into this error: `edu.stanford.nlp.io.RuntimeIOException: java.io.IOException: Couldn't load classifier from /path/to/custom_model.ser.gz`. I'm going to research this issue and maybe open a new question. Again, thank you @StanfordNLPHelp – Fraizier Reiland May 16 '17 at 14:05
  • Not sure if you found the answer, but I had the same 'Couldn't load classifier' error and found that I needed to put the full path to the model in the server.properties file. – LVNGD Feb 17 '19 at 00:24
  • Is it possible to give the path to more than one custom NER model, or a custom NER model along with the default models? Thanks – YoungSheldon Sep 25 '20 at 07:40