
I have been searching for a while, but have had no luck finding out which NER labels are included in the pretrained NerDL (TensorFlow) model. I would expect the training data to provide that information, but I do not see it mentioned in any documentation.

Downloadable model: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_precise_en_1.7.0_2_1539623388047.zip

Any direction would be appreciated!

UPDATE:

Following the advice here, I filed an issue on the Spark NLP GitHub project :) I just heard back from them. Here is the answer:

For practical purposes, the pretrained NER model has

B-ORG

I-ORG

B-PER

I-PER

B-LOC

I-LOC

and it has been trained from: https://raw.githubusercontent.com/patverga/torch-ner-nlp-from-scratch/master/data/conll2003/eng.train

See original issue here.
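
One way to check the label set yourself is to scan the training file and collect the distinct values in its last column. The following is a minimal sketch, assuming the file uses the usual CoNLL-2003 layout (one token per line, whitespace-separated columns, NER tag last, blank lines between sentences). The function name `collect_ner_labels` and the sample lines are made up for illustration and are not taken from the real file.

```python
from collections import Counter


def collect_ner_labels(lines):
    """Count the NER tags found in CoNLL-2003-style lines.

    Assumes each token line is whitespace-separated with the NER tag
    in the last column (an assumption about the file layout).
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        # Skip blank sentence separators and document markers.
        if not line or line.startswith("-DOCSTART-"):
            continue
        counts[line.split()[-1]] += 1
    return counts


# Hand-written sample in the assumed layout (not real training data).
sample = [
    "-DOCSTART- -X- O O",
    "",
    "EU NNP B-NP B-ORG",
    "rejects VBZ B-VP O",
    "",
    "Peter NNP B-NP B-PER",
    "Blackburn NNP I-NP I-PER",
]

label_counts = collect_ner_labels(sample)
print(sorted(label_counts))
```

To run it on the real data, download the training file linked above and pass the open file object to `collect_ner_labels` instead of `sample`.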

ZEE

1 Answer


That model is trained on the CoNLL-2003 dataset for NER:

http://aclweb.org/anthology/W03-0419

That dataset basically has PERSON, ORGANIZATION, and LOCATION entities (it also defines a MISC category).
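
The six IOB labels reported in the question's update reduce to these entity types once the `B-`/`I-` prefix is stripped. A small sketch of that mapping, where `entity_types` is a hypothetical helper name and not part of Spark NLP:

```python
def entity_types(labels):
    """Return the distinct entity types behind a list of IOB labels."""
    types = set()
    for label in labels:
        if label == "O":
            # "O" marks tokens outside any entity.
            continue
        prefix, _, entity = label.partition("-")
        if prefix in ("B", "I") and entity:
            types.add(entity)
    return sorted(types)


# Labels as listed in the question's update.
model_labels = ["B-ORG", "I-ORG", "B-PER", "I-PER", "B-LOC", "I-LOC"]
print(entity_types(model_labels))
```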

Hope this helps, Alberto.

AlbertoAndreotti
  • Hi AlbertoAndreotti, is there any documentation stating that `NerDLModel.pretrained()` is trained on the dataset discussed in this paper? I could not find it mentioned anywhere and just wanted to be sure. (I did see the `CoNLL 2003 IOB NER file` under the `Named Entity Recognition Deep Learning annotator`, but that does not really confirm that the pretrained version comes from this dataset.) – ZEE Dec 06 '18 at 00:24
  • That's true, they don't mention it. I suggest you submit an issue to the GitHub project so that this detail gets clarified. – AlbertoAndreotti Dec 06 '18 at 03:47