
I am trying to do coreference resolution on a dataset, but Stanford's named entity recognizer is unable to properly classify the named entities in my text. Is it possible to give the Stanford coreference module the text together with a set of named entities produced by a different NER, such as NLTK's? From my research so far, it seems you cannot split the pipeline up when it does coref. Ideally I would run Stanford's NER, update its output with the named entities from another NER, and then pass the result to the coref module. Any help would be greatly appreciated.

I am doing all of this in Python, so I have tested a variety of Python wrappers for Stanford CoreNLP, all of which seem to offer only a catch-all annotate option for coref, which makes it impossible to achieve what I need. I also looked through the CoreNLP documentation and could not find a clear answer as to whether this would be possible even in Java or via the server.

Armali
Cooke1007

1 Answer


I used the additional TokensRegexNER rules file supported by Stanford CoreNLP's "Named Entity Recognition" (ner) annotator. Basically, you construct a tab-delimited file listing your set of named entities.

https://stanfordnlp.github.io/CoreNLP/ner.html#additional-tokensregexner-rules
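The mapping file is plain tab-separated text: each line gives a token pattern, the NER tag to assign, an optional list of tags it may overwrite, and an optional priority. A minimal sketch (the entity names here are made up for illustration):

```
Jane Doe	PERSON	MISC	1.0
Acme Corp	ORGANIZATION	MISC	1.0
```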

I wasn't able to do this from within nltk, but I used nltk's same Stanford CoreNLP java pipeline to pass the rules file into the CoreNLP jar (via the -ner.additional.regexner.mapping option). I went into a bit more depth in my answer here: How to feed CoreNLP some pre-labeled Named Entities?. I imagine it would be straightforward to build an object into nltk that supports this feature, as most CoreNLP-handling in nltk is implemented as derived objects that simply construct with a set of CoreNLP option switches.
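As a rough sketch of how this could be wired up from Python, the snippet below writes such a rules file and builds the properties you would send to a locally running CoreNLP server; the entity list and file path are assumptions for illustration, and the key switch is ner.additional.regexner.mapping:

```python
import json
import os
import tempfile

# Hypothetical entity list produced by another NER (e.g. NLTK):
# surface form -> CoreNLP NER tag.
my_entities = {
    "Jane Doe": "PERSON",
    "Acme Corp": "ORGANIZATION",
}

# Write the tab-delimited TokensRegexNER mapping file:
# pattern \t tag \t overwritable-tags \t priority
rules_path = os.path.join(tempfile.gettempdir(), "extra_entities.rules")
with open(rules_path, "w") as f:
    for pattern, tag in my_entities.items():
        f.write(f"{pattern}\t{tag}\tMISC\t1.0\n")

# Properties for a CoreNLP server request (e.g. a POST to
# http://localhost:9000/?properties=...). The additional mapping is
# passed through ner.additional.regexner.mapping so the ner annotator
# applies your entities before coref runs.
props = {
    "annotators": "tokenize,ssplit,pos,lemma,ner,parse,coref",
    "ner.additional.regexner.mapping": rules_path,
    "outputFormat": "json",
}
print(json.dumps(props))
```

The same properties could equally be given to the CoreNLP jar on the command line; nltk's CoreNLP wrapper classes ultimately build option sets much like this one.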