How to run multiple classifiers with Stanford NER?

Question

I'd like to run one of the built-in classifiers on a file, then run my own classifier, merging the results.

How do I do so with Stanford NER, in particular, via the command line?

I am aware of "How do I include more than one classifiers when using Stanford named entity recogniser?", but this is slightly different, as that question asks about multiple classifiers with NERServer.

It looks like I need to use CoreNLP to run multiple NER models in sequence. Can I do it without CoreNLP?

Say I have a file with the contents "the quick brown fox jumped over the lazy dog in America". I run one of the built-in classifiers, and it finds "America" as a location; then I run my own, and it finds "fox" and "dog". The result should be:

the quick brown <animal>fox</animal> jumped over the lazy <animal>dog</animal> in <location>America</location>
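As a minimal sketch of the merge described above, assume both classifiers see the same tokenization and emit one label per token, with "O" meaning "not an entity". The two label sequences can then be combined token by token and rendered as inline XML. `merge_labels` and `to_inline_xml` are hypothetical helper names, and letting the built-in classifier win when both taggers label the same token is an assumption.

```python
from typing import List, Tuple
from xml.sax.saxutils import escape

# (token, label) pairs; "O" means "not an entity" (assumed label scheme).
Tagged = List[Tuple[str, str]]


def merge_labels(first: Tagged, second: Tagged) -> Tagged:
    """Combine two taggings of the same tokens.

    A token keeps the first tagger's label unless that label is "O",
    in which case the second tagger's label is used (assumed priority).
    """
    if [tok for tok, _ in first] != [tok for tok, _ in second]:
        raise ValueError("both taggers must see the same tokenization")
    return [(tok, a if a != "O" else b) for (tok, a), (_, b) in zip(first, second)]


def to_inline_xml(tagged: Tagged) -> str:
    """Wrap each run of consecutive tokens sharing a label in <label>...</label>."""
    pieces = []
    i = 0
    while i < len(tagged):
        tok, label = tagged[i]
        if label == "O":
            pieces.append(escape(tok))  # escape &, <, > in plain tokens
            i += 1
            continue
        j = i
        while j < len(tagged) and tagged[j][1] == label:
            j += 1
        span = " ".join(escape(t) for t, _ in tagged[i:j])
        tag = label.lower()
        pieces.append(f"<{tag}>{span}</{tag}>")
        i = j
    return " ".join(pieces)


words = "the quick brown fox jumped over the lazy dog in America".split()
builtin = [(w, "LOCATION" if w == "America" else "O") for w in words]
custom = [(w, "ANIMAL" if w in ("fox", "dog") else "O") for w in words]
print(to_inline_xml(merge_labels(builtin, custom)))
# prints: the quick brown <animal>fox</animal> jumped over the lazy <animal>dog</animal> in <location>America</location>
```

Because plain per-token labels carry no begin/inside markers, two separate entities of the same type that sit next to each other would be collapsed into one span; a BIO-style label scheme would avoid that.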

I can't see how this would run from the command line, but if I did it in code I'd probably: set up two pipelines, each configured with its own model; create a duplicate set of keys for each model, since they use the same keys by default; run both pipelines on the text samples; read each list of results (`CoreMap`s); and then create a new `CoreMap` for each result. That said, I don't see the benefit of a single data structure over two separate ones. — Jonny, Feb 18 '14 at 17:25
You want a command-line tool to a) run Stanford NER on some text, b) run another NER on the same text, and c) somehow merge the two. Is that correct? You might be able to accomplish this with `tee` or similar, but I don't quite understand the 'merging the results' bit. It could be a situation for some glue code. — a p, Feb 18 '14 at 23:55
@ap the problem is that if you run one classifier after another, the XML gets all wonky, so it needs to merge intelligently. thx — Neil McGuigan, Feb 18 '14 at 23:58
@NeilMcGuigan I get that; this is what I mean when I say that the 'merge' bit is confusing to me. What sort of output are you expecting? — a p, Feb 19 '14 at 00:11
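Jonny's comment describes running two pipelines and combining their results afterwards. If the two taggers tokenize the text differently, a token-by-token merge breaks down; one alternative is to merge entity spans by character offset, which is also what keeps the XML from "getting wonky". The sketch below assumes each tagger's output has already been converted into (start, end, label) character spans over the same original text. `Entity`, `merge_entities`, `render_inline`, and `span` are hypothetical names, and giving the first tagger priority whenever spans overlap is an assumption.

```python
from typing import List, NamedTuple
from xml.sax.saxutils import escape


class Entity(NamedTuple):
    start: int  # offset of the first character (inclusive)
    end: int    # offset just past the last character (exclusive)
    label: str


def merge_entities(primary: List[Entity], secondary: List[Entity]) -> List[Entity]:
    """Keep all primary spans (assumed non-overlapping among themselves);
    add a secondary span only if it overlaps nothing already kept."""
    merged = list(primary)
    for ent in secondary:
        if all(ent.end <= kept.start or ent.start >= kept.end for kept in merged):
            merged.append(ent)
    return sorted(merged)


def render_inline(text: str, entities: List[Entity]) -> str:
    """Insert <label>...</label> tags around non-overlapping spans of text."""
    out, pos = [], 0
    for ent in sorted(entities):
        out.append(escape(text[pos:ent.start]))
        tag = ent.label.lower()
        out.append(f"<{tag}>{escape(text[ent.start:ent.end])}</{tag}>")
        pos = ent.end
    out.append(escape(text[pos:]))
    return "".join(out)


def span(text: str, word: str, label: str) -> Entity:
    """Hypothetical helper for this example: span of the first occurrence of word."""
    start = text.index(word)
    return Entity(start, start + len(word), label)


text = "the quick brown fox jumped over the lazy dog in America"
builtin = [span(text, "America", "LOCATION")]
custom = [span(text, "fox", "ANIMAL"), span(text, "dog", "ANIMAL")]
print(render_inline(text, merge_entities(builtin, custom)))
# prints: the quick brown <animal>fox</animal> jumped over the lazy <animal>dog</animal> in <location>America</location>
```

Since tags are inserted only once, after all spans have been merged and conflicts resolved, the output can never contain crossed or nested tags.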

score 0 · Accepted Answer · answered Feb 19 '14 at 00:57

So, a place to get started if you're dead set on doing this in a single command from the command line:

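cat corpus.txt | tee >(stanfordNER -options here > out1.xml) | myNERTagger -options here > out2.xml && diff out1.xml out2.xml | awk '...'   # do whatever merging you want here

(`>(...)` is bash process substitution: `tee` writes the text both to the Stanford tagger and, through the pipe, to your own tagger. Backticks would instead run `stanfordNER` on its own and pass its output to `tee` as file names.)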

But you'll likely find that this isn't a real solution. You're going to want to go sentence by sentence in a little script, calling pyner or similar to hook into the Stanford tagger and then whatever custom tagger you've built, merging the differences as you go along. The output formats of your taggers will change how this looks pretty dramatically.
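A minimal sketch of that sentence-by-sentence merge, assuming each tagger has already written its output as one sentence per line in a `word/TAG` format, with `O` for non-entities and identical tokenization in both files. The script name `merge_ner.py`, the file names, and the rule that the first file wins on conflicts are all assumptions; calling the taggers themselves (through pyner or otherwise) is left out.

```python
import sys
from itertools import zip_longest
from typing import Iterable, Iterator, List, Tuple


def parse_slash_tags(line: str) -> List[Tuple[str, str]]:
    """Split 'word/TAG word/TAG ...' into (word, TAG) pairs.

    rpartition splits on the last '/', so a word like 'and/or' survives.
    """
    pairs = []
    for item in line.split():
        word, _, tag = item.rpartition("/")
        pairs.append((word, tag))
    return pairs


def merge_sentence(a: List[Tuple[str, str]], b: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep a's tag unless it is 'O', otherwise take b's (assumed priority)."""
    if [w for w, _ in a] != [w for w, _ in b]:
        raise ValueError("taggers disagree on tokenization")
    return [(w, ta if ta != "O" else tb) for (w, ta), (_, tb) in zip(a, b)]


def merge_streams(lines_a: Iterable[str], lines_b: Iterable[str]) -> Iterator[str]:
    """Merge two tagger outputs one sentence (line) at a time."""
    for line_a, line_b in zip_longest(lines_a, lines_b):
        if line_a is None or line_b is None:
            raise ValueError("outputs have different numbers of sentences")
        merged = merge_sentence(parse_slash_tags(line_a), parse_slash_tags(line_b))
        yield " ".join(f"{word}/{tag}" for word, tag in merged)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical file names): python merge_ner.py out1.txt out2.txt
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
        for merged_line in merge_streams(fa, fb):
            print(merged_line)
```

Working line by line keeps memory flat for large corpora and reports a tokenization mismatch at the sentence where it happens, rather than after a whole-file diff.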
