
I want to combine spaCy's NER engine with a separate NER engine (a BoW model). I'm currently comparing outputs from the two engines, trying to figure out what the optimal combination of the two would be. Both perform decently, but quite often spaCy finds entities that the BoW engine misses, and vice versa.

What I would like is to access a probability score (or something similar) from spaCy whenever it finds an entity that is not found by the BoW engine. Can I get spaCy to print out its own probability score for a given entity it has found? As in: "Hi, I'm spaCy. I've found this token (or combination of tokens) that I'm X% certain is an entity of type BLAH." I want to know that number X every time spaCy finds an entity.

I imagine there must be such a number somewhere internally in spaCy's NER engine, plus a threshold value below which a possible entity is not flagged as an entity, and I'd like to know how to get my hands on that number. Thanks in advance.

polm23
Mede

2 Answers


There is actually a GitHub issue about exactly this.

There, the author of the library suggests (among other options) the following solution:

  1. Beam search with global objective. This is the standard solution: use a global objective, so that the parser model is trained to prefer parses that are better overall. Keep N different candidates, and output the best one. This can be used to support confidence by looking at the alternate analyses in the beam. If an entity occurs in every analysis, the NER is more confident it's correct.

Code:

import spacy
from collections import defaultdict

nlp = spacy.load('en_core_web_sm')
text = (u'Will Japan join the European Union? If yes, we should '
        u'move to United States. Fasten your belts, America we are coming')

# Run the pipeline with NER disabled; beam search is applied manually below.
with nlp.disable_pipes('ner'):
    doc = nlp(text)

threshold = 0.2
# Note: in some spaCy v2 releases beam_parse returns only the beams;
# if the unpacking fails, use `beams = nlp.entity.beam_parse(...)` instead.
(beams, somethingelse) = nlp.entity.beam_parse([doc], beam_width=16, beam_density=0.0001)

# Sum each candidate entity's probability mass across all parses in the beam.
entity_scores = defaultdict(float)
for beam in beams:
    for score, ents in nlp.entity.moves.get_beam_parses(beam):
        for start, end, label in ents:
            entity_scores[(start, end, label)] += score

print('Entities and scores (detected with beam search)')
for key in entity_scores:
    start, end, label = key
    score = entity_scores[key]
    if score > threshold:
        print('Label: {}, Text: {}, Score: {}'.format(label, doc[start:end], score))

Sample output:

Entities and scores (detected with beam search)
Label: GPE, Text: Japan, Score: 0.9999999999999997
Label: GPE, Text: America, Score: 0.9991664575947963

Important note: the outputs you get here will probably differ from the outputs of the standard NER, because beam search explores alternative analyses rather than committing to a single one. However, the beam search alternative gives you the confidence metric that, as I understand from your question, is what you need.
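
For reference, the standard (greedy) entities below can be listed with a couple of lines, reusing the nlp and text objects from the snippet above (a minimal sketch):

# Standard (greedy) NER: run the full pipeline with NER enabled and read doc.ents.
doc = nlp(text)
for ent in doc.ents:
    print('Label: {}, Text: {}'.format(ent.label_, ent.text))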

Outputs with Standard NER for this example:

Label: GPE, Text: Japan
Label: ORG, Text: the European Union
Label: GPE, Text: United States
Label: GPE, Text: America

MBT
gdaras
  • Brilliant, this will absolutely come in handy. Thanks very much! – Mede Oct 09 '18 at 06:14
  • I tested your code and everything works fine. However, there is one odd detail: I get a 0.99 score for one entity that does not show up in doc.ents. Any idea why this is happening? – Miguel Jan 03 '19 at 17:34
  • Did not directly work for me: I had to change `(beams, somethingelse) = ..` to `beams = ..` – MBT Mar 27 '20 at 14:07
  • For anyone finding this now: please note this does not work in v3. We are working on adding a span categorizer that lets you get NER confidence in a less hacky way. https://github.com/explosion/spaCy/pull/6747 – polm23 May 07 '21 at 07:31
  • @polm23 I guess you merged the new way of getting confidence. Could you please post another answer? – Mutlu Simsek Aug 04 '21 at 13:51
  • I forgot about this for a while, but it came up on the SO sidebar, so I added an answer. – polm23 Jul 25 '22 at 05:58

As of spaCy v3, the recommended way to do this is to use the spancat component, described in this blog post. While the main motivation for the spancat architecture is to allow more flexible structures, such as overlapping or nested annotations, it also has a more conventional scoring mechanism that makes it easy to provide confidence values. You can read about how to interpret the confidence values here (note that they don't sum to 1).
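
As a rough sketch (assuming a pipeline already trained with a spancat component under the default 'sc' spans key; the model name here is a placeholder), reading the predicted spans and their scores looks like this:

import spacy

# Placeholder name: load your own pipeline trained with a spancat component.
nlp = spacy.load('my_spancat_model')
doc = nlp('Will Japan join the European Union?')

# spancat stores its predictions in a SpanGroup under the spans key (default 'sc'),
# with one confidence score per span in the group's attrs.
spans = doc.spans['sc']
for span, score in zip(spans, spans.attrs['scores']):
    print('Label: {}, Text: {}, Score: {}'.format(span.label_, span.text, score))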

If you have NER training data, it's simple to convert it for training a spancat component.
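
For example, converting a .spacy training file is roughly a matter of copying each doc's ents into the spans key that spancat reads from (a minimal sketch; the file names are placeholders):

import spacy
from spacy.tokens import DocBin

nlp = spacy.blank('en')

# Placeholder file names: existing NER training data in, converted spancat data out.
db_in = DocBin().from_disk('train.spacy')
db_out = DocBin()
for doc in db_in.get_docs(nlp.vocab):
    # Copy the gold NER annotations into the span group that spancat trains on.
    doc.spans['sc'] = list(doc.ents)
    db_out.add(doc)
db_out.to_disk('train_spancat.spacy')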

polm23