
Questions:

  1. How can I give GoldParse the gold data for a custom attribute?
  2. How can I extend the Scorer with custom scores that are based on a custom attribute?

Explanation

I have implemented a custom pipeline component that sets a custom attribute, registered with Doc.set_extension('results', default=[]). I want to evaluate my pipeline with labelled data (something like {text: "This is some text", results: ["banana", "picture"]}). GoldParse and Scorer seem to do what I need for the default attributes, but I can't find information on how to use them with a custom attribute.
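For context, here is a minimal sketch of the kind of setup I mean (the component logic here is a placeholder; my real component derives results from the text in a more involved way):

import spacy
from spacy.tokens import Doc

# Register the custom doc-level attribute
Doc.set_extension("results", default=[])

def results_component(doc):
    # Placeholder logic standing in for the real component
    doc._.results = ["banana"] if "banana" in doc.text.lower() else []
    return doc

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe(results_component, last=True)

doc = nlp("This is some text about a banana.")
print(doc._.results)  # ['banana']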

I have seen and understood examples like this, but they only ever deal with default attributes.

What I've tried

  • I have tried figuring out whether I can somehow configure the two classes for custom attributes/scores, but haven't found a way. The parameters of GoldParse's __init__ method and the Scorer properties seem to be fixed (see the sketch after this list).
  • I have thought about extending the two classes with subclasses, but they don't seem easily extendable to me.
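To illustrate the first point: as far as I can tell, GoldParse only accepts a fixed set of annotation keywords, so there is nowhere to put a custom attribute (a sketch, assuming spaCy v2):

import spacy
from spacy.gold import GoldParse

nlp = spacy.blank("en")
doc = nlp.make_doc("This is some text")

# The supported annotations are a fixed set of keyword arguments:
gold = GoldParse(doc, words=["This", "is", "some", "text"])

# There is no slot for a custom attribute, so something like this fails:
# gold = GoldParse(doc, results=["banana", "picture"])  # TypeError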

What I would like to avoid

Of course, I could copy the code I need from Scorer and GoldParse and add handling for my custom attribute, but that seems like a bad solution. Also, considering how spaCy encourages you to extend the pipeline and the Doc, I would be surprised if evaluating those extensions were this hard.

iron9

1 Answer


Unfortunately, it actually is this hard in spaCy v2. It's very hard to add things to GoldParse (basically a don't-try-this-at-home level of hard), and the Scorer is also hard to extend.

We're working on this for the upcoming spaCy v3, where the scoring methods will be implemented more generally and each component will be able to provide its own score method. Be aware that this is still unstable, but if you're curious, you can have a look at: https://github.com/explosion/spaCy/pull/5731. GoldParse has been replaced with Example, which stores both the gold annotation and the predicted annotation on individual Doc objects, getting rid of the restrictions related to GoldParse.
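For a rough idea of the new interface (a sketch based on the v3 development code; names and import paths may still change), an Example pairs a predicted Doc with a reference Doc built from a dict of gold annotations:

import spacy
from spacy.training import Example  # lives under spacy.gold in the pre-release code

nlp = spacy.blank("en")
predicted = nlp("This is some text")

# The gold annotation is supplied as a dict and stored on its own Doc;
# alignment between predicted and reference tokens is handled for you.
example = Example.from_dict(predicted, {"words": ["This", "is", "some", "text"]})
print(example.predicted)  # the Doc with the model's annotation
print(example.reference)  # the Doc with the gold annotation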

If you have a doc-level extension (as above), then you should probably just use a different library for evaluation. You could potentially use ROCAUCScore or PRFScore from spacy.scorer, but it may be easier to use something like sklearn's metrics instead. (ROCAUCScore is just a simplified version of the sklearn ROC AUC metric.)
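For instance, a doc-level list attribute like results can be scored as a multi-label problem with sklearn (a sketch; the texts, the gold labels, and the nlp pipeline with the custom component are assumed from your own setup):

from sklearn.metrics import f1_score
from sklearn.preprocessing import MultiLabelBinarizer

# Placeholder gold data in the format from the question
texts = ["This is some text", "Another text"]
gold_results = [["banana", "picture"], []]

# nlp is assumed to be the pipeline with the custom component
pred_results = [doc._.results for doc in nlp.pipe(texts)]

# Binarize the label sets so sklearn can compare them
mlb = MultiLabelBinarizer()
y_true = mlb.fit_transform(gold_results)
y_pred = mlb.transform(pred_results)

print(f1_score(y_true, y_pred, average="micro"))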

If you have a token-level extension, then for v2 I think the best you can do within spaCy is to use PRFScore and extract the word-based alignment from a GoldParse, using it outside of the Scorer itself. Something like this:

import spacy
from spacy.gold import GoldParse
from spacy.scorer import PRFScore

nlp = spacy.load("my_model")
score = PRFScore()
# texts, gold_words_list, and gold_attrs_list hold your labelled data
for text, gold_words, gold_attrs in zip(texts, gold_words_list, gold_attrs_list):
    # NOTE: gold_attrs must be aligned with gold_words
    # gold_words = ["a", "b", "c", ...]
    # gold_attrs = ["a1", "b1", "c1", ...]

    gold = GoldParse(nlp.make_doc(text), words=gold_words)
    doc = nlp(text)

    # Collect (gold token index, attribute) pairs for the gold annotation
    gold_values = set()
    cand_values = set()
    for i, gold_attr in enumerate(gold_attrs):
        gold_values.add((i, gold_attr))
    # Collect the predicted pairs, mapping each candidate token to its
    # aligned gold token so the indices are comparable
    for token in doc:
        if token.orth_.isspace():
            continue
        gold_i = gold.cand_to_gold[token.i]
        if gold_i is not None:
            cand_values.add((gold_i, token._.attr))
    score.score_set(cand_values, gold_values)

print(score.fscore)

This is an untested sketch that should parallel how token.tag is evaluated in the Scorer. The alignment is the trickiest part, so if you don't have misalignments between your gold words and spaCy's tokenization, you may also be better off exporting your results and using a different library for evaluation.

aab