I have trained a simple NER pipeline using spaCy 3.0. After training, I want to get, among other things, a list of predicted IOB tags from a Doc (doc = nlp(text)), for example ["O", "O", "B", "I", "O"].

I can easily get the IOB ids (integers) using

>>> doc.to_array("ENT_IOB")
array([2, 2, ..., 2], dtype=uint64)

But how can I get the mappings/lookup?

I didn't find any lookup tables in doc.vocab.lookups.tables.

I also understand that I can achieve the same effect by accessing the ent_iob_ at each token ([token.ent_iob_ for token in doc]), but I was wondering if there is a better way?

Murilo Cunha

1 Answer

Check the token documentation:

  • ent_iob IOB code of named entity tag. 3 means the token begins an entity, 2 means it is outside an entity, 1 means it is inside an entity, and 0 means no entity tag is set.
  • ent_iob_ IOB code of named entity tag. “B” means the token begins an entity, “I” means it is inside an entity, “O” means it is outside an entity, and "" means no entity tag is set.

So all you need is to map the ids to the names with a simple dictionary, iob_map = {0: "", 1: "I", 2: "O", 3: "B"}:

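import spacy

# Assumption: any pipeline with an NER component works; en_core_web_sm is only an example.
nlp = spacy.load("en_core_web_sm")
doc = nlp("John went to New York in 2010.")
print([x.text for x in doc.ents])
# => ['John', 'New York', '2010']
# IOB codes per the Token documentation: 0 = unset, 1 = inside, 2 = outside, 3 = begin
iob_map = {0: "", 1: "I", 2: "O", 3: "B"}
print(list(map(iob_map.get, doc.to_array("ENT_IOB").tolist())))
# => ['B', 'O', 'O', 'B', 'I', 'O', 'B', 'O']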
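
If you need this in more than one place, the same dictionary lookup can be wrapped in a small helper. This is only a sketch: the name iob_tags is made up for illustration, and it takes a plain sequence of integers so it can be tried without a trained pipeline. Unlike iob_map.get, it raises on an unknown code instead of silently returning None.

```python
from typing import Iterable, List

# IOB codes as described in the spaCy Token documentation quoted above.
IOB_MAP = {0: "", 1: "I", 2: "O", 3: "B"}


def iob_tags(iob_ids: Iterable[int]) -> List[str]:
    """Translate integer IOB codes into their string tags (hypothetical helper)."""
    tags = []
    for code in iob_ids:
        code = int(code)  # also accepts NumPy integer scalars
        if code not in IOB_MAP:
            raise ValueError(f"unexpected IOB code: {code}")
        tags.append(IOB_MAP[code])
    return tags


# Hand-written ids standing in for doc.to_array("ENT_IOB").tolist()
example_ids = [3, 2, 2, 3, 1, 2, 3, 2]
print(iob_tags(example_ids))
# => ['B', 'O', 'O', 'B', 'I', 'O', 'B', 'O']
```

With a real Doc, the call would be iob_tags(doc.to_array("ENT_IOB").tolist()).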
Wiktor Stribiżew
  • Thanks for your answer! In this example you are manually building the dictionary/mapping. Is there a way to extract that from spaCy? I understand that in this example it's quite simple, but I actually want to extract `ent_type` to `ent_type_` mapping as well, and I'd have to actually go through all the tokens in all documents right? – Murilo Cunha Feb 08 '21 at 08:21
  • @MuriloCunha There is no such mapping since it is that simple. `iob_map = {0: "", 1: "I", 2: "O", 3: "B"}`. Else, use your approach (that I erroneously used in my [first answer version](https://stackoverflow.com/revisions/66093034/1)). – Wiktor Stribiżew Feb 08 '21 at 09:39
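
On the ent_type question from the comments: spaCy keeps label strings in the vocabulary's StringStore, so an ent_type hash can be resolved with doc.vocab.strings[hash] without looping over tokens. Below is a rough sketch of combining that with the IOB codes. The function name full_tags and the fake hash values are invented for illustration, and the lookup is passed in as a plain callable so the example runs without spaCy installed.

```python
from typing import Callable, Iterable, List

IOB_MAP = {0: "", 1: "I", 2: "O", 3: "B"}


def full_tags(
    iob_ids: Iterable[int],
    type_ids: Iterable[int],
    lookup: Callable[[int], str],
) -> List[str]:
    """Build tags such as 'B-PERSON' from IOB codes and entity-type ids."""
    tags = []
    for iob_id, type_id in zip(iob_ids, type_ids):
        prefix = IOB_MAP[int(iob_id)]
        # An id of 0 means no entity type is set, so skip the lookup.
        label = lookup(int(type_id)) if int(type_id) != 0 else ""
        if prefix in ("B", "I") and label:
            tags.append(f"{prefix}-{label}")
        else:
            tags.append(prefix)
    return tags


# Stand-in for the StringStore; these ids are made up, real ones are 64-bit hashes.
fake_strings = {101: "PERSON", 202: "GPE", 303: "DATE"}
iob = [3, 2, 2, 3, 1, 2, 3, 2]
types = [101, 0, 0, 202, 202, 0, 303, 0]
print(full_tags(iob, types, fake_strings.__getitem__))
# => ['B-PERSON', 'O', 'O', 'B-GPE', 'I-GPE', 'O', 'B-DATE', 'O']
```

With a real Doc, this should work as full_tags(doc.to_array("ENT_IOB").tolist(), doc.to_array("ENT_TYPE").tolist(), lambda h: doc.vocab.strings[h]).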