I'm trying to train an NER model with spaCy v3, and I'm getting an error from the Example.from_dict() method. I followed the answers to this earlier question on how to use the Example class.
Here is the code snippet:
import random

import spacy
from spacy.training import Example

# TRAIN_DATA and EPOCHS are defined elsewhere (see the sample below).
nlp = spacy.blank('en')
if 'ner' not in nlp.pipe_names:
    # In spaCy v3, add_pipe takes the component name and returns the component.
    ner = nlp.add_pipe('ner', last=True)
else:
    ner = nlp.get_pipe('ner')

# Register every entity label that occurs in the training data.
for _, annotations in TRAIN_DATA:
    for label in annotations['entities']:
        ner.add_label(label[2])

other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']
with nlp.disable_pipes(*other_pipes):
    optimizer = nlp.begin_training()
    for epoch in range(EPOCHS):
        random.shuffle(TRAIN_DATA)
        losses = {}
        print(f'Epoch {epoch+1} of {EPOCHS}:')
        for text, annotations in TRAIN_DATA:
            doc = nlp.make_doc(text)
            example = Example.from_dict(doc, annotations)
            nlp.update([example], drop=0.2, sgd=optimizer, losses=losses)  # SGD
        print(losses)  # Print losses after each epoch
Above, TRAIN_DATA is a list of tuples like this:
('El Salvador achieved independence from Spain in 1821 and from the Central American Federation in 1839 .',
 {'entities': [(97, 101, 'tim'),
               (39, 44, 'org'),
               (48, 52, 'tim'),
               (66, 93, 'org'),
               (0, 11, 'geo')]})
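The sample above shows integer offsets, but other tuples in the list might hold float-typed values (for example, if the offsets were built from a pandas or NumPy column; that is an assumption, not something confirmed here). A minimal sketch of a hypothetical helper, `normalize_offsets`, that casts every offset to a plain Python int and refuses to truncate fractional values:

```python
def normalize_offsets(train_data):
    """Return a copy of train_data with entity offsets cast to plain ints.

    Hypothetical helper, not part of spaCy. Assumes each item looks like
    (text, {"entities": [(start, end, label), ...]}).
    """
    cleaned = []
    for text, annotations in train_data:
        entities = []
        for start, end, label in annotations["entities"]:
            # Refuse to silently truncate a fractional offset such as 3.5.
            if int(start) != start or int(end) != end:
                raise ValueError(
                    f"Non-integral offset in {text!r}: {(start, end, label)}"
                )
            entities.append((int(start), int(end), label))
        # Keep any other annotation keys untouched.
        cleaned.append((text, {**annotations, "entities": entities}))
    return cleaned
```

If this applies, it could be called once before the training loop, e.g. `TRAIN_DATA = normalize_offsets(TRAIN_DATA)`.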
And finally, this is the error traceback:
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_34/168282795.py in <module>
      8         for text, annotations in TRAIN_DATA:
      9             doc = nlp.make_doc(text)
---> 10             example = Example.from_dict(doc, annotations)
     11             nlp.update([example], drop=0.2, sgd=optimizer, losses=losses) #SGD
     12         print(losses) #Print losses after each epoch

/opt/conda/lib/python3.7/site-packages/spacy/training/example.pyx in spacy.training.example.Example.from_dict()

/opt/conda/lib/python3.7/site-packages/spacy/training/example.pyx in spacy.training.example.annotations_to_doc()

/opt/conda/lib/python3.7/site-packages/spacy/training/example.pyx in spacy.training.example._add_entities_to_doc()

/opt/conda/lib/python3.7/site-packages/spacy/training/iob_utils.py in offsets_to_biluo_tags(doc, entities, missing)
    102                     biluo[starts[s]] = "O"
    103         else:
--> 104             for token_index in range(start_char, end_char):
    105                 if token_index in tokens_in_ents.keys():
    106                     raise ValueError(

TypeError: 'numpy.float64' object cannot be interpreted as an integer
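The failing line in the traceback is a plain `range(start_char, end_char)` call, so the same kind of error can be reproduced outside spaCy. A minimal sketch, assuming NumPy is installed; `try_range` is a hypothetical name used only for illustration:

```python
import numpy as np


def try_range(start, end):
    """Return list(range(start, end)), or the TypeError text if range() rejects the arguments."""
    try:
        return list(range(start, end))
    except TypeError as err:
        return f"TypeError: {err}"


# Plain ints are accepted by range().
print(try_range(0, 3))

# numpy.float64 values are rejected, even when they hold whole numbers.
print(try_range(np.float64(0), np.float64(3)))
```

This suggests that at least one entity tuple in TRAIN_DATA carries numpy.float64 offsets rather than ints, although the sample shown above does not.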
My first question on Stack Overflow. Hope I've provided all necessary info to be eligible for help. Thanks in advance!