I want to train my own custom NER model with spaCy to recognize addresses.

This is my data:

training_data = [('send to: Aargauerstrasse 8005', {'entities': [(9, 28, 'ADDRESS')]}), 
                ('send to: Abeggweg 8057', {'entities': [(9, 21, 'ADDRESS')]}), 
                ('send to: Abendweg 8038', {'entities': [(9, 21, 'ADDRESS')]}), 
                ('send to: Ackermannstrasse 8044', {'entities': [(9, 29, 'ADDRESS')]}), 
                ('send to: Aehrenweg 8050', {'entities': [(9, 22, 'ADDRESS')]}), 
                ('send to: Aemmerliweg 8050', {'entities': [(9, 24, 'ADDRESS')]}), 
                ('send to: Albisgütliweg 8045', {'entities': [(9, 26, 'ADDRESS')]}), 
                ('send to: Albisstrasse 8038', {'entities': [(9, 25, 'ADDRESS')]}), 
                ('send to: Albulastrasse 8048', {'entities': [(9, 26, 'ADDRESS')]}), 
                ('send to: Alderstrasse 8008', {'entities': [(9, 25, 'ADDRESS')]})]
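
A note on the offsets: spaCy reads each entity as character offsets with an exclusive end, like a Python slice, so slicing the text with a span shows what it actually covers. The sketch below does this for two of the samples above. The helper names `span_cuts_token` and `address_span` are made up for illustration, and the check is a rough whitespace-based one; spaCy's tokenizer is not purely whitespace-based, so `offsets_to_biluo_tags` remains the authoritative check.

```python
def span_cuts_token(text, start, end):
    """Return True if [start, end) begins or ends inside a whitespace-delimited word."""
    starts_mid_word = start > 0 and not text[start - 1].isspace()
    ends_mid_word = end < len(text) and not text[end].isspace()
    return starts_mid_word or ends_mid_word


def address_span(text, address, label="ADDRESS"):
    """Compute (start, end, label) for `address` inside `text`, with an exclusive end."""
    start = text.find(address)
    if start == -1:
        raise ValueError(f"{address!r} not found in {text!r}")
    return (start, start + len(address), label)


# Two samples copied from the training data above.
samples = [
    ("send to: Aargauerstrasse 8005", {"entities": [(9, 28, "ADDRESS")]}),
    ("send to: Abeggweg 8057", {"entities": [(9, 21, "ADDRESS")]}),
]

for text, annotations in samples:
    for start, end, label in annotations["entities"]:
        # Show the substring the annotation really covers.
        print(repr(text[start:end]), "cuts a token:", span_cuts_token(text, start, end))

# Assumption: the address is everything after the "send to: " prefix.
prefix = "send to: "
fixed = [
    (text, {"entities": [address_span(text, text[len(prefix):])]})
    for text, _ in samples
]
print(fixed)
```

With the spans as written, `(9, 28)` selects `'Aargauerstrasse 800'`, which drops the last digit of the postal code. Computing the end as `start + len(address)` avoids counting by hand.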

I followed the official tutorial (the relevant part starts around 20 min 30 s): https://www.youtube.com/watch?v=IqOJU1-_Fi0&t=1328s

These are my functions:

import random

import spacy
from spacy.training import Example
from spacy.util import minibatch, compounding

# CREATING BLANK MODEL
def create_blank_nlp(train_data):

    nlp = spacy.blank("en") # blank model
    nlp.add_pipe("transformer")
    nlp.add_pipe("parser")
    
    ner = nlp.create_pipe("ner") # insert custom NER
    nlp.add_pipe("ner", last = True)
    ner = nlp.get_pipe("ner")
        
    for _, data in train_data:
        for ent in data.get("entities"):
            ner.add_label(ent[2])

    return nlp

nlp = create_blank_nlp(training_data)
optimizer = nlp.begin_training()

# TRAINING 
for i in range(5):
    
    random.shuffle(training_data)
    
    losses = {}

    sizes = compounding(1.0, 5.0, 150.0)
    batches = minibatch(training_data, size = sizes)
    for batch in batches:
        for text, annotations in batch:
            
            doc = nlp.make_doc(text)
            example = Example.from_dict(doc, annotations)
            
            nlp.update([example], drop = 0.2, sgd = optimizer, losses = losses)

        
    print("Losses at iteration", i, losses)

What should I do?

taga
  • You get this error because you're using spaCy 3. The `update()` function expects an `Example` type. Here's how to do this: https://stackoverflow.com/questions/66675261/how-can-i-work-with-example-for-nlp-update-problem-with-spacy3-0/66679910#66679910 – krisograbek Mar 26 '21 at 05:23
  • Thanks, but now I'm getting a warning: UserWarning: [W030] Some entities could not be aligned in the text "company name Red Carnation" with entities "[(14, 25, 'COMPANY')]". Use `spacy.training.offsets_to_biluo_tags(nlp.make_doc(text), entities)` to check the alignment. Misaligned entities ('-') will be ignored during training. – taga Mar 26 '21 at 09:57
  • Please provide code I can reproduce. – krisograbek Mar 26 '21 at 16:29
  • I have updated my question. – taga Mar 26 '21 at 16:32
  • Also, if you have time, check out this question: https://stackoverflow.com/questions/66821133/creating-rule-based-matching-with-spacy-and-python-for-detecting-addresses – taga Mar 26 '21 at 16:53

1 Answer

nlp.update([example], drop=0.2, sgd=optimizer, losses=losses)
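
For context, here is a minimal sketch of a complete spaCy 3 training loop built around that call. It assumes spaCy 3.x is installed and uses a blank English pipeline with only an `ner` component (no `transformer` or `parser`). The function name `train_address_ner`, the batch size, and the iteration count are illustrative choices, not taken from the question. The entity ends are written as exclusive offsets, and `nlp.initialize` is used in place of `begin_training`, which spaCy 3 deprecates.

```python
import random

import spacy
from spacy.training import Example
from spacy.util import minibatch

# Two samples from the question, with exclusive end offsets.
TRAIN_DATA = [
    ("send to: Aargauerstrasse 8005", {"entities": [(9, 29, "ADDRESS")]}),
    ("send to: Abeggweg 8057", {"entities": [(9, 22, "ADDRESS")]}),
]


def train_address_ner(train_data, n_iter=10, seed=0):
    """Train a blank English pipeline with a single NER component (illustrative helper)."""
    random.seed(seed)
    nlp = spacy.blank("en")
    ner = nlp.add_pipe("ner")

    # Register every label that occurs in the annotations.
    for _, annotations in train_data:
        for _, _, label in annotations["entities"]:
            ner.add_label(label)

    # spaCy 3 trains on Example objects, not on (text, dict) tuples.
    examples = [
        Example.from_dict(nlp.make_doc(text), annotations)
        for text, annotations in train_data
    ]

    # initialize() sets up the model weights and returns an optimizer.
    optimizer = nlp.initialize(lambda: examples)

    losses = {}
    for _ in range(n_iter):
        random.shuffle(examples)
        losses = {}
        for batch in minibatch(examples, size=4):
            # update() takes a list of Example objects.
            nlp.update(batch, drop=0.2, sgd=optimizer, losses=losses)
    return nlp, losses


nlp, losses = train_address_ner(TRAIN_DATA)
print("Final losses:", losses)
```

Passing whole batches to `nlp.update` instead of one example at a time is a design choice; with a dataset this small the difference is negligible.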

Dhnesh Dhingra