I've been trying to build a spaCy `Matcher` pattern based on verb tenses and moods.
I found out how to access the morphological features of words parsed with spaCy using `model.vocab.morphology.tag_map[token.tag_]`, which prints something like this when the verb is in the subjunctive mood (the mood I am interested in):
{'Mood_sub': True, 'Number_sing': True, 'Person_three': True, 'Tense_pres': True, 'VerbForm_fin': True, 74: 100}
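Since the lookup is `tag_map[token.tag_]`, names such as `Mood_sub` are keys of the feature dictionary returned *for* a tag, not the value of `token.tag_` itself. A minimal pure-Python sketch that lists the features set on a token, working only on the dict printed above (the helper `active_features` is hypothetical, not a spaCy API):

```python
# Feature dict printed above for 'siga', copied verbatim.
# The integer pair (74: 100) is not a named feature, so it is skipped.
siga_features = {'Mood_sub': True, 'Number_sing': True, 'Person_three': True,
                 'Tense_pres': True, 'VerbForm_fin': True, 74: 100}


def active_features(feature_map):
    """Return the sorted names of string-keyed features whose value is True."""
    return sorted(name for name, value in feature_map.items()
                  if isinstance(name, str) and value is True)


print(active_features(siga_features))
```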
However, I would like to use a pattern like this one to retokenize specific verb phrases: `pattern = [{'TAG': 'Mood_sub'}, {'TAG': 'VerbForm_ger'}]`
For a Spanish phrase like 'Que siga aprendiendo', 'siga' has `'Mood_sub': True` in its tag features and 'aprendiendo' has `'VerbForm_ger': True` in its tag features. However, the matcher does not detect this match.
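In other words, the intended behaviour is that each dict in the pattern tests one morphological feature of a token, in sequence. That intent can be sketched in plain Python as a toy stand-in (this is not how spaCy's `Matcher` works internally; the per-token feature dicts and the `find_feature_matches` helper are hypothetical, and only `Mood_sub` on 'siga' and `VerbForm_ger` on 'aprendiendo' come from the question):

```python
# Hypothetical per-token features for 'Que siga aprendiendo'.
tokens = [
    ('Que', {}),
    ('siga', {'Mood_sub': True, 'Number_sing': True, 'Person_three': True,
              'Tense_pres': True, 'VerbForm_fin': True}),
    ('aprendiendo', {'VerbForm_ger': True}),
]

# Each pattern element names one feature that must be True on the token.
pattern = ['Mood_sub', 'VerbForm_ger']


def find_feature_matches(tokens, pattern):
    """Return (start, end) for each window whose tokens carry the pattern's features in order."""
    matches = []
    for start in range(len(tokens) - len(pattern) + 1):
        window = tokens[start:start + len(pattern)]
        if all(feats.get(feature) is True
               for (_, feats), feature in zip(window, pattern)):
            matches.append((start, start + len(pattern)))
    return matches


for start, end in find_feature_matches(tokens, pattern):
    print(' '.join(text for text, _ in tokens[start:end]))
```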
Can anyone tell me why the matcher misses this match and how I could fix it? This is the code I am using:
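```python
import spacy
from spacy.matcher import Matcher
from spacy.util import filter_spans

model = spacy.load('es_core_news_md')
matcher = Matcher(model.vocab)

text = 'Que siga aprendiendo de sus alumnos'
doc = model(text)

# Intended: a subjunctive verb followed by a gerund
pattern = [{'TAG': 'Mood_sub'}, {'TAG': 'VerbForm_ger'}]
matcher.add(1, None, pattern)  # spaCy v2 signature: add(key, on_match, *patterns)
matches = matcher(doc)

# Merge all matches in one retokenize block so offsets computed on the
# original doc stay valid; filter_spans drops overlapping matches.
spans = filter_spans([doc[start:end] for _, start, end in matches])
with doc.retokenize() as retokenizer:
    for span in spans:
        retokenizer.merge(span)
```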
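When several matches are found, merging them one at a time (a separate `with doc.retokenize()` block per span) commits each merge before the next span is sliced, so `(start, end)` offsets computed on the original doc can point at the wrong tokens. As I understand the spaCy retokenizer docs, queuing all merges in a single block avoids this because they are applied together on exit. The effect can be sketched with a plain list standing in for a `Doc` (the `merge` helper and the second match `(3, 5)` are hypothetical):

```python
def merge(tokens, start, end):
    """Collapse tokens[start:end] into one space-joined token (toy stand-in for a merge)."""
    return tokens[:start] + [' '.join(tokens[start:end])] + tokens[end:]


tokens = ['Que', 'siga', 'aprendiendo', 'de', 'sus', 'alumnos']
# Two hypothetical, non-overlapping matches computed on the original tokens.
matches = [(1, 3), (3, 5)]

# One merge at a time with the original offsets: after the first merge,
# every later index has shifted left by one, so the wrong tokens get merged.
stale = tokens
for start, end in matches:
    stale = merge(stale, start, end)

# Applying the merges from right to left keeps the remaining offsets valid.
safe = tokens
for start, end in sorted(matches, reverse=True):
    safe = merge(safe, start, end)

print(stale)
print(safe)
```

The right-to-left loop is only one way to keep offsets valid; with spaCy itself, collecting the spans first and merging them inside one `retokenize()` block is the simpler option.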