I have the following code, which tries to use PhraseMatcher to detect whether a sentence contains a speed, e.g., "44 mph".
import spacy
from spacy.matcher import PhraseMatcher
import re
nlp = spacy.load('en_core_web_sm')
speed_flag = lambda text: bool(re.search(r'(?i)\d+\s?mph', text))
IS_SPEED = nlp.vocab.add_flag(speed_flag)
matcher = PhraseMatcher(nlp.vocab)
matcher.add('MPH', None, [{IS_SPEED: True}])
doc = nlp(u'Car was going 44 mpH.')
matches = matcher(doc)
print(matches)
for match_id, start, end in matches:
    span = doc[start:end]
    print(span.text)
This returns an empty list. However, re.compile(r'([M-m][P-p][H-h])')
matches "Mph", "mpH", "mPh", etc., and re.compile(r'([0-9]+)')
matches any digits in my document.
I built this from the example here: linguistic-features#regex... I also tested my regex pattern, ([0-9]+) ?([M-m][P-p][H-h]),
in a Python interpreter, and it does work.
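One plausible reason the flag never fires: a flag registered with nlp.vocab.add_flag is computed from the text of a single lexeme (one token), not from the whole sentence. spaCy splits "44 mpH." into separate tokens for "44" and "mpH", so a pattern that needs both the digits and the unit cannot succeed on any one token. The stdlib-only sketch below illustrates this; the token list is written out by hand as an assumption about how the tokenizer splits this sentence, not actual spaCy output.

```python
import re

# The flag's pattern from the question: digits, optional whitespace, "mph" in any case.
SPEED_RE = re.compile(r'(?i)\d+\s?mph')

text = 'Car was going 44 mpH.'

# Searched against the raw string, the pattern finds the speed.
print(SPEED_RE.findall(text))  # ['44 mpH']

# A vocab flag is evaluated one token at a time. Assumed tokenization
# (hand-written, not produced by spaCy): the number and the unit are separate tokens.
tokens = ['Car', 'was', 'going', '44', 'mpH', '.']

# No single token contains both the digits and "mph", so nothing matches.
print([t for t in tokens if SPEED_RE.search(t)])  # []
```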
I realize the original example uses Matcher, whereas I am trying to use PhraseMatcher,
which expects Doc objects as patterns and so does not accept the input this approach needs (i.e., a list of dictionaries).
Any ideas on how to achieve this?
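Because PhraseMatcher patterns are Doc objects rather than lists of token dicts, a token-level approach like the one in the Matcher example may be the more natural fit. The sketch below only emulates, in plain Python, what a two-token Matcher pattern such as [{'IS_DIGIT': True}, {'LOWER': 'mph'}] would check; find_speeds is a hypothetical helper, not part of spaCy, and the hand-written token list stands in for a real Doc.

```python
# Hypothetical helper, not spaCy code: emulates a two-token pattern
# "a digit token followed by a token whose lowercase form is 'mph'".
def find_speeds(tokens):
    matches = []
    for i in range(len(tokens) - 1):
        if tokens[i].isdigit() and tokens[i + 1].lower() == 'mph':
            # (start, end) token indices, in the same shape as Matcher results
            matches.append((i, i + 2))
    return matches

# Assumed tokenization of 'Car was going 44 mpH.' (written by hand).
tokens = ['Car', 'was', 'going', '44', 'mpH', '.']

for start, end in find_speeds(tokens):
    print(' '.join(tokens[start:end]))  # 44 mpH
```

With spaCy itself, the equivalent would be adding such a pattern to a Matcher rather than a PhraseMatcher; note that the signature of matcher.add differs between spaCy v2 and v3.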