Using regex on spaCy PhraseMatcher

Question

I have the following code, which tries to use PhraseMatcher to detect a speed, e.g., "44 mph", in a sentence.

import spacy
from spacy.matcher import PhraseMatcher
import re

nlp = spacy.load('en_core_web_sm')
speed_flag = lambda text: bool(re.search(r'(?i)\d+\s?mph', text))
IS_SPEED = nlp.vocab.add_flag(speed_flag)

matcher = PhraseMatcher(nlp.vocab)
matcher.add('MPH', None, [{IS_SPEED: True}])

doc = nlp(u'Car was going 44 mpH.')
matches = matcher(doc)

print(matches)
for match_id, start, end in matches:
    span = doc[start:end]
    print(span.text)

This returns an empty list. However, re.compile(r'([M-m][P-p][H-h])') returns the right answer for "Mph", "mpH", "mPh", etc., and re.compile(r'([0-9]+)') returns any digits in my document.

I am using the example here to construct this: linguistic-features#regex... Also, I tested my regex pattern, ([0-9]+) ?([M-m][P-p][H-h]), in a Python interpreter, and it does work.
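As a quick check of the patterns above outside spaCy, here is a minimal sketch using only the standard library. It illustrates two points: `[M-m]` is a character range (everything from `M` to `m` in ASCII order), so the hand-written pattern accepts more than the three letters intended, and `.match` only tries the start of the string while `.search` scans all of it. The names `broad` and `strict` and the sample strings are made up for illustration.

```python
import re

# The hand-written pattern: each class is a range, not a two-letter choice.
broad = re.compile(r'([0-9]+) ?([M-m][P-p][H-h])')
# A case-insensitive alternative that only accepts the letters m, p, h.
strict = re.compile(r'\d+\s?mph', re.IGNORECASE)

sentence = 'Car was going 44 mpH.'

# .match anchors at position 0, so it fails on a full sentence.
match_result = broad.match(sentence)
# .search scans the whole string and finds the speed.
search_result = broad.search(sentence)

# The ranges also accept unrelated letters such as "XYZ".
false_positive = broad.match('44 XYZ')
strict_on_xyz = strict.search('44 XYZ')
strict_on_sentence = strict.search(sentence)

print(match_result, search_result.group(), false_positive.group())
print(strict_on_xyz, strict_on_sentence.group())
```

This only says something about the regex itself; it does not explain the empty list from the matcher, since a flag function is called on one token's text at a time.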

I realize the original example uses Matcher, whereas I am trying to use PhraseMatcher, which does not accept the input needed to do this (viz., a list of dictionaries).

Any ideas on how to achieve this?

Replace `re.compile(r'([0-9]+) ?([M-m][P-p][H-h])').match(text)` with `re.search(r'(?i)\d+\s?mph', text)` — Wiktor Stribiżew, Aug 24 '18 at 21:24
This problem is specific to spaCy, not to how I search for the regex pattern. I tried your pattern and it also produces an empty list. — horcle_buzz, Aug 24 '18 at 21:27
Yes, I just tried it and it exhibits the same exact behavior as noted above. — horcle_buzz, Aug 24 '18 at 21:29
I still think the `.match` must be replaced with `.search`. And you seem to be using `Matcher` (that operates on tokens) and not `PhraseMatcher` (that operates on raw text) and `44 mph` are two tokens, not one. Hence, no match. — Wiktor Stribiżew, Aug 24 '18 at 21:33
Good catch on the erroneous use of `Matcher`! Oops... (I will correct this later, thanks!) — horcle_buzz, Aug 24 '18 at 21:57
I guess the question is how to use regex on `PhraseMatcher`. Looks like it's not possible. — horcle_buzz, Aug 25 '18 at 00:43
I have corrected the question so it is no longer a duplicate of anything. I do have a working solution, as per https://github.com/explosion/spaCy/issues/1567#issuecomment-356589346 ... I would like to post this as an answer, please. NB: Again, I do not think this is a duplicate question. — horcle_buzz, Aug 25 '18 at 17:00
Ach! So it is... I even stumbled across this answer earlier in my searches, but skimmed over it too fast. Anyway, it's all good! — horcle_buzz, Aug 26 '18 at 15:26
