
What I am looking for is a way to find a keyword in a body of text, then check whether that keyword is part of a larger phrase. Here are a few examples to illustrate what I mean.

Let's say I'm looking for text that contains the word electric. But I'm NOT looking for general electric. So if the text is:

The atmosphere is electric!

I would like it to return a positive. But if instead it is:

I just got a new job at general electric!

I don't want that to show up. Normally I would do this by searching with a regular expression for electric, then running a second regular expression search for general electric and returning a negative if it is found.

HOWEVER, this type of text spoils that plan.

I'm at a party for general electric. The atmosphere here is electric!

Because this text has an instance of the word electric that is not part of the larger phrase, I want it to return a positive. BUT running an re search for general electric would make my method return a negative.

What type of algorithm can I use to solve these issues in Python?

Wiktor Stribiżew
SSC Fan
    given this limited example, one way to check is to split up the sentence and if the preceding word is a noun/verb, then either exclude/include. – etch_45 Jul 02 '21 at 06:19

1 Answer

You can use a lookaround, specifically a negative lookbehind.

import re

re.search(r'(?<!general )\belectric\b', 'The atmosphere is electric!')
# <re.Match object; span=(18, 26), match='electric'>

re.search(r'(?<!general )\belectric\b', 'I just got a new job at general electric!')
# None

re.search(r'(?<!general )\belectric\b', "I'm at a party for general electric. The atmosphere here is electric!")
# <re.Match object; span=(60, 68), match='electric'>

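(?<! ... ) is a negative lookbehind. In (?<!general )\belectric\b, it asserts that the text general followed by a space does not come immediately before the word electric, so the regex only matches occurrences of electric that are not preceded by general. The \b word boundaries keep the pattern from matching inside longer words such as electricity.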
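To get a plain positive/negative answer as asked in the question, the pattern can be wrapped in a small helper. This is only a sketch: the names STANDALONE_ELECTRIC and has_standalone_electric are made up for illustration, and re.IGNORECASE is an added assumption so that General Electric is excluded as well.

```python
import re

# Hypothetical names for illustration; the pattern itself is the one shown above.
# re.IGNORECASE is an assumption, added so "General Electric" is excluded too.
STANDALONE_ELECTRIC = re.compile(r'(?<!general )\belectric\b', re.IGNORECASE)


def has_standalone_electric(text):
    """Return True if text has "electric" not directly preceded by "general "."""
    return STANDALONE_ELECTRIC.search(text) is not None


samples = [
    "The atmosphere is electric!",
    "I just got a new job at general electric!",
    "I'm at a party for general electric. The atmosphere here is electric!",
]
results = [has_standalone_electric(s) for s in samples]
print(results)  # [True, False, True]
```

One caveat: Python's re module only accepts fixed-width lookbehinds, so something like (?<!general\s+) is rejected. The pattern above therefore only excludes general followed by exactly one space; text with two spaces or a line break in between would still count as a match.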

You can learn more about lookaround at Regex lookahead, lookbehind and atomic groups

MarkSouls