How to not match whole word "king" to "king?"?

Question

How do I verify an exact word occurs in a string?

I need to account for cases when a word such as "king" has a question mark immediately following as in the example below.

Unigrams: this should be False

In [1]: answer = "king"
In [2]: context = "we run with the king? on sunday"

N-grams: this should be False

In [1]: answer = "king tut"
In [2]: context = "we run with the king tut? on sunday"

Unigrams: this should be True

In [1]: answer = "king"
In [2]: context = "we run with the king on sunday"

N-grams: this should be True

In [1]: answer = "king tut"
In [2]: context = "we run with the king tut on sunday"

As people have mentioned, the unigram case can be handled by splitting the string into a list of words, but that doesn't work for n-grams.

After reading some posts, I think I should try to handle this with a lookbehind, but I'm not sure.
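For reference, a plain `\b` word boundary does not meet this requirement: `?` is a non-word character, so `\b` still sees a boundary between `king` and `?`. A quick check using only the standard-library `re` module:

```python
import re

context_bad = "we run with the king? on sunday"
context_good = "we run with the king on sunday"

# \b only requires a word/non-word transition, and "?" is a non-word
# character, so "king?" still contains a match for \bking\b.
print(bool(re.search(r"\bking\b", context_bad)))   # True (unwanted)
print(bool(re.search(r"\bking\b", context_good)))  # True
```

So whatever the final pattern is, it has to look at more than the word/non-word transition.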

The whole problem is that you didn't clearly define what you mean by an *"exact word"*. — Casimir et Hippolyte, Mar 21 '17 at 22:41
I wasn't thinking about a better theoretical definition, but a more concrete description of what is allowed before and after your needle. Nevertheless I praise your effort. — Casimir et Hippolyte, Mar 21 '17 at 22:46
I think you should not have removed your attempt from the question. — Wiktor Stribiżew, Mar 21 '17 at 22:59
The edit made a huge difference, but I think you still need to be more specific. For example, what should happen when `context` is `'king.'`? Ultimately, it sounds like regex is probably the right tool, but so far I doubt you need to employ anything more "exotic" than an exclusion set (such as `[^?]`). — John Y, Mar 21 '17 at 23:05
@mattyd2 I updated my answer, but given the new requirements a regex is probably a better answer... see Wiktor's answer :) — TemporalWolf, Mar 22 '17 at 18:23

TemporalWolf · Answer 1 · 2017-03-22T18:35:30.590

return answer in context.split()

>>> answer in context.split()
False

You don't need a regex for this.
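Wrapped in a helper (the name `unigram_in` is only illustrative), this handles both unigram cases from the question. As the asker noted, it cannot find a multi-word answer, because `split()` never produces a token that contains a space:

```python
def unigram_in(answer, context):
    # Whitespace tokenisation: "king?" stays one token, so it never equals "king"
    return answer in context.split()

print(unigram_in("king", "we run with the king? on sunday"))  # False
print(unigram_in("king", "we run with the king on sunday"))   # True
# A multi-word answer is never a single token, so this is always False:
print(unigram_in("king tut", "we run with the king tut on sunday"))  # False
```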

If you're looking for keywords:

all([ans in context.split() for ans in answer.split()])

will work with "king tut", but whether that is acceptable depends on whether you also want to match strings like:

"we tut with the king"
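To make the trade-off concrete, here is the same keyword check as a hypothetical helper `all_words_in`. It correctly rejects `"king tut?"`, but it also accepts the reordered sentence above, because each word is checked independently:

```python
def all_words_in(answer, context):
    # Checks each word on its own; order and adjacency are ignored
    tokens = context.split()
    return all(word in tokens for word in answer.split())

print(all_words_in("king tut", "we run with the king tut on sunday"))   # True
print(all_words_in("king tut", "we run with the king tut? on sunday"))  # False
print(all_words_in("king tut", "we tut with the king"))                 # True
```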

If you don't, you still don't need a regex (although you probably should use one). Since you only want to match whole terms, the default whitespace splitting done by .split() is enough:

def ngram_in(match, string):
    matches = match.split()
    if len(matches) == 1:
        # Unigram: plain membership test on whitespace-separated tokens
        return matches[0] in string.split()
    words = string.split()
    words_len = len(words)
    matches_len = len(matches)
    for index, word in enumerate(words):
        # Not enough words left for the n-gram to fit
        if index + matches_len > words_len:
            return False
        if word == matches[0]:
            # Compare the following words one by one
            potential_match = True
            for match_index, match_word in enumerate(matches):
                if words[index + match_index] != match_word:
                    potential_match = False
                    break
            if potential_match:
                return True
    return False

This is O(n*m) in the worst case and about half as fast as a regex on typical strings.
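The same whole-token semantics can be written more compactly by comparing list slices. This is a sketch of an alternative formulation (the name `ngram_in_slices` is hypothetical), not the answer's code; it is still O(n*m) in the worst case, since each window comparison can touch up to m tokens:

```python
def ngram_in_slices(answer, context):
    needle = answer.split()
    words = context.split()
    n = len(needle)
    if n == 0:
        # Treat an empty answer as "not found" rather than trivially present
        return False
    # Compare every window of n consecutive tokens with the answer's tokens
    return any(words[i:i + n] == needle for i in range(len(words) - n + 1))

print(ngram_in_slices("king tut a", "was king tut a nice dude?"))   # True
print(ngram_in_slices("king tut a", "was king tut? a nice dude?"))  # False
print(ngram_in_slices("king tut", "we tut with the king"))          # False
```

If the context has fewer tokens than the answer, the `range` is empty and `any()` returns False, so no separate length check is needed.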

>>> ngram_in("king", "was king tut a nice dude?")
True
>>> ngram_in("king", "was king? tut a nice dude?")
False
>>> ngram_in("king tut a", "was king tut a nice dude?")
True
>>> ngram_in("king tut a", "was king tut? a nice dude?")
False
>>> ngram_in("king tut a", "was king tut an nice dude?")
False
>>> ngram_in("king tut", "was king tut an nice dude?")
True

Thank you @TemporalWolf. I'm going to update the post to address another case where this won't work. — mattyd2, Mar 21 '17 at 22:35
The hundredth being: **what to do with all this free time**? — Jan, Mar 21 '17 at 22:54

score 4 · Accepted Answer · answered Mar 21 '17 at 22:56

Use a regular expression like this:

import re
reg_answer = re.compile(r"(?<!\S)" + re.escape(answer) + r"(?!\S)")

See the Python demo

Details:

(?<!\S) - a negative lookbehind ensuring the match is preceded by whitespace or the start of the string
re.escape(answer) - a preprocessing step so that all special characters inside the search word are treated as literal characters
(?!\S) - a negative lookahead ensuring the match is followed by whitespace or the end of the string.
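Putting the pattern above into a small function (the name `exact_phrase_in` is illustrative) and running it on the four cases from the question gives the expected results:

```python
import re

def exact_phrase_in(answer, context):
    # (?<!\S): start of string or preceded by whitespace
    # re.escape: treat characters such as "?" or "+" in answer literally
    # (?!\S): end of string or followed by whitespace
    reg_answer = re.compile(r"(?<!\S)" + re.escape(answer) + r"(?!\S)")
    return reg_answer.search(context) is not None

cases = [
    ("king", "we run with the king? on sunday"),          # False
    ("king tut", "we run with the king tut? on sunday"),  # False
    ("king", "we run with the king on sunday"),           # True
    ("king tut", "we run with the king tut on sunday"),   # True
]
for answer, context in cases:
    print(repr(answer), exact_phrase_in(answer, context))
```

With this pattern any attached punctuation disqualifies a match, so `king.` and `king,` are rejected too; whether that is wanted depends on the definition of "exact word" discussed in the comments. The space inside a multi-word answer is matched literally, so `king tut` will not match `king  tut` written with two spaces.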

score 0 · Answer 3 · edited May 23 '17 at 10:30


Why not check:

if answer in context: do stuff

Check this post for more details
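A quick check shows why a plain substring test is not enough here: `in` on two strings looks for `answer` anywhere in `context`, including inside longer tokens:

```python
answer = "king"
# Substring tests ignore word boundaries entirely
print(answer in "we run with the king? on sunday")  # True, but should be False
print(answer in "the viking ship")                   # True: "king" inside "viking"
```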

answered Mar 21 '17 at 22:44 · pypy

This doesn't satisfy the requirements. OP needs `'king'` to NOT be found in `'king?'`. – John Y Mar 21 '17 at 22:46
