I'm trying to find keywords within a sentence, where the keywords are usually single words, but can be multi-word combos (like "cost in euros"). So if I have a sentence like "cost in euros of bacon", it would find "cost in euros" in that sentence and return true.

For this, I was using this code:

if any(phrase in line for phrase in keyword['aliases']):

where line is the input and aliases is an array of phrases that match a keyword (like for cost in euros, it's ['cost in euros', 'euros', 'euro cost']).

However, I noticed that it was also triggering on word parts. For example, I had a match phrase of "y" and a sentence of "trippy cake". I'd not expect this to return true, but it does, since it apparently finds the "y" in "trippy". How do I get this to only check whole words? Originally I was doing this keyword search with a list of words (essentially doing line.split() and checking those), but that doesn't work for multi-word keyword aliases.

IronWaffleMan

1 Answer

This should accomplish what you're looking for:

import re

aliases = [
    'cost.',
    '.cost',
    '.cost.',
    'cost in euros of bacon',
    'rocking euros today',
    'there is a cost inherent to bacon',
    'europe has cost in place',
    'there is a cost.',
    'I was accosted.',
    'dealing with euro costing is painful']
phrases = ['cost in euros', 'euros', 'euro cost', 'cost']

matched = list(set([
    alias
    for alias in aliases
    for phrase in phrases
    if re.search(r'\b{}\b'.format(re.escape(phrase)), alias)
    ]))

print(matched)

Output:

['there is a cost inherent to bacon', '.cost.', 'rocking euros today', 'there is a cost.', 'cost in euros of bacon', 'europe has cost in place', 'cost.', '.cost']

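Basically, we're grabbing every sentence that contains at least one of the phrases as whole words, using Python's re module as the test. The \b on either side of the phrase matches only at a word boundary, so y no longer matches inside trippy, while a keyword followed by punctuation or sitting at the end of the string still matches. The compound list comprehension yields the same sentence more than once when several phrases occur in it, so set() trims the duplicates and list() coerces the set back into a list. Because a set is unordered, the order of the printed list may differ from run to run.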
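Applied back to the any(...) check from the question, the same word-boundary idea might look like the sketch below. The helper name line_has_keyword is hypothetical, not something from the question or this answer. It assumes aliases are literal text rather than regex patterns, which is why each one goes through re.escape, and it assumes an empty alias list should never match.

```python
import re


def line_has_keyword(line, aliases):
    """Return True if any alias occurs in line as whole words.

    Hypothetical helper: aliases are treated as literal text, not regexes.
    """
    if not aliases:
        # An empty alternation would match at any word boundary, so bail out.
        return False
    # Join all aliases into one alternation so the line is scanned once.
    alternatives = '|'.join(re.escape(alias) for alias in aliases)
    pattern = r'\b(?:{})\b'.format(alternatives)
    return re.search(pattern, line) is not None


keyword = {'aliases': ['cost in euros', 'euros', 'euro cost']}
print(line_has_keyword('cost in euros of bacon', keyword['aliases']))
print(line_has_keyword('trippy cake', ['y']))
```

One caveat: \b only matches between a word character and a non-word character, so an alias that starts or ends with punctuation (for example c++) would need a different boundary test.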

Refs:

Lists: https://docs.python.org/3/tutorial/datastructures.html#more-on-lists

List comprehensions: https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions

Sets: https://docs.python.org/3/tutorial/datastructures.html#sets

re (or regex): https://docs.python.org/3/library/re.html#module-re

Chris Larson
  • Almost, but I've found it doesn't quite work when the keyword is at the end of a sentence and therefore has no space at the end. – IronWaffleMan Feb 04 '19 at 01:54
  • Updated my answer. This should cover your use cases. – Chris Larson Feb 04 '19 at 04:06
  • @IronWaffleMan Note that I kinda overexplained in my answer here. Not meaning to imply you don't get it. Just trying to make the answer more useful for future folks looking for answers. – Chris Larson Feb 04 '19 at 04:25