
I'm looping through the strings in the list "titles", and I want to print each string that contains one of the words in "keywords_to_match":

import re
titles = ['walk to new zealand', 'fly to mars', 'drive to murica']
keywords_to_match = re.compile(r'(new zealand)?(mars)?(murica)')
for title in titles:
    # if any of the words in keywords_to_match appear in the title, print it
    if keywords_to_match.search(title.lower()):
        print(title)
        # only prints "drive to murica"

This only prints "drive to murica", but I expect it to print all 3 of the strings in "titles".

SyedAfaq

3 Answers


Change your regex to:

keywords_to_match = re.compile(r'\b(?:new zealand|mars|murica)\b')
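A minimal sketch of the question's loop with the corrected pattern dropped in. The `matched` list is a hypothetical addition so the result can be inspected; the `'fly to marseille'` string is the false-positive case raised in the comments below.

```python
import re

titles = ['walk to new zealand', 'fly to mars', 'drive to murica']

# (?:...) groups the alternatives without capturing; '|' means "any one of";
# \b on both sides restricts matches to whole words.
keywords_to_match = re.compile(r'\b(?:new zealand|mars|murica)\b')

matched = []  # hypothetical: collect hits so they can be checked
for title in titles:
    if keywords_to_match.search(title.lower()):
        print(title)
        matched.append(title)

# 'mars' inside 'marseille' is not a whole word, so \b rejects it.
print(keywords_to_match.search('fly to marseille'))  # None
```

All three titles are printed, and the `\b` anchors keep "marseille" from matching.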

I'm not sure you need a regex in your case. You can simply do:

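keywords = ['new zealand', 'mars', 'murica']
titles = ['walk to new zealand', 'fly to mars', 'drive to murica']
# plain substring test, no regex needed
print([t for t in titles if any(k in t.lower() for k in keywords)])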
Maroun
    This doesn't really match words, it matches strings. So it would give a false positive for "fly to marseille". A more robust solution would be `r'\b(?:new zealand|mars|murica)\b'`. – ekhumoro Aug 11 '20 at 12:09
  • @ekhumoro Good point, thanks. – Maroun Aug 11 '20 at 12:14
  • `'|'.join(re.escape(k) for k in keywords)` to build the regex fragment from a list – OrangeDog Aug 11 '20 at 12:16
  • @ekhumoro Could you elaborate? I don't get why we have to use "(?:)" around words that have spaces between them, or why the "\b" is needed. – SyedAfaq Aug 11 '20 at 12:24
    @SyedAfaq The `()` just groups the elements together, and the `?:` means that it doesn't capture anything. The `\b` matches a word boundary, which ensures that only *whole words* are matched. An equivalent pattern would be `r'\bnew zealand\b|\bmars\b|\bmurica\b'`. – ekhumoro Aug 11 '20 at 12:30
    @SyedAfaq All the explanations are in [the documentation](https://docs.python.org/3/library/re.html) – OrangeDog Aug 11 '20 at 12:36
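Putting the comments together: a sketch that builds ekhumoro's whole-word pattern from a list using OrangeDog's `re.escape` join, then compares it with the substring approach from the answer. The `keywords` list is an assumption (the answer uses that name without defining it), and `'fly to marseille'` is the false-positive example from the first comment.

```python
import re

keywords = ['new zealand', 'mars', 'murica']  # assumed keyword list
titles = ['walk to new zealand', 'fly to mars', 'drive to murica', 'fly to marseille']

# re.escape neutralises regex metacharacters in each keyword;
# \b(?:...)\b turns the alternation into a whole-word match.
pattern = re.compile(r'\b(?:' + '|'.join(re.escape(k) for k in keywords) + r')\b')

regex_hits = [t for t in titles if pattern.search(t.lower())]
substring_hits = [t for t in titles if any(k in t.lower() for k in keywords)]

print(regex_hits)      # ['walk to new zealand', 'fly to mars', 'drive to murica']
print(substring_hits)  # also includes 'fly to marseille'
```

The substring test accepts "fly to marseille" because `'mars' in 'fly to marseille'` is true; the word-boundary regex does not.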

This also works:

keywords_to_match = re.compile(r'(new zealand|mars|murica)')
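A short sketch of this pattern in the question's loop. Because the alternation sits inside one capturing group, `group(1)` reports which keyword matched; the `found` dict is a hypothetical addition for inspection. Without `\b`, the pattern still matches `mars` inside "marseille", as noted in the comments on the other answer.

```python
import re

titles = ['walk to new zealand', 'fly to mars', 'drive to murica']

# One capturing group around the alternation: group(1) is the keyword found.
keywords_to_match = re.compile(r'(new zealand|mars|murica)')

found = {}  # hypothetical: title -> matched keyword
for title in titles:
    m = keywords_to_match.search(title.lower())
    if m:
        found[title] = m.group(1)
        print(f'{title!r} matched {m.group(1)!r}')

# No word boundaries, so a keyword embedded in a longer word still matches.
false_positive = keywords_to_match.search('fly to marseille')
print(false_positive.group(1))  # mars
```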

Just for fun

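Use '|' in place of '?' to express an OR relationship. A '?' only makes the preceding group optional, so in (new zealand)?(mars)?(murica) the first two groups can be skipped but (murica) is still required. That is why only "drive to murica" matches.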

https://docs.python.org/3/library/re.html
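A small sketch contrasting the question's pattern with the '|' version on each title. The names `original`, `fixed` and `results` are illustrative only.

```python
import re

titles = ['walk to new zealand', 'fly to mars', 'drive to murica']

# Question's pattern: two optional groups followed by a REQUIRED (murica).
original = re.compile(r'(new zealand)?(mars)?(murica)')
# '|' means "new zealand OR mars OR murica".
fixed = re.compile(r'new zealand|mars|murica')

results = {}
for title in titles:
    results[title] = (bool(original.search(title)), bool(fixed.search(title)))
    print(f'{title!r}: original={results[title][0]}, fixed={results[title][1]}')
```

Only "drive to murica" satisfies the original pattern, while the '|' version matches all three titles.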

caldweln