Regex for finding matches in a list Python

Question

I'm currently looping through some files (working great) and trying to figure out how to grab the indices of something and see if the word it matches can be found in a provided list.

For example:

I have the following in one of the files:

MYLIST['APPLE'] = 'Granny-Smith'
SOMETHINGELSE['BUILDING'] = 'Tall'
ANOTHERTHING['SPELLING'] = 'bad'
ADDITIONALLY['BERRY'] = 'Rasp'

I have a list of things I am trying to match on:

keywords = ['apple', 'berry', 'grape']

If I use this regex it will find indices okay (but finds them all):

\[(.*?)\]

But I'm trying to expand that regex so it will only find the ones that exist in the list of keywords.

What else do I need to add to the regex in order to accomplish this?

score 1 · Accepted Answer · answered May 20 '19 at 15:55

If you only have a few words, you can do it with a regex alone, but if you have a large number of words, it is more reasonable to combine a simple regex with an ordinary membership check:

import re

data = [
    "MYLIST['APPLE'] = 'Granny-Smith'",
    "SOMETHINGELSE['BUILDING'] = 'Tall'",
    "ANOTHERTHING['SPELLING'] = 'bad'",
    "ADDITIONALLY['BERRY'] = 'Rasp'"
]

REGEX = re.compile(r"\['(?P<word>.*?)'\]")
words = ['apple', 'berry', 'grape']

for line in data:
    found = REGEX.search(line)
    if found:
        word = found.group('word').lower()
        if word in words:
            print('FOUND: ', word)

will print:

FOUND:  apple
FOUND:  berry

This technique is also better because the regex is much simpler and more readable, so the code is easier to debug and modify.
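For a large keyword list, the same idea can be sketched with a set for the lookup and finditer so that several indices on one line are all checked. This is only a sketch: the helper name find_keywords and the returned (line number, word) pairs are my own choices for illustration, and it assumes the indices are always single-quoted as in the sample data.

```python
import re

# Same pattern as above: captures the text between [' and ']
INDEX_RE = re.compile(r"\['(?P<word>.*?)'\]")


def find_keywords(lines, keywords):
    """Return (line_number, word) pairs for bracketed indices that are keywords.

    Hypothetical helper for illustration; line numbers start at 1.
    """
    # A set gives constant-time membership tests on average
    wanted = {k.lower() for k in keywords}
    hits = []
    for lineno, line in enumerate(lines, start=1):
        # finditer visits every index on the line, not just the first one
        for match in INDEX_RE.finditer(line):
            word = match.group('word').lower()
            if word in wanted:
                hits.append((lineno, word))
    return hits


data = [
    "MYLIST['APPLE'] = 'Granny-Smith'",
    "SOMETHINGELSE['BUILDING'] = 'Tall'",
    "ANOTHERTHING['SPELLING'] = 'bad'",
    "ADDITIONALLY['BERRY'] = 'Rasp'",
]

print(find_keywords(data, ['apple', 'berry', 'grape']))
# prints [(1, 'apple'), (4, 'berry')]
```

When looping through files, the lines of each open file can be passed in place of data.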

score 1 · Answer 2 · answered May 20 '19 at 16:05

If you want to use only regex, you could build an alternation from the keywords:

keywords = ['apple', 'berry', 'grape']
regex = r"\['({})'\]".format("|".join(keywords))

I'll leave the upper/lower cases to you.

I got the idea from the question "how to do re.compile() with a list in python", so upvote that one.
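One way to finish the regex-only version, including the upper/lower case handling, might look like the sketch below. It assumes the indices are single-quoted as in the question's sample, uses re.IGNORECASE for the case issue, and adds re.escape in case a keyword ever contains a regex metacharacter. The names pattern, lines and found are just for illustration.

```python
import re

keywords = ['apple', 'berry', 'grape']

# Escape each keyword, then join them into one alternation: apple|berry|grape
alternation = "|".join(re.escape(k) for k in keywords)

# Match ['<keyword>'] regardless of case
pattern = re.compile(r"\['({})'\]".format(alternation), re.IGNORECASE)

lines = [
    "MYLIST['APPLE'] = 'Granny-Smith'",
    "SOMETHINGELSE['BUILDING'] = 'Tall'",
    "ANOTHERTHING['SPELLING'] = 'bad'",
    "ADDITIONALLY['BERRY'] = 'Rasp'",
]

found = []
for line in lines:
    match = pattern.search(line)
    if match:
        # Normalise to lower case so the result compares equal to the keyword
        found.append(match.group(1).lower())

print(found)
# prints ['apple', 'berry']
```

The pattern has to be rebuilt whenever the keyword list changes, which is one reason the set-based lookup scales better for long lists.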
