
I'm currently looping through some files (that part works fine) and trying to figure out how to grab the index inside the square brackets on each line and check whether it matches a word in a provided list.

For example:

I have the following in one of the files:

MYLIST['APPLE'] = 'Granny-Smith'
SOMETHINGELSE['BUILDING'] = 'Tall'
ANOTHERTHING['SPELLING'] = 'bad'
ADDITIONALLY['BERRY'] = 'Rasp'

I have a list of things I am trying to match on:

keywords = ['apple', 'berry', 'grape']

If I use this regex it will find indices okay (but finds them all):

\[(.*?)\]

But I'm trying to expand that regex so it will only find the ones that exist in the list of keywords.

What else do I need to add to the regex in order to accomplish this?

Hanny

2 Answers


If you have only a few words, a regexp alone is enough, but if you have a large number of words, it is more reasonable to combine a regexp with a plain membership test:

import re

data = [
    "MYLIST['APPLE'] = 'Granny-Smith'",
    "SOMETHINGELSE['BUILDING'] = 'Tall'",
    "ANOTHERTHING['SPELLING'] = 'bad'",
    "ADDITIONALLY['BERRY'] = 'Rasp'"
]

REGEX = re.compile(r"\['(?P<word>.*?)'\]")
words = ['apple', 'berry', 'grape']

for line in data:
    found = REGEX.search(line)
    if found:
        word = found.group('word').lower()
        if word in words:
            print('FOUND: ', word)

will print:

FOUND:  apple
FOUND:  berry

This technique is also better because the regexp is much simpler and more readable, so the code is easier to debug and modify.
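As a variation on the same idea, here is a minimal sketch that keeps the keywords in a set and uses `finditer`, so a line with more than one bracketed key is handled too. The `find_keywords` helper and the `KEY_RE` name are hypothetical, introduced only for illustration, and the pattern assumes the keys are always wrapped in single quotes as in the sample lines.

```python
import re

# Hypothetical names for illustration; assumes keys look like ['APPLE'].
KEY_RE = re.compile(r"\['(.*?)'\]")


def find_keywords(lines, keywords):
    """Return the lowercased keys that appear in `keywords`, in the order found."""
    wanted = {k.lower() for k in keywords}  # set membership is O(1) on average
    hits = []
    for line in lines:
        # finditer yields every bracketed key on the line, not just the first
        for match in KEY_RE.finditer(line):
            key = match.group(1).lower()
            if key in wanted:
                hits.append(key)
    return hits


data = [
    "MYLIST['APPLE'] = 'Granny-Smith'",
    "SOMETHINGELSE['BUILDING'] = 'Tall'",
    "ANOTHERTHING['SPELLING'] = 'bad'",
    "ADDITIONALLY['BERRY'] = 'Rasp'",
]

print(find_keywords(data, ['apple', 'berry', 'grape']))  # ['apple', 'berry']
```

With a long keyword list the set lookup stays cheap, whereas a `list` would be scanned from the start for every key.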

vurmux

If you want to use only a regex, you could build the alternation from the list:

keywords = ['apple', 'berry', 'grape']
# Raw string; the quotes are part of the pattern because the keys look like ['APPLE']
regex = r"\['({})'\]".format("|".join(keywords))

I'll leave the upper/lower cases to you.

I got the idea from the question "how to do re.compile() with a list in python", so upvote that one.
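For the upper/lower case part, one option is `re.IGNORECASE`. The sketch below is only an illustration under two assumptions: the keys are always wrapped in single quotes as in the question's sample lines, and the keywords may contain regex metacharacters (hence `re.escape`). The names `alternation`, `pattern` and `lines` are made up for the example.

```python
import re

keywords = ['apple', 'berry', 'grape']

# re.escape guards against keywords that contain regex metacharacters.
alternation = "|".join(re.escape(k) for k in keywords)

# Assumes the keys are single-quoted inside the brackets, e.g. ['APPLE'].
pattern = re.compile(r"\['({})'\]".format(alternation), re.IGNORECASE)

lines = [
    "MYLIST['APPLE'] = 'Granny-Smith'",
    "SOMETHINGELSE['BUILDING'] = 'Tall'",
    "ANOTHERTHING['SPELLING'] = 'bad'",
    "ADDITIONALLY['BERRY'] = 'Rasp'",
]

# Collect every matching key, keeping the case used in the file.
matches = [m.group(1) for line in lines for m in pattern.finditer(line)]
print(matches)  # ['APPLE', 'BERRY']
```

Because the quotes and brackets anchor both ends of the key, a key such as `['pineapple']` does not match the `apple` alternative.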

tglaria