I am trying to capture the words that follow specified stock symbols in a pandas DataFrame. I have several symbols in the format $IBM and am building a Python regex pattern that searches each tweet for the 3-5 words following a symbol, if one is found.
My DataFrame, called stock_news, looks like this:
     Word  Count
0    $IBM     10
1  $GOOGL      8
etc.
pattern = ''
for word in stock_news.Word:
    pattern += '{} (\w+\s*\S*){3,5}|'.format(re.escape(word))
My understanding is that {3,5} should act as a quantifier, in my case matching between 3 and 5 times. However, I receive the following KeyError:
KeyError: '3,5'
I have also tried using a raw string, r'{} (\w+\s*\S*){3,5}|', but to no avail. The pattern seems to work when I test it on regex101, but not in my PyCharm IDE. Any help would be appreciated.
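A likely explanation, as far as I can tell: the KeyError comes from str.format, not from the regex engine. format treats every {...} in the template as a replacement field, so {3,5} is looked up as a keyword argument named '3,5'. A raw string does not change this, since the r prefix only affects backslash handling, and regex101 never runs str.format at all. Below is a minimal sketch of one way around it, doubling the braces that should reach the regex literally. The symbols list is a hypothetical stand-in for stock_news.Word.

```python
import re

# Hypothetical stand-in for stock_news.Word
symbols = ["$IBM", "$GOOGL"]

# Doubled braces {{3,5}} come out of str.format as a literal {3,5},
# which the regex engine then reads as a quantifier.
template = r"{} (\w+\s*\S*){{3,5}}"

# Joining with "|" avoids a trailing "|", which would add an empty
# alternative that matches at every position.
pattern = "|".join(template.format(re.escape(s)) for s in symbols)

print(pattern)
```

Building the pattern with plain string concatenation instead of format would sidestep the brace issue as well.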
Code for finding:
pat = re.compile(pattern, re.I)
for i in tweet_df.Tweets:
    for x in pat.findall(i):
        print(x)
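One more thing that may matter for the search loop: when a pattern contains a capturing group, re.findall returns the group contents rather than the whole match, and a repeated group keeps only its last repetition. The sketch below uses a non-capturing group (?:...) so that findall returns the full matched text. Note that it swaps in a simpler word pattern, (?:\s+\w+){3,5}, which is my own assumption about what "3-5 words" should mean and not the original expression. The symbols and the sample tweet are hypothetical.

```python
import re

# Hypothetical stand-ins for stock_news.Word and tweet_df.Tweets
symbols = ["$IBM", "$GOOGL"]
tweets = ["Analysts say $IBM shares could rise sharply next quarter"]

# Non-capturing group: 3 to 5 repetitions of whitespace followed by a word.
# Plain concatenation is used, so no str.format brace escaping is needed.
body = r"(?:\s+\w+){3,5}"
pattern = "|".join(re.escape(s) + body for s in symbols)
pat = re.compile(pattern, re.I)

for tweet in tweets:
    # With no capturing groups, findall yields each whole match as a string.
    for match in pat.findall(tweet):
        print(match)
```

Because the quantifier is greedy, each match takes up to five words after the symbol when that many are available.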