I am trying to get Python to match a text pattern in a pandas DataFrame.
Here is what I am doing:
```python
import pandas as pd

keywords = ['sarcasm', 'irony', 'humor']  # renamed from `list`, which shadows the built-in
pattern = '|'.join(keywords)
# \b word boundaries instead of spaces padded inside the group
pattern2 = r'\b(' + pattern + r')\b'
frame = pd.DataFrame(docs_list, columns=['words'])
# docs_list is the list containing the snippets
# Skipping the in-between steps for simplicity of viewing
# expand=False returns a Series (first match per row), so .to_frame() works
cp2 = frame.words.str.extract(pattern2, expand=False)
c2 = cp2.to_frame().fillna("No Matching Word Found")
```
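Padding the group with literal spaces, e.g. `"( " + pattern + " )"`, makes the spaces part of the first and last alternatives, so `humor` at the very end of a snippet would need a trailing space that is not there. A quick check with the standard `re` module on one of the example snippets illustrates this; the variable names are only illustrative.

```python
import re

words = ['sarcasm', 'irony', 'humor']
alternation = '|'.join(words)

# Padded version: the spaces end up inside " sarcasm" and "humor "
padded = "( " + alternation + " )"
# Alternative: \b word boundaries instead of literal spaces
bounded = r"\b(" + alternation + r")\b"

snippet = "A different type of humor"

padded_match = re.search(padded, snippet)
bounded_match = re.search(bounded, snippet)

print(padded)                  # ( sarcasm|irony|humor )
print(padded_match)            # None: "humor " needs a trailing space
print(bounded_match.group(1))  # humor
```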
This gives output like this:
| Snips | pattern_found | matching_Word |
|---|---|---|
| A different type of humor | True | humor |
| A different type of sarcasm | True | sarcasm |
| A different type of humor and irony | True | humor |
| A different type of reason | False | NA |
| A type of humor and sarcasm | True | humor |
| A type of comedy | False | NA |
So Python checks each snippet against the pattern and returns the matching word.
Here is my problem. As I understand it, as long as Python does not find a word from the pattern in the snippet, it keeps checking against the entire pattern. As soon as it finds one of the words, it returns that word and skips the rest of the snippet.
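A small check on the example snippets suggests that the first-match-only behaviour comes from `Series.str.extract`, which returns only the first match of the group in each row, while `Series.str.findall` returns every non-overlapping match. This is a sketch using the snippets from the table above.

```python
import pandas as pd

# Snippets copied from the example table
snips = pd.Series([
    "A different type of humor",
    "A different type of sarcasm",
    "A different type of humor and irony",
    "A different type of reason",
    "A type of humor and sarcasm",
    "A type of comedy",
])
pattern = r"\b(sarcasm|irony|humor)\b"

# str.extract keeps only the first match of the group in each row (NaN if none)
first_only = snips.str.extract(pattern, expand=False)
# str.findall returns a list of every non-overlapping match in each row
every_match = snips.str.findall(pattern)

print(first_only.tolist())
print(every_match.tolist())
```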
How do I make Python find every matching word rather than just the first one, so that the output looks like this?
| Snips | pattern_found | matching_Word |
|---|---|---|
| A different type of humor | True | humor |
| A different type of sarcasm | True | sarcasm |
| A different type of humor and irony | True | humor |
| A different type of humor and irony | True | irony |
| A different type of reason | False | NA |
| A type of humor and sarcasm | True | humor |
| A type of humor and sarcasm | True | sarcasm |
| A type of comedy | False | NA |
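One way to get one row per match, sketched here on the example snippets, is to collect all matches with `Series.str.findall` and then split each list into rows with `DataFrame.explode` (available in pandas 0.25 and later). An empty list becomes a single NaN row, which keeps the non-matching snippets. The column names follow the tables above; this is a sketch, not a benchmarked solution.

```python
import pandas as pd

# Snippets copied from the example table
frame = pd.DataFrame({"Snips": [
    "A different type of humor",
    "A different type of sarcasm",
    "A different type of humor and irony",
    "A different type of reason",
    "A type of humor and sarcasm",
    "A type of comedy",
]})
pattern = r"\b(sarcasm|irony|humor)\b"

# One list of matches per snippet, then one row per list element.
# explode() turns an empty list into a single NaN row, so non-matching snippets are kept.
result = (
    frame.assign(matching_Word=frame["Snips"].str.findall(pattern))
         .explode("matching_Word")
         .reset_index(drop=True)
)
# pattern_found is True wherever a word was matched
result.insert(1, "pattern_found", result["matching_Word"].notna())
print(result)
```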
The obvious solution would be to put the pattern words in a list and use a for loop to check every word in every snippet. But time is a constraint, especially because the data set I am dealing with is huge and the snippets are fairly long.
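For comparison, here is a plain-Python sketch of two per-snippet approaches on a few of the example snippets: a token loop checked against a set, and a single regex compiled once and reused for every snippet. Which is faster on the real data would need measuring; no timings are claimed here. The names `keyword_set` and `regex` are illustrative.

```python
import re

words = ['sarcasm', 'irony', 'humor']
snips = [
    "A different type of humor and irony",
    "A different type of reason",
    "A type of humor and sarcasm",
]

# Loop version: split each snippet into tokens and test each one against a set.
# Assumes single-word keywords and no punctuation glued to tokens ("humor," would be missed).
keyword_set = set(words)
loop_matches = [[tok for tok in s.split() if tok in keyword_set] for s in snips]

# Regex version: compile the alternation once, then scan each snippet a single time.
regex = re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + r")\b")
regex_matches = [regex.findall(s) for s in snips]

print(loop_matches)   # [['humor', 'irony'], [], ['humor', 'sarcasm']]
print(regex_matches)  # [['humor', 'irony'], [], ['humor', 'sarcasm']]
```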