I am trying to select segments (clauses) of sentences based on the word pairs with which the segments should start. For example, I am interested in sentence segments that start with "what does" or "what is", etc.
To do this, I am looping over two DataFrames, using an if statement inside a for loop, as shown below. The first DataFrame, df1['Sentence'], contains the sentences. The other, df2['First2'], contains the pairs of starting words. However, the function seems to check only the first word pair in the for loop: after the first item, it does not return to the for loop. My code seems to work when I pass lists to it, but not when I pass DataFrames. I have tried the solutions mentioned in "Pythonic way to combine FOR loop and IF statement", but they do not work for my DataFrames. I would love to know how to solve this.
DataFrames:
   df1['Sentence']                                   df2['First2']
0  If this is a string what does it say?          0  what does
1  And this is a string, should it say more?      1  should it
2  This is yet another string.                    2
My code looks as follows:
import pandas as pd

a = df1['Sentence']
b = df2['First2']

# The function seems to loop over all r's but not over all b's:
def func(r):
    for i in b:
        if i in r:
            # The following line selects the sentence segment that starts with
            # the words in `First2`, up to the end of the sentence.
            q = r[r.index(i):]
            return q
        else:
            return ''

df1['Clauses'] = a.apply(func)
This is the result:
what does it say?
This is correct but incomplete. The code seems to loop over all r's but not over all b's. How can I get the desired result, as below?
what does it say?
should it say more?
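
One likely cause, judging from the code above: the else branch returns '' as soon as the first word pair fails to match, so the for loop never reaches the second pair. A minimal sketch of one possible fix is to return '' only after the loop has tried every pair. The sample data is rebuilt from the tables in the question; the name first_clause, the pairs list, and the assumption that the third First2 entry is an empty string are mine, not part of the original code.

```python
import pandas as pd

# Sample data reconstructed from the question. The empty third entry in
# 'First2' is assumed to be an empty string (it could also be NaN).
df1 = pd.DataFrame({'Sentence': [
    'If this is a string what does it say?',
    'And this is a string, should it say more?',
    'This is yet another string.',
]})
df2 = pd.DataFrame({'First2': ['what does', 'should it', '']})

# Keep only usable word pairs: drop NaN values and empty strings.
pairs = [p for p in df2['First2'].dropna() if p]

def first_clause(sentence):
    # Try every word pair before giving up on the sentence.
    for pair in pairs:
        if pair in sentence:
            # Segment from the start of the pair to the end of the sentence.
            return sentence[sentence.index(pair):]
    # Reached only when none of the pairs matched.
    return ''

df1['Clauses'] = df1['Sentence'].apply(first_clause)
print(df1['Clauses'].tolist())
```

Note that this returns the segment for the first pair in list order that occurs in the sentence, which is not necessarily the pair that appears earliest in the sentence. Filtering out empty strings matters because '' is contained in every string and would otherwise match the whole sentence.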