I have a large number of sentences, from which I want to extract sub-sentences that start with certain word combinations. For example, I want to extract sentence segments that begin with "what does" or "what is', etc. (essentially eliminating the words from the sentence that appear before the word-pairs). Both the sentences and the word-pairs are stored in a DataFrame
:
'Sentence' 'First2'
0 If this is a string what does it say? 0 can I
1 And this is a string, should it say more? 1 should it
2 This is yet another string. 2 what does
3 etc. etc. 3 etc. etc
The result I want from the above example would be:
0 what does it say?
1 should it say more?
2
The most obvious solution (at least to me) below does not work. It only uses the first word-pair b
to go over all the sentences r
, but not the other b
's.
a = df['Sentence']
b = df['First2']
#The function seems to loop over all r's but only over the first b:
def func(z):
for x in b:
if x in r:
s = z[z.index(x):]
return s
else:
return ‘’
df['Segments'] = a.apply(func)
It seems that looping over two DataFrames simultaneously in this way does not work. Is there a more efficient and effective way to do this?