I am trying to create a subset of my data based on whether certain terms appear in the text column of my DataFrame:
import pandas as pd

df = pd.DataFrame({'id': [123, 456, 789, 101, 402],
                   'text': [[{'the meeting was amazing'}, {'we should do it more often'}],
                            [{'start': '15', 'tag': 'Meeting'}],
                            [],
                            [{'Let this be the end of it'}],
                            [{'end': '164', 'tag': 'meetingno2'}]]})
I want a subset containing rows 1, 2, and 5, which are the rows where the term 'meeting' appears in some form (any capitalization, or as part of a longer word).
I have tried the following code:
df_sub = df[df['text'].isin(df['text'].str.findall(r'[Mm]eeting+'))]
However, the subset I get with this code contains only the rows where the text column is empty. When I instead run
df['text_2'] = df['text'].str.findall(r'[Mm]eeting+')
it produces a new column in the df with the value 'meeting' for rows 1, 2, and 5. So I think the pattern is picking up the text, but the subsetting step is not selecting the rows correctly. How can I get the desired output?
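
For reference, one possible approach is sketched below. It assumes each cell really holds a list of sets or dicts, as in the sample above, and that a case-insensitive substring match on 'meeting' is what counts as "some form" of the term. The names mask and df_sub are only illustrative. Each cell is converted to its string representation, and the resulting boolean mask is used to filter the rows:

```python
import pandas as pd

# Sample data copied from the question: each cell is a list of sets or dicts.
df = pd.DataFrame({
    'id': [123, 456, 789, 101, 402],
    'text': [
        [{'the meeting was amazing'}, {'we should do it more often'}],
        [{'start': '15', 'tag': 'Meeting'}],
        [],
        [{'Let this be the end of it'}],
        [{'end': '164', 'tag': 'meetingno2'}],
    ],
})

# Assumption: matching on the string form of the whole cell is acceptable.
# map(str) turns every list into text such as "[{'start': '15', 'tag': 'Meeting'}]".
as_text = df['text'].map(str)

# Boolean mask: True where 'meeting' occurs in any capitalization.
mask = as_text.str.contains('meeting', case=False)

# Keep only the matching rows.
df_sub = df[mask]
print(df_sub['id'].tolist())
```

Because the match runs on the string form of the whole cell, it would also hit dictionary keys that contain 'meeting', not just values. If that matters, the cells would need to be unpacked before matching.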