Background: I have the following dataframe

import pandas as pd
d = {'text': ["paid", "paid and volunteer", "other phrase"]}
df = pd.DataFrame(data=d)
df['text'] = df['text'].astype(str)  # assign back; apply(str) alone discards its result
print(df)

Output:

                   text
0                  paid
1    paid and volunteer
2          other phrase

Goal:

1) Check each row and return a boolean: True if "paid" appears anywhere in the text column, False if it does not. However, I would like to exclude the word "volunteer": if "volunteer" is present, the result should be False.

2) Create a new column with the results.

Desired Output:

                   text     result
0                  paid     True
1    paid and volunteer     False
2          other phrase     False

Problem: I am using the following code

df['result'] = df['text'].astype(str).str.contains('paid') #but not volunteer

I checked "How to negate specific word in regex?", which shows how to exclude a word, but I am not sure how to incorporate that into my code.

Question: How do I alter my code to achieve 1) and 2) of my goal?

2 Answers

0

Using lambda:

df['result'] = df['text'].apply(lambda text: ('paid' in text) and ('volunteer' not in text))
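
For reference, here is a minimal self-contained sketch of the same row-wise idea on the sample data from the question. The helper name is_paid_only is hypothetical (it does not appear in the original answer), and the str() cast is an assumption carried over from the question's astype(str) so that non-string values do not raise an error.

```python
import pandas as pd

df = pd.DataFrame({'text': ["paid", "paid and volunteer", "other phrase"]})

def is_paid_only(text):
    # Hypothetical helper: True only when 'paid' occurs and 'volunteer' does not.
    text = str(text)  # assumption: coerce non-strings, as astype(str) does in the question
    return 'paid' in text and 'volunteer' not in text

# Apply the check to every row and store it in a new column.
df['result'] = df['text'].apply(is_paid_only)
print(df)
```

Note that this is a plain substring test, so a value such as "unpaid" would also count as containing "paid".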
niraj
0

You can use an element-wise logical AND (&) to check both conditions, negating the second one with ~.

(df.text.str.contains('paid')) & (~df.text.str.contains('volunteer'))
Out[14]: 
0     True
1    False
2    False
Name: text, dtype: bool
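
To cover goal 2), the mask can be assigned to a new column. The sketch below does that on the sample data and also shows a single-regex variant using a negative lookahead, along the lines of the regex question linked in the post. The column name result_regex is an assumption made for illustration, and the pattern assumes each value is a single line of text (the dot does not match newlines by default).

```python
import pandas as pd

df = pd.DataFrame({'text': ["paid", "paid and volunteer", "other phrase"]})
text = df['text'].astype(str)  # same cast as in the question

# Vectorised mask: contains 'paid' AND does not contain 'volunteer'.
df['result'] = text.str.contains('paid') & ~text.str.contains('volunteer')

# Alternative sketch: one regex. The lookahead (?!.*volunteer) rejects any
# string containing 'volunteer'; the rest requires 'paid' somewhere in it.
df['result_regex'] = text.str.contains(r'^(?!.*volunteer).*paid')

print(df)
```

Both columns should agree on this sample; the two-mask version is arguably easier to read, while the regex keeps everything in a single str.contains call.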
Allen Qin