confusing results of pandas boolean operator

Question

I am trying to analyze WhatsApp messages. I counted the messages that contain the word "salam" (a), the messages that contain both "salam" and "terima" (b), and the messages that contain "salam" but not "terima" (c).

This is the code I used:

len(df[(df['message'].str.contains("salam"))])
len(df[(df['message'].str.contains("salam")) & (df['message'].str.contains("terima"))])
len(df[(df['message'].str.contains("salam")) != (df['message'].str.contains("terima"))])

In the image, a = 197, b = 143, and c = 72. Shouldn't a = b + c? Or is != perhaps not the NOT operator I should have used? Does anyone have any idea what I did wrong? Thank you so much for your help.

Use `~` like `len(df[~((df['message'].str.contains("salam")) & (df['message'].str.contains("terima")))])` — jezrael, Jun 30 '20 at 07:36
Think about what effect `!=` has here. In this context it acts as an exclusive or, so the total is the number of messages that contain either 'salam' or 'terima' but not both. That means a message that contains 'terima' but not 'salam' is included in the count, even though such messages are not counted in a or b. Indeed, from your numbers there appear to be 18 such messages (72 - (197 - 143) = 18). — Paula Thomas, Jun 30 '20 at 07:41
Oh, I see. Thank you so much! I used `~` in the wrong way, and I mistook `!=` for `NOT`, instead of `XOR`. — Lulu Firdaus, Jun 30 '20 at 07:45
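
To make the difference concrete, here is a minimal sketch on a made-up five-message DataFrame. The sample messages and the names `has_salam`, `has_terima`, and `xor_count` are assumptions for illustration, not taken from the original data. It compares `!=` (which behaves as XOR on boolean Series) with `&` combined with `~` on the "terima" mask only, which matches the description of (c):

```python
import pandas as pd

# Hypothetical sample data standing in for the WhatsApp export
df = pd.DataFrame({"message": [
    "salam semua",            # salam only
    "salam, terima kasih",    # both
    "terima kasih banyak",    # terima only
    "selamat pagi",           # neither
    "salam kenal",            # salam only
]})

# Boolean masks, one entry per message
has_salam = df["message"].str.contains("salam")
has_terima = df["message"].str.contains("terima")

a = has_salam.sum()                          # contains "salam"
b = (has_salam & has_terima).sum()           # contains both
xor_count = (has_salam != has_terima).sum()  # exactly one of the two (XOR)
c = (has_salam & ~has_terima).sum()          # "salam" but not "terima"

print(a, b, xor_count, c)
```

With these masks, `a == b + c` holds, while `xor_count` is larger than `c` by the number of messages that contain only "terima". If the real column has missing values, `str.contains` returns NaN for them unless you pass `na=False`, so the masks may need that argument before `~` can be applied.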
