I am trying to analyze a WhatsApp chat export. I counted (a) the messages that contain the word "salam", (b) the messages that contain both "salam" and "terima", and (c) the messages that contain "salam" but not "terima".

This is the code I used:

len(df[(df['message'].str.contains("salam"))])
len(df[(df['message'].str.contains("salam")) & (df['message'].str.contains("terima"))])
len(df[(df['message'].str.contains("salam")) != (df['message'].str.contains("terima"))])

These are the counts returned by the three expressions above.

The results are a = 197, b = 143, and c = 72. Shouldn't a = b + c? Or is `!=` not the NOT operator I should have used? Does anyone know what I did wrong? Thank you so much for your help.

Lulu Firdaus
  • Use `~` like `len(df[~((df['message'].str.contains("salam")) & (df['message'].str.contains("terima")))])` – jezrael Jun 30 '20 at 07:36
  • Think about what effect `!=` gives you. In this context it acts as an exclusive or, so the total is the number of messages that contain either 'salam' or 'terima' but not both. A message that contains 'terima' but not 'salam' is therefore included in that count, although it is counted in neither a nor b. Indeed, it seems to me that there are 18 such messages (72 - (197 - 143) = 18). – Paula Thomas Jun 30 '20 at 07:41
  • Oh, I see. Thank you so much! I used `~` the wrong way, and I mistook `!=` for `NOT` when it actually behaves as `XOR`. – Lulu Firdaus Jun 30 '20 at 07:45
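
To make the difference between `!=` and `~` concrete, here is a minimal sketch. The sample messages are made up for illustration (the real chat export is not shown in the question), and the variable names `has_salam` and `has_terima` are only assumptions of this example. It counts "salam but not terima" with `&` and `~`, and shows that `!=` on two boolean masks counts the messages containing exactly one of the two words.

```python
import pandas as pd

# Hypothetical sample data standing in for the real chat export.
df = pd.DataFrame({"message": [
    "salam semua",           # "salam" only
    "salam, terima kasih",   # both words
    "terima kasih banyak",   # "terima" only
    "sampai jumpa",          # neither word
    "salam dari saya",       # "salam" only
]})

# Boolean masks; na=False treats missing messages as "does not contain".
has_salam = df["message"].str.contains("salam", regex=False, na=False)
has_terima = df["message"].str.contains("terima", regex=False, na=False)

a = int(has_salam.sum())                    # contains "salam"
b = int((has_salam & has_terima).sum())     # contains both words
c = int((has_salam & ~has_terima).sum())    # "salam" but not "terima"
d = int((~has_salam & has_terima).sum())    # "terima" but not "salam"
xor = int((has_salam != has_terima).sum())  # exactly one of the two words

print(a, b, c, d, xor)
```

With the masks written this way, a = b + c holds, and the `!=` count equals c + d. Applied to the numbers in the question, that means 197 - 143 = 54 messages contain "salam" without "terima", and the remaining 72 - 54 = 18 messages in the `!=` count contain "terima" without "salam".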

0 Answers