I'm using Series.str.contains to find the "@company" substring in a particular column and drop all rows that don't match.

I've done it manually to compare and I should be getting over 100 results, but I get only 55 running the script.

DF = DF[DF['Email '].astype(str).str.contains('@company', regex=False)]

I'm using regex=False so that .str.contains treats the pattern as a literal string rather than a regular expression.
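
For reference, a minimal sketch of the literal-match filter on a small made-up frame. The addresses and the `Name` column are hypothetical; only the `'Email '` column name (with its trailing space) comes from the question. Note that `@` is not a special character in Python regular expressions, so for this particular pattern the default `regex=True` should select the same rows; `regex=False` matters more when the pattern contains characters such as `.` or `+`.

```python
import pandas as pd

# Hypothetical sample data; only the 'Email ' column name comes from the question.
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cid", "Dee"],
    "Email ": ["ann@company.com", "bob@other.org", "cid@example.net", "dee@company.co.uk"],
})

# Literal substring match: '@company' is not interpreted as a regular expression.
mask_literal = df["Email "].astype(str).str.contains("@company", regex=False)

# Same pattern with the default regex=True, for comparison.
mask_regex = df["Email "].astype(str).str.contains("@company")

filtered = df[mask_literal]
print(len(filtered))                     # rows whose email contains '@company'
print(mask_literal.equals(mask_regex))   # whether both masks agree on this data
```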

UPDATE:

I solved the issue by moving this part of the program above others, as other "filters" and "drops" were removing rows before the email could be read. Thanks for your input!
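
A rough sketch of how an earlier step can lower the match count. The other filters are not shown in the question, so the `Status` column and the `dropna` call below are assumptions made purely for illustration. Counting matches on the untouched frame and again after each step is one way to see where rows disappear.

```python
import pandas as pd

# Hypothetical data; the 'Status' column and the earlier filter are assumptions.
df = pd.DataFrame({
    "Email ": ["a@company.com", "b@company.com", "c@other.org", "d@company.com"],
    "Status": ["active", None, "active", "inactive"],
})

def count_company(frame):
    """Count rows whose 'Email ' value contains the literal '@company'."""
    mask = frame["Email "].astype(str).str.contains("@company", regex=False)
    return int(mask.sum())

# Counting on the untouched frame gives the figure a manual check would give.
count_before = count_company(df)

# An earlier, unrelated step removes rows, including one with a matching email.
reduced = df.dropna(subset=["Status"])
count_after = count_company(reduced)

print(count_before, count_after)
```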

VRumay
  • This `"@company"` is not the same as this `"@COMPANY"` – Dani Mesejo Oct 09 '19 at 20:47
  • If you need to capture both uppercase and lowercase versions of `@company`, you can make the whole column lowercase and search only for the lowercase version: `DF = DF[DF['Email '].astype(str).str.lower().str.contains('@company', regex=False)]` – linamnt Oct 09 '19 at 20:49
  • Sorry, just a typo while writing the question. I'll edit it. – VRumay Oct 09 '19 at 20:50
  • You can pass `case=False` as well to ignore case, or use `.str.lower()` as linamnt has done. Let us know if you need any more help. – Umar.H Oct 09 '19 at 20:52
  • I think there must be something strange about the dataframe, or specifically those lines not getting detected, that we cannot know without seeing the data – linamnt Oct 09 '19 at 20:53
  • I passed `case=False` in hopes it was something like that, still not working. How could I share the dataset with you? – VRumay Oct 09 '19 at 20:56
  • If @DanielMesejo is right, this is a duplicate of https://stackoverflow.com/questions/319426/how-do-i-do-a-case-insensitive-string-comparison – LoneWanderer Oct 09 '19 at 20:56
  • @LoneWanderer Yeah, it's all email addresses. Unfortunately the data is sensitive for the company I work for and I can't share it without "cleaning it", or it might cost me my job. I think that changing anything to later share it with you will actually serve no purpose if the errors are in the data itself as linamnt suggested – VRumay Oct 09 '19 at 21:40
  • Forgive me, but then your whole question is pointless, isn't it? (BTW, I did not notice the 'Email' column at first... what a genius! Too focused on the '@company'.) – LoneWanderer Oct 09 '19 at 21:41
  • @LoneWanderer haha, no worries. I solved the issue by moving this part of the program above others, as other "filters" and "drops" were removing rows before the email could be read. Thanks for your input! – VRumay Oct 09 '19 at 23:37
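
The two case-insensitive options suggested in the comments can be sketched as follows. The addresses are made up for illustration; on this sample both approaches should flag the same rows.

```python
import pandas as pd

# Hypothetical addresses with mixed case.
emails = pd.Series(["ann@company.com", "BOB@COMPANY.COM", "cid@other.org"])

# Option 1: let str.contains ignore case.
mask_case = emails.str.contains("@company", case=False, regex=False)

# Option 2: lowercase the values first, then search for the lowercase substring.
# Note the second .str accessor: .str.lower() returns a Series, not a string.
mask_lower = emails.str.lower().str.contains("@company", regex=False)

print(mask_case.tolist())
print(mask_lower.tolist())
```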

0 Answers