From what I understand, this should just skip the if statement when there is no email column present, but it throws KeyError: None when it gets to the if statement. Could someone help me understand this behavior? Thanks!

def not_isin(df1, df2, phone_col, df2phone_col,website_col, df2website_col,company_col, df2company_col, email_col, df2email_col):
    print("entered not_isin")
    df1 = df1[~df1[phone_col].isin(df2[df2phone_col])]
    df1 = df1[~df1[website_col].isin(df2[df2website_col])]
    df1 = df1[~df1[company_col].isin(df2[df2company_col])]
    if df1[email_col] and df2[df2email_col]:
        df1 = df1[~df1[email_col].isin(df2[df2email_col])]
        return df1
    return df1
Justin Benfit
  • What's the value of `email_col` and `df2email_col`? – Barmar Feb 11 '22 at 16:22
  • And once you fix this, I think you'll run into the common error "The truth value of a Series is ambiguous". Don't try to use a Series or dataframe as a boolean. – Barmar Feb 11 '22 at 16:23
  • @Barmar `email_col` is 'email' and `df2email_col` is 'Email' in this case. I look the email columns up like this because they are named differently in each new dataset: `def get_email(df): email = [col if col.lower().startswith('email') else None for col in df][0]; return email`. Then I assign it with `email_col = get_email(df1)`. – Justin Benfit Feb 11 '22 at 16:26
  • I'm just trying to find if the column name is present in the dataframe headers. – Justin Benfit Feb 11 '22 at 16:27
  • But if the column name isn't present, you get a `KeyError`. You have to check if the column exists before trying to use it as an index. – Barmar Feb 11 '22 at 16:28
  • It should be `if email_col in df1 and df2email_col in df2:` – Barmar Feb 11 '22 at 16:29
  • Oh I see the error in my understanding of it. Thank you! – Justin Benfit Feb 11 '22 at 16:30
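
Putting the comments above together, a minimal sketch of the fix might look like the following. It keeps the original `not_isin` signature and only changes the email check to a membership test, as suggested. The rewritten `get_email` is an assumption on my part: the version quoted in the comments only ever inspects the first column (the `[0]`), so it returns None whenever the email column is not first; `next(...)` with a default scans all columns instead. The sample frames and their values are made up for illustration.

```python
import pandas as pd


def get_email(df):
    """Return the first column whose name starts with 'email', else None."""
    return next(
        (col for col in df.columns if str(col).lower().startswith("email")),
        None,
    )


def not_isin(df1, df2, phone_col, df2phone_col, website_col, df2website_col,
             company_col, df2company_col, email_col, df2email_col):
    """Drop rows of df1 whose phone, website, company or email appear in df2."""
    df1 = df1[~df1[phone_col].isin(df2[df2phone_col])]
    df1 = df1[~df1[website_col].isin(df2[df2website_col])]
    df1 = df1[~df1[company_col].isin(df2[df2company_col])]
    # Test for the column *name* instead of indexing with it; None is never
    # a column name here, so a missing email column simply skips this step.
    if email_col in df1.columns and df2email_col in df2.columns:
        df1 = df1[~df1[email_col].isin(df2[df2email_col])]
    return df1


# Hypothetical sample data, only to exercise both branches.
df1 = pd.DataFrame({
    "phone": ["1", "2", "3"],
    "website": ["a.com", "b.com", "c.com"],
    "company": ["A", "B", "C"],
    "email": ["a@a.com", "b@b.com", "c@c.com"],
})
df2 = pd.DataFrame({
    "Phone": ["1"],
    "Website": ["z.com"],
    "Company": ["Z"],
    "Email": ["c@c.com"],
})

with_email = not_isin(df1, df2, "phone", "Phone", "website", "Website",
                      "company", "Company", get_email(df1), get_email(df2))

df1_no_email = df1.drop(columns="email")
without_email = not_isin(df1_no_email, df2, "phone", "Phone", "website",
                         "Website", "company", "Company",
                         get_email(df1_no_email), get_email(df2))

print(with_email)
print(without_email)
```

With the email column present, both the phone match and the email match are filtered out; without it, `get_email` returns None and the email step is skipped instead of raising.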

1 Answer

I think it fails at df1[email_col].

If email_col is None, evaluating df1[email_col] raises KeyError: None while Python is still computing the condition, so the if statement never gets the chance to skip anything.
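
To see the failure modes separately, here is a small sketch. The frame `df` and the helper `describe_truth_test` are made up for illustration and are not part of the question's code. It reports which exception, if any, comes out of using `frame[col]` as an if-condition, and contrasts that with a plain membership test on the column name.

```python
import pandas as pd

# Hypothetical frame with an email column.
df = pd.DataFrame({"phone": ["111", "222"], "email": ["a@x.com", "b@x.com"]})


def describe_truth_test(frame, col):
    """Report what happens when `frame[col]` is used as an if-condition."""
    try:
        if frame[col]:
            return "truthy"
        return "falsy"
    except KeyError:
        # The lookup itself failed, so the condition was never evaluated.
        return "KeyError"
    except ValueError:
        # The column exists, but a multi-row Series has no single truth value.
        return "ValueError"


print(describe_truth_test(df, None))     # missing column name
print(describe_truth_test(df, "email"))  # existing column
print("email" in df, None in df)         # membership test on column names
```

Neither case gives a usable yes/no answer from `frame[col]`, whereas `col in frame` only looks at the column labels and returns a plain bool.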

Jérôme
  • Thank you Jerome! I was under the impression that `if df1[email_col]` was the equivalent of "if the email column exists in this dataframe, then proceed with the logic of the if statement". Am I mistaken? – Justin Benfit Feb 11 '22 at 16:29