I have a dataframe in pandas with several similar-looking columns (with different names). I'm trying to write a function which compares the data in two columns and drops the second one if they are identical. I've tried this:
import numpy as np
import pandas as pd

def drop_if_ident(df, col1, col2):
    # Drops second column if columns contain identical data
    if (df.shape[0] == np.sum(pd.notnull(df.col1) == pd.notnull(df.col2)):
        df.drop(
            col2,
            axis=1,
            inplace=True
        )

# Usage
drop_if_ident(my_dataframe, my_first_column, my_second_column)
IPython throws the following error:
  File "<ipython-input-109-e11b622181bb>", line 3
    if (df.shape[0] == np.sum(pd.notnull(df.col1) == pd.notnull(df.col2)):
                                                                         ^
SyntaxError: invalid syntax
...but what is the correct syntax here? Apologies for the noob question :)
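For reference, here is a minimal sketch of the behaviour described above, not a definitive implementation. It rests on a few assumptions: the column names are passed as strings, "identical" means what `Series.equals` checks (same values, with NaNs in the same positions counted as equal), and the example frame with columns `a`, `b`, `c` is made up for illustration. The `if` line in the attempt above opens four parentheses and closes only three, which is where the parser gives up; the sketch also uses `df[col1]` rather than `df.col1`, since the latter looks for a column literally named `col1`.

```python
import numpy as np
import pandas as pd


def drop_if_ident(df, col1, col2):
    """Drop col2 in place if it holds exactly the same data as col1."""
    # df[col1] looks the column up by the name stored in the variable;
    # df.col1 would look for a column literally called "col1".
    if df[col1].equals(df[col2]):
        df.drop(columns=col2, inplace=True)


# Hypothetical example data (not from the original question)
my_dataframe = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [1.0, np.nan, 3.0],
    "c": [1.0, 2.0, 3.0],
})

drop_if_ident(my_dataframe, "a", "b")  # "b" duplicates "a", so it is dropped
drop_if_ident(my_dataframe, "a", "c")  # "c" differs from "a", so it is kept
print(list(my_dataframe.columns))
```

Note that `Series.equals` also requires matching dtypes, so an integer column and a float column holding the same numbers would not count as identical under this sketch.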