I have a pandas DataFrame with several similar-looking columns (under different names). I'm trying to write a function that compares the data in two columns and drops the second one if they are identical. I've tried this:

import numpy as np
import pandas as pd

def drop_if_ident(df, col1, col2):
    # Drops second column if columns contain identical data
    if (df.shape[0] == np.sum(pd.notnull(df.col1) == pd.notnull(df.col2)):
        df.drop(
            col2,
            axis=1,
            inplace=True
        )

# Usage
drop_if_ident(my_dataframe, my_first_column, my_second_column)

IPython throws the following error:

File "<ipython-input-109-e11b622181bb>", line 3
if (df.shape[0] == np.sum(pd.notnull(df.col1) == pd.notnull(df.col2)):
                                                                     ^
SyntaxError: invalid syntax

...but what is the correct syntax here? Apologies for the noob question :)
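
For what it's worth, the SyntaxError looks like an unbalanced parenthesis: the `(` opened right after `if` is never closed before the colon. Two other things would likely bite once that is fixed: `df.col1` looks up a column literally named "col1" rather than the name passed in (so `df[col1]` is needed), and comparing the `pd.notnull` masks only checks that the missing values line up, not that the data match. Below is a minimal sketch of one way to do it, assuming the column names are passed as strings and that `Series.equals` (which treats NaNs in the same positions as equal) is the comparison wanted. The sample frame and its column names "a", "b", "c" are made up for illustration.

```python
import numpy as np
import pandas as pd


def drop_if_ident(df, col1, col2):
    """Drop col2 in place if it holds the same data as col1."""
    # Series.equals counts NaNs in matching positions as equal,
    # but it also requires both columns to share a dtype.
    if df[col1].equals(df[col2]):
        df.drop(columns=col2, inplace=True)


# Hypothetical example data, not from the original question
my_dataframe = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [1.0, np.nan, 3.0],  # same data as "a"
    "c": [1.0, 2.0, 3.0],     # differs from "a" in the second row
})

drop_if_ident(my_dataframe, "a", "b")  # "b" duplicates "a"
drop_if_ident(my_dataframe, "a", "c")  # "c" differs from "a"
```

Note that the usage line in the question passes bare names (`my_first_column`); with this sketch they would need to be strings, or variables holding strings.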

user1684046