I have a pandas dataframe where I'm using the numpy library to determine whether the values in two columns are the same. I run into problems with the values in these columns being blank/NaN and the dataframe reporting that they are differnet although they both show as NaN. What is the best way to handle NaN when doing this analysis?
I have two files that I load into dataframes and then I merge the two on a unique key into the mathced dataframe and then do the following to find differences:
merged = pd.merge(left= df_price_source, right=df_price_secondary, on=['ACCT'], indicator=True, how='inner')
matched = merged[merged['_merge'] == 'both'].copy()
matched['date_comparison'] = np.where(matched["Start Date_x"] == matched["Start Date_y"], True, False)
matched['profit_comparison'] = np.where(matched["profit_x"] == matched["profit_y"], True, False)
For these new columns matched['date_comparison'] and matched[profit_comparison'] I see False in NaN columns but I would expect True.