To avoid any NaT, NaN and None comparison issues, I am trying to convert null values to the string '__NULL__' before I do the comparison.

    if frames_equal == False:
        print(file_name, " value by value check for differences:")
        source_columns = df.columns
        print(file_name, " columns:")
        print(source_columns)
        for source_index, source_row in df.iterrows():
            for source_col in source_columns:
                source_value = source_row[source_col]
                target_value = df_file.loc[source_index, source_col]
                if pd.isna(source_value) or pd.isnull(source_value):
                    source_value = '__NULL__'
                elif pd.isna(target_value) or pd.isnull(target_value):
                    target_value = '__NULL__'
                if source_value != target_value:
                    values_equal = False
                    print("~" * 50)
                    print(file_name, " value differences in column ", source_col)
                    print("MISMATCH AT INDEX: ", source_index)
                    print("REGISTRATION_UID: ", source_row["REGISTRATION_UID"])
                    print("Column: ", source_col)
                    print("Source Value: ", source_value)
                    print("Source Type: ", type(source_value))
                    print("Target Value: ", target_value)
                    print("Target Type: ", type(target_value))
                    print("~" * 50)

I am checking whether the source or target value is null by using pd.isna() or pd.isnull() on each of them before I compare.
However, I am still getting mismatches reported in my output:

    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    2020_07_27__lu_volume.csv value differences in column LU_INSERT_YEAR
    MISMATCH AT INDEX: 23740
    REGISTRATION_UID: ZOMI-00041736
    Column: LU_INSERT_YEAR
    Source Value: __NULL__
    Source Type: <class 'str'>
    Target Value: nan
    Target Type: <class 'numpy.float64'>

Does this mean my nan values are not being picked up and converted to the '__NULL__' string before the comparison?
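
One thing that may be worth checking, offered as a sketch rather than a confirmed diagnosis: the target check sits in an `elif`, so it only runs when the source value is not null. In the output shown, the source had already been replaced by the placeholder string, which would leave the target `nan` unconverted. The snippet below normalises each side independently. The names `NULL_TOKEN`, `normalize_null` and `values_match` are hypothetical helpers introduced for illustration, and the sketch assumes scalar cell values. Since `pd.isnull` is an alias of `pd.isna`, a single call is enough.

```python
import numpy as np
import pandas as pd

# Placeholder string taken from the code in the question.
NULL_TOKEN = "__NULL__"


def normalize_null(value):
    """Return NULL_TOKEN for a NaN/NaT/None scalar, otherwise the value unchanged."""
    # Assumes a scalar: pd.isna returns an array for list-like input.
    return NULL_TOKEN if pd.isna(value) else value


def values_match(source_value, target_value):
    """Compare two scalars after normalising each side on its own."""
    # Two independent normalisations, not an if/elif pair, so a null
    # source does not prevent the target from being converted.
    return normalize_null(source_value) == normalize_null(target_value)


# Both sides null: treated as equal regardless of the flavour of null.
print(values_match(None, np.float64("nan")))
print(values_match(pd.NaT, np.nan))

# Only one side null: still reported as a mismatch.
print(values_match(2020.0, np.nan))
```

Inside the loop from the question, this would amount to replacing the `if`/`elif` pair with two separate `if` statements, or with calls to a helper like the one above.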