To avoid NaT, NaN and None comparison issues, I am trying to convert missing values to the string '__NULL__' before I do the comparison.

    if frames_equal == False:
        print(file_name, " value by value check for differences:")
        source_columns = df.columns
        print(file_name, " columns:")
        print(source_columns)
        for source_index, source_row in df.iterrows():

            for source_col in source_columns:

                source_value = source_row[source_col]
                target_value = df_file.loc[source_index, source_col]

                if pd.isna(source_value) or pd.isnull(source_value):
                    source_value = '__NULL__'
                elif pd.isna(target_value) or pd.isnull(target_value):
                    target_value = '__NULL__'

                if source_value != target_value:
                    values_equal = False
                    print("~" * 50)
                    print(file_name, " value differences in column ", source_col)
                    print("MISMATCH AT INDEX: ", source_index)
                    print("REGISTRATION_UID:  ", source_row["REGISTRATION_UID"])
                    print("Column: ", source_col)
                    print("Source Value: ", source_value)
                    print("Source Type: ", type(source_value))
                    print("Target Value: ", target_value)
                    print("Target Type: ", type(target_value))
                    print("~" * 50)

I check whether the source or target value is null by calling pd.isna() or pd.isnull() on each of them before I compare.

However, I am still getting mismatches reported in my output:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2020_07_27__lu_volume.csv  value differences in column  LU_INSERT_YEAR
MISMATCH AT INDEX:  23740
REGISTRATION_UID:   ZOMI-00041736
Column:  LU_INSERT_YEAR
Source Value:  __NULL__
Source Type:  <class 'str'>
Target Value:  nan
Target Type:  <class 'numpy.float64'>

Does this imply that my nan values are not being picked up and converted to the '__NULL__' string prior to the comparison?
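
One possible reading of that output, as a minimal sketch: it assumes the source and target cells are both missing, so the `if` branch replaces only the source value and the `elif` branch never runs. The helper names `normalize`, `compare_with_elif` and `compare_independent` are hypothetical and exist only for this illustration. Since `pd.isnull` is an alias of `pd.isna`, a single call is used here.

```python
import numpy as np
import pandas as pd

NULL = "__NULL__"


def normalize(value):
    """Map any missing scalar (None, NaN, NaT) to one sentinel string."""
    return NULL if pd.isna(value) else value


def compare_with_elif(source_value, target_value):
    """Same branch structure as the loop above: at most one side is replaced."""
    if pd.isna(source_value):
        source_value = NULL
    elif pd.isna(target_value):
        target_value = NULL
    # True means the pair would be reported as a mismatch
    return bool(source_value != target_value)


def compare_independent(source_value, target_value):
    """Normalize each side on its own before comparing."""
    return bool(normalize(source_value) != normalize(target_value))


# Assumed scenario: the source holds None and the CSV round trip produced NaN
pair = (None, np.float64("nan"))
print(compare_with_elif(*pair))
print(compare_independent(*pair))
```

With the `elif` structure the pair is reported as different because the string sentinel is compared against an untouched `nan`; normalizing both sides independently treats the two missing values as equal.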

smackenzie
  • Just a comment on how to loop dfs... maybe zip or merge the dfs https://stackoverflow.com/q/16476924/6692898 – RichieV Jul 27 '20 at 00:12
  • What is the purpose of the whole check? Have you tried `.update()` and `.fillna()`? https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.update.html#pandas.DataFrame.update – RichieV Jul 27 '20 at 00:17
  • @RichieV I am reading in a dataframe via Oracle, writing it to CSV. Then as a validation, I am reading in the CSV immediately and comparing it to the in memory Oracle dataframe. DataFrame.equals() doesn't work, so I am doing a value by value comparison. I am using pandas to take backups of Oracle data which I can compare over time, so want to check they are consistent. – smackenzie Jul 27 '20 at 00:25
  • And I think your inequality occurs when both values are NaN... try changing `elif pd.isna(target_value...` to `if pd.isna(target_value...` – RichieV Jul 27 '20 at 00:26
  • @ALollz see above comment. I am reading in a frame from Oracle, writing it to CSV, reading in the CSV immediately and want to check they match. The CSV read in does not match the datatypes of the in-memory Oracle dataframe. – smackenzie Jul 27 '20 at 00:28
  • `df.eq(df2)` works element-wise – RichieV Jul 27 '20 at 00:31
  • @RichieV is this different from equals()? equals gives me a False value when I compare my data frames. All I am doing is dumping a data frame to CSV, reading it in then comparing it to the original, equals() returns False. – smackenzie Jul 27 '20 at 00:33
  • Then perhaps you should look into `.convert_dtypes()` as you would when loading for use from csv – RichieV Jul 27 '20 at 00:35
  • Yes, it is different: `equals` checks the identity of the whole structure, while `.eq` checks every element and returns a DataFrame with True or False for every cell – RichieV Jul 27 '20 at 00:37
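
The difference between `.eq()` and `.equals()` discussed in the comments can be sketched as follows. The two small frames are made-up stand-ins for the Oracle frame and the CSV frame, and the variable names are assumptions for illustration only. `DataFrame.equals` also requires matching dtypes, which may be why it returns False after a CSV round trip, while `.eq` compares cell by cell. NaN never equals NaN under `.eq`, so cells that are missing on both sides are masked in separately.

```python
import numpy as np
import pandas as pd

# Same values, different dtypes (int64 vs float64), as can happen after a CSV round trip
a = pd.DataFrame({"x": [1, 2]})
b = pd.DataFrame({"x": [1.0, 2.0]})
print(a.equals(b))          # dtype-sensitive comparison of the whole frame
print(a.eq(b).all().all())  # element-wise comparison, reduced to one flag

# Made-up frames with missing values in the same and in different cells
source = pd.DataFrame({"LU_INSERT_YEAR": [2019.0, np.nan], "NAME": ["a", None]})
target = pd.DataFrame({"LU_INSERT_YEAR": [2019.0, np.nan], "NAME": ["a", "b"]})

# Treat "missing in both frames" as a match, then fall back to element-wise equality
both_missing = source.isna() & target.isna()
matches = source.eq(target) | both_missing

# Collect (index label, column name) for every cell that still differs
rows, cols = np.where(~matches.to_numpy())
mismatch_cells = [(source.index[r], source.columns[c]) for r, c in zip(rows, cols)]
print(mismatch_cells)
```

This replaces the row-by-row loop with one vectorized pass and leaves only the genuinely different cells to print or inspect.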
