I am trying to compare two data frames using datacompy package. And I am seeing something weird.
print(dfA)
req_nbr unit_cost_amt
0 24468868 1.36870
1 24468868 1.36870
2 24468868 1.64952
3 24468868 1.64952
4 24468868 0.83289
5 24468868 0.83289
6 24468868 0.83289
7 24468868 0.83289
Then I have another dataframe with same data and structure.
print(dfB)
req_nbr unit_cost_amt
0 24468868 1.36870
1 24468868 1.36870
2 24468868 1.64952
3 24468868 1.64952
4 24468868 0.83289
5 24468868 0.83289
6 24468868 0.83289
7 24468868 0.83289
Both dataframes are same and of same datatypes.
dfA['unit_cost_amt'].dtype
dtype('float64')
dfB['unit_cost_amt'].dtype
dtype('float64')
Now I m doing a compare using datacompy
compare = datacompy.Compare(
dfA,
dfB,
# You can also specify a list of columns
join_columns = ['req_nbr'],
# Optional, defaults to 0
abs_tol = 0,
# Optional, defaults to 0
rel_tol = 0,
# Optional, defaults to 'df1'
df1_name = 'Old',
# Optional, defaults to 'df2'
df2_name = 'New'
)
print(compare.report())
And it shows differences...
DataComPy Comparison
--------------------
DataFrame Summary
-----------------
DataFrame Columns Rows
0 Old 2 8
1 New 2 8
Column Summary
--------------
Number of columns in common: 2
Number of columns in Old but not in New: 0
Number of columns in New but not in Old: 0
Row Summary
-----------
Matched on: req_nbr
Any duplicates on match values: Yes
Absolute Tolerance: 0
Relative Tolerance: 0
Number of rows in common: 8
Number of rows in Old but not in New: 0
Number of rows in New but not in Old: 0
Number of rows with some compared columns unequal: 4
Number of rows with all compared columns equal: 4
Column Comparison
-----------------
Number of columns compared with some values unequal: 1
Number of columns compared with all values equal: 1
Total number of values which compare unequal: 4
Columns with Unequal Values or Types
------------------------------------
Column Old dtype New dtype # Unequal Max Diff # Null Diff
0 unit_cost_amt float64 float64 4 1.110223e-16 0
Sample Rows with Unequal Values
-------------------------------
req_nbr unit_cost_amt (Old) unit_cost_amt (New)
6 24468868 0.83289 0.83289
7 24468868 0.83289 0.83289
4 24468868 0.83289 0.83289
5 24468868 0.83289 0.83289
Any idea what I am doing wrong here ? Its puzzling.