I would like to compare two unsorted dataframes and report the differences by column and row ID coordinates and values.
The code, first compares the csv's and if they are not equal it compares based on a merge, and if they still are not equal I know that data is different in some way.
At this point i'm not sure how to identify the column and row coordinates along with the value for each dataframe where the data is identified to be different.
Here are the dataframes:
DATAFRAME 1 - EXP:
Here is my current code:
import pandas as pd
file1 = "c:\\exp.csv"
file2 = "c:\\act.csv"
exp = pd.read_csv(file1, encoding="ANSI")
act = pd.read_csv(file2, encoding="ANSI")
exp = exp.drop_duplicates(subset=None, keep='first', inplace=False)
act = act.drop_duplicates(subset=None, keep='first', inplace=False)
result = exp.equals(act)
if result:
print("CSV's Match")
else:
act = act.set_index('accounts')
dataMerged = exp.merge(act, how='left')
dataMergedAndSorted = dataMerged.sort_values(['accounts']).set_index('account numbers')
actSorted = act.sort_values(['accounts'])
if dataMergedAndSorted.equals(actSorted):
print("The Merged, Sorted, and Compared Data Now Returns True : PASS")
else:
dataMergedAndSorted = dataMergedAndSorted.reset_index()
actSorted = actSorted.reset_index()
Known Differences by Observation of the Data Frames & How to Report it:
Exp: Col=1,Row=4,Val=9101, Act: Col=1,Row=3,Val=FOO
Exp: Col=3,Row=6,Val=BAR, Act: Col=3,Row=5,Val=malfoy