I was comparing two excel files which contains the information of the students of two schools. However those files might contain different number of rows between them.
The first set that I used is to import the excel files in two dataframes:
df1 = pd.read_excel('School A - Information.xlsx')
df2 = pd.read_excel('School B - Information.xlsx')
print(df1)
Name Age Birth_Country Previous Schools
0 tom 10 USA 3
1 nick 15 MEX 1
2 juli 14 CAN 0
3 tom 19 NOR 1
print(df2)
Name Age Birth_Country Previous Schools
0 tom 10 USA 3
1 tom 19 NOR 1
2 nick 15 MEX 4
After this, I would like to check the divergences between those two dataframes (index order is not important). However I am receiving an error due to the size of the dataframes.
compare = df1.values == df2.values
<ipython-input-9-7cc64ba0e622>:1: DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
compare = df1.values == df2.values
print(compare)
False
Adding to that, I would like to create a third DataFrame with the corresponding divergences, that shows the divergence.
import numpy as np
rows,cols=np.where(compare==False)
for item in zip(rows,cols):
df1.iloc[item[0], item[1]] = '{} --> {}'.format(df1.iloc[item[0], item[1]],df2.iloc[item[0], item[1]])
However, using this code is not working, as the index order may be different between the two dataframes.
My expected output should be the below dataframe: