I'm having trouble comparing two DataFrames loaded from two separate Excel files in Python. I would like to see only the differences between df1 and df3. I think I am doing something wrong; could you point me in the right direction? My code is included below.
Please note that I removed the file locations; saving the Excel files itself works fine.
import pandas as pd
# importing file
df1 = pd.read_excel('df11.xlsx')
# drop the Date column (note: this removes 'Date' rather than keeping only it)
df1.drop(['Date'], axis=1, inplace=True)
# keep only the first column (iloc[:0] would select zero rows instead)
df1 = df1.iloc[:, :1]
# skip the first five rows
df1 = df1.iloc[5:]
#transpose
df1 = df1.transpose()
#save file
df1.to_excel(r'C:filelocation\df1_output1.xlsx')
# import the second file, skipping the first row
df2 = pd.read_excel(r'C:filelocation.xlsx', skiprows=1)
# keep only the rows where OrgUnit equals 'People'
df3 = df2[df2['OrgUnit'] == 'People']
# keep only the E-Mail Address column
df3 = df3[['E-Mail Address']]
#reset index
df3.reset_index(drop=True, inplace=True)
#save to file
df3.to_excel(r'filelocation.xlsx')
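
The code above prepares df1 and df3 but never actually compares them. As a rough sketch of what that comparison might look like, the snippet below uses an outer merge with pandas' `indicator=True` option and keeps only the rows found in one frame but not the other. It assumes both frames have been reduced to a single column with the same name; the sample addresses and the shared column name are made up for illustration and are not taken from the real files.

```python
import pandas as pd

# Hypothetical stand-ins for df1 and df3; the real frames come from the Excel files.
# Assumption: both frames hold one column with the same name.
df1 = pd.DataFrame({'E-Mail Address': ['a@example.com', 'b@example.com', 'c@example.com']})
df3 = pd.DataFrame({'E-Mail Address': ['b@example.com', 'c@example.com', 'd@example.com']})

# An outer merge with indicator=True adds a '_merge' column that marks each row
# as 'left_only', 'right_only', or 'both'.
merged = df1.merge(df3, on='E-Mail Address', how='outer', indicator=True)

# Keep only the rows that appear in exactly one of the two frames.
differences = merged[merged['_merge'] != 'both'].reset_index(drop=True)
print(differences)
```

If the column names differ between the two files, one of them would need to be renamed first (for example with `df1.columns = ['E-Mail Address']`) so the merge has a common key.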