
I'm having trouble comparing two DataFrames loaded from two separate Excel files in Python. I would like to see only the differences between `df1` and `df3`. I think I am doing something wrong; could you point me in the right direction? My code is below.

Please note that I removed the file locations; saving the Excel files itself works fine.

import pandas as pd

# importing file
df1 = pd.read_excel('df11.xlsx')

#drop everything but date
df1.drop(['Date'], axis=1, inplace=True)

#only keep first column
df1 = df1.iloc[:0]

#remove first line
df1 = df1.iloc[5:]

#transpose
df1 = df1.transpose()

#save file
df1.to_excel(r'C:filelocation\df1_output1.xlsx')

#import but skip first row
df2 = pd.read_excel(r'C:filelocation.xlsx', skiprows=1)

#look into Org Unit and look for specific word
df3 = df2[df2['OrgUnit'] == 'People']

#drop everything but Email address
df3 = df3[['E-Mail Address']]

#reset index
df3.reset_index(drop=True, inplace=True)

#save to file
df3.to_excel(r'filelocation.xlsx')
  • I believe the second line of code does the exact opposite of what its comment says: it deletes only the Date column. Also, without example data it is very difficult to understand what exactly needs to be done. – Pooja Sonkar Jul 05 '21 at 13:43
  • Can you provide the two xlsx input files and explain how you would like to compare them? – mozway Jul 05 '21 at 14:38
  • As the previous commenters said, it's very hard to help you without seeing some sample data. Sharing Excel files is awkward, but it's very easy to share 5 or 6 rows of data from each one in a format that we can copy/paste and run with your code. See this [page](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) for examples of how to include sample data for `df1` and `df2` instead of reading them from Excel. – joao Jul 05 '21 at 14:44
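
To make the points in the comments concrete, here is a minimal sketch of what a copy/pasteable example could look like. The sample rows are invented, because the real file contents were not posted. It also assumes that both frames should end up as a single `E-Mail Address` column and that "differences" means addresses present in one frame but not the other; the question does not confirm either assumption. The names `raw1`, `raw2`, `without_date`, `only_date` and `differences` are hypothetical.

```python
import pandas as pd

# Hypothetical sample data standing in for the two Excel files.
# These values are invented; the real contents were not shared.
raw1 = pd.DataFrame({
    "Date": ["2021-07-01", "2021-07-02", "2021-07-03"],
    "E-Mail Address": ["a@example.com", "b@example.com", "c@example.com"],
})
raw2 = pd.DataFrame({
    "OrgUnit": ["People", "Finance", "People"],
    "E-Mail Address": ["b@example.com", "x@example.com", "d@example.com"],
})

# drop() removes the named column; it does not keep it.
without_date = raw1.drop(columns=["Date"])   # everything except Date
only_date = raw1[["Date"]]                   # only Date

# Assumption: the comparison is on e-mail addresses.
df1 = raw1[["E-Mail Address"]]
df3 = raw2.loc[raw2["OrgUnit"] == "People", ["E-Mail Address"]].reset_index(drop=True)

# An outer merge with indicator=True adds a "_merge" column that marks
# each row as "left_only", "right_only" or "both".
merged = df1.merge(df3, on="E-Mail Address", how="outer", indicator=True)

# Keep only the rows that appear in exactly one of the two frames.
differences = merged[merged["_merge"] != "both"].reset_index(drop=True)
print(differences)
```

If the comparison should instead be row by row on frames with identical shape and labels, `DataFrame.compare` may be a better fit, but that depends on what the real data looks like.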
