My problem is as follows. Using pd.ExcelFile, I've figured out how to read and parse the Excel files and sheets that I want to compare. However, while both have similar data in the comparable columns, one has additional columns that I do not want to analyze.
So the core issue is: how do I select specific columns in pandas to compare against one another? Ideally, I would like to pick a range of columns (e.g. columns 1-5 plus columns 7-15), have those columns analyzed for differences, and have the differences written to another Excel file.
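To make the goal concrete, here is a minimal sketch of the kind of selection and comparison I mean. It is not taken from my real files: the frames `df_a` and `df_b`, their column names, and the output file name are all made up for illustration, and it assumes both sheets have the same number of rows in the same order. It also assumes pandas 1.1 or newer, where `DataFrame.compare` is available.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the two parsed sheets.
# df_b has an extra "notes" column that should be ignored.
df_a = pd.DataFrame(
    {"id": [1, 2, 3], "name": ["a", "b", "c"], "qty": [10, 20, 30]}
)
df_b = pd.DataFrame(
    {"id": [1, 2, 3], "name": ["a", "x", "c"], "notes": ["", "", ""], "qty": [10, 20, 35]}
)

# Select columns by position with iloc. np.r_ joins several slices into
# one array of 0-based positions, so "columns 1-5 plus 7-15" (1-based)
# would be np.r_[0:5, 6:15] on the real data.
sub_a = df_a.iloc[:, np.r_[0:3]]        # positions 0, 1, 2
sub_b = df_b.iloc[:, np.r_[0:2, 3:4]]   # positions 0, 1, 3 (skips "notes")

# compare() needs identical row and column labels on both frames.
# It keeps only the rows and columns that differ, with "self" holding
# the value from sub_a and "other" the value from sub_b.
diff = sub_a.compare(sub_b)
print(diff)

# Writing the result out needs an Excel writer such as openpyxl installed;
# "differences.xlsx" is a hypothetical output path.
# diff.to_excel("differences.xlsx")
```

If the column names differ between the two sheets, the selected columns would presumably have to be renamed to match before `compare` will accept them.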
My code so far is:
import pandas as pd
# Open the Excel files
xl0 = pd.ExcelFile(r"excel path")
xl1 = pd.ExcelFile(r"excel path")
# Parse (read) the relevant sheets. parse() returns a DataFrame,
# so the result has to be assigned to a variable.
# By default the first row is used as the column names (header=0),
# so it is not treated as data and needs no separate skip step.
df0 = xl0.parse("Sheet1")
df1 = xl1.parse("Sheet2")
I have tried several methods to select individual columns and compare them, but none of them have worked. Help?
Thanks!