My problem is as follows. Using pd.ExcelFile, I've figured out how to read and parse the Excel files and sheets that I want to compare. However, while both have similar data in the comparable columns, one has additional columns that I do not want to analyze.

So here is the core issue: how do I select specific columns in pandas to compare against one another? Ideally I would like to pick a range of columns (e.g. columns 1-5 plus columns 7-15), have those columns analyzed for differences, and have the differences written to another Excel file.
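To make the goal concrete, here is a minimal sketch of what I am after, not a tested solution. It uses two small made-up DataFrames in place of my real sheets, and it assumes that "columns 1-5 and 7-15" are counted from 1 (so positions 0-4 and 6-14), that both frames have the same shape and row order, and that the selected columns have matching labels. The names `cols`, `positions`, `report` and the output file name are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the two parsed sheets (3 rows, 16 columns each)
cols = [f"c{i}" for i in range(16)]
df0 = pd.DataFrame(np.arange(48).reshape(3, 16), columns=cols)
df1 = df0.copy()
df1.iloc[1, 3] = -1   # a difference inside the selected columns
df1.iloc[2, 5] = -1   # a difference in column 6 (position 5), which is not selected

# np.r_ concatenates several slices into one array of integer positions
positions = np.r_[0:5, 6:15]
sub0 = df0.iloc[:, positions]
sub1 = df1.iloc[:, positions]

# Boolean frame marking the cells that differ
# (note: NaN is never equal to NaN, so missing cells would be flagged too)
mask = sub0.ne(sub1)

# Long-format report: one row per differing cell, indexed by (row, column)
stacked_mask = mask.stack()
report = pd.DataFrame({
    "df0": sub0.stack()[stacked_mask],
    "df1": sub1.stack()[stacked_mask],
})
print(report)

# Writing the report out would then be something like the line below
# (hypothetical file name; needs an Excel writer such as openpyxl installed):
# report.to_excel("differences.xlsx")
```

If the two sheets use different column names, the labels of one selection would presumably have to be aligned to the other (for example by assigning `sub1.columns = sub0.columns`) before comparing.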

My code so far is:

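import pandas as pd

# open the Excel files
xl0 = pd.ExcelFile(r"excel path")
xl1 = pd.ExcelFile(r"excel path")

# parse (read) the relevant sheets; parse() returns a DataFrame,
# so the result has to be assigned to a variable
df0 = xl0.parse("Sheet1")
df1 = xl1.parse("Sheet2")

# the first row, which contains only column names and no data, is used
# as the header by default (header=0), so no separate skip step is needed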

I have tried several methods to select individual columns and compare them, but none of them worked. Help?

Thanks!

A. Blackmagic
  • Very basic question, it's only one google search away, so please search first. If you have an issue with your code, better post that here so we can tell you where you're going wrong. – cs95 Sep 27 '17 at 14:55
  • Ah, I couldn't find that answer when I searched! Thanks for linking it Coldspeed, I'll definitely focus on uploading more code in my (inevitably) ensuing questions! – A. Blackmagic Sep 27 '17 at 15:12
