1

I am doing data analysis on a group of different excel files, each with a footer. The start point of the footer changes based on the total number of rows. The footer starts in the first column as a blank cell then has text that is not formatted like the rest of the data in the column. I am trying to come up with a footer length variable to drop into skip_footer when I read the files.

  df looks like    
  +--------------------+
  |   A     B      C   |
  +--------------------+
  | Data  Data   Data  |
  | Data  Data   Data  |
  | [Blank]            |
  | This is            |
  | The footer         |
  | I need to remove   |
  +--------------------+

I have tried using both methods at this link, but I can't seem to get either to work. One of the errors I am getting is invalid file. Im 99% sure that the invalid file is coming because the file is an xlsx. I don't get an error when I open and read the file, only when I try to run functions on it.

 Code: 
 import os
 direct = "path"
 file = open(direct, "file name"), "r")

 import itertools as it
 def get_footer(file_):
      with open(file_) as f:
         g = it.dropwhile(lambda x: x != ' ', f)
         footer_len = len([i for i, _ in enumerate(g)])
     return footer_len

 footer_len = get_footer(file)
Community
  • 1
  • 1
DataNoob
  • 341
  • 2
  • 5
  • 13

1 Answers1

2

I was not able to figure out how to do the above, but I have a much easier answer.

 import pandas as pd
 File = pd.read_excel()
 NoFooter = File[:-6]
DataNoob
  • 341
  • 2
  • 5
  • 13
  • @DataNoob, you could also use the `skip_footer` parameter within pandas.read_excel(). See doc here: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html – NickBraunagel Dec 04 '17 at 20:11