I am doing data analysis on a group of different excel files, each with a footer. The start point of the footer changes based on the total number of rows. The footer starts in the first column as a blank cell then has text that is not formatted like the rest of the data in the column. I am trying to come up with a footer length variable to drop into skip_footer when I read the files.
df looks like
+--------------------+
| A B C |
+--------------------+
| Data Data Data |
| Data Data Data |
| [Blank] |
| This is |
| The footer |
| I need to remove |
+--------------------+
I have tried using both methods at this link, but I can't seem to get either to work. One of the errors I am getting is invalid file. Im 99% sure that the invalid file is coming because the file is an xlsx. I don't get an error when I open and read the file, only when I try to run functions on it.
Code:
import os
direct = "path"
file = open(direct, "file name"), "r")
import itertools as it
def get_footer(file_):
with open(file_) as f:
g = it.dropwhile(lambda x: x != ' ', f)
footer_len = len([i for i, _ in enumerate(g)])
return footer_len
footer_len = get_footer(file)