1

Right now, I'm trying to run a simple code that cycles through thousands of excel sheets and extracts the data. I'm using Python 3.7 with packages Pandas and Glob. The code is simple as shown below:

for fname in glob.glob(path):
   df = pd.read_excel(fname, index_col=None, header=None)

However, this gives me a Type error. TypeError: Fill() takes no arguments

With that being said, I've found that the issue is due to the way the timestamps are formatted within the excel sheet. In this case, the column widths are too small so excel returns "#####"; however, when I extend the cell width and resave the file, the code operates perfectly fine. I've attached an image of the excel sheet as well (Excel Example). An image of the extended cell width is also attached (Extended Cell Width).

Unfortunately, it isn't feasible for me to go through and make this edit manually. Does anyone have any experience in dealing with this?

Thanks in advance, Dan

Update with Excel File information: File extension: ".xlsx" Link: Link to Excel File

  • Can you provide an xml file with the error so that we can download it? Your example image is good, but I tried replicating it and didn't get any issue, we really need an xml. Also: what is the extension of your excel file (there are many redundant excel extensions) ? – Florian Fasmeyer Mar 15 '21 at 21:58
  • I apologize, I'm relatively new to coding, but there is no way to convert to an xml because the excel file doesn't have any mappings. I'm more than happy to provide a link to the file if that's allowed on the website. Additionally, the file extension is ".xlsx" – Dan Offenbacker Mar 15 '21 at 22:22
  • Don't apologize, we're all good in here. Casually trying to help people. ;) – Florian Fasmeyer Mar 15 '21 at 22:24
  • hmmm. Try to install xlrd and openpyxl. But yes, a file would be welcome. – Florian Fasmeyer Mar 15 '21 at 22:27
  • As a civil engineer (with almost no experience in coding), I appreciate your willingness to help! I believe all the information you're looking for has been uploaded. – Dan Offenbacker Mar 15 '21 at 22:29
  • You are welcome! For the others, I confirm! using his data, we get strange behaviour. Opening the file to either withe the Time column OR to reformat it will solve the issue, but that's not what the OP needs. I'll keep on digging! – Florian Fasmeyer Mar 15 '21 at 23:06

1 Answers1

0

Viable but unsafe solution

You could use an older version of xlrd.

#In your cmd:
pip install xlrd=1.2.0

Then...

pandas.read_excel('your_file_path', engine='xlrd')

But please read the warning below first, you must TRUST your data to be safe. If you know the data you use comes from a trustworthy source you may us it so long as your version of xlrd is in a virtual environment (see anaconda), so that you are not using it to open unsafe files by mistake 2 years down the line.

Observations

I replicated the bug of the op. Here is what I found: I and op have both followed the instructions and installed openpyxl

# Welcome inside of the openpyxl code base! Let's find that bug!
def _convert(expected_type, value):
    """
    Check value is of or can be converted to expected type.
    """
    if not isinstance(value, expected_type):
        try:
            # ERROR TRIGGERED HERE!
            value = expected_type(value) 
        except:
            raise TypeError('expected ' + str(expected_type))
    return value

Here expected_type() could be a float or datetime, such as

value = expected_type(value) # Before
value = float(value)         # After

But what we get instead is <class 'openpyxl.styles.fills.Fill'> So it tries to cast the value like this:

value = <class 'openpyxl.styles.fills.Fill'>(value)

which makes no sense! >:(

It seems like openpyxl is reading the column width and found that the column is too small to display a valu. As such it uses the FILL method to fill the space with "XXXX" to indicate the lack of space...

What does not work:

  • setting the usecols parameter to avoid loading the date column
  • setting the dtype parameter to force the type to str
  • trying to convert the file with openpyxl

The following conversion fails:

workbook = openpyxl.load_workbook(r'X:...\Cycle120.xlsx')
workbook.save(r'X:...\OUTPUT.xls')
# Same error... value = expected_type(value)
# TypeError: Fill() takes no arguments

Warning

Do not switch the old xlrd (engine) version to read your xlsx files. The old engine has security issues!

The reason xlsx support was removed is because it had potential security vulnerabilities and no-one was maintaining it. If you choose this approach, rather than the trivial switch to openpyxl, you are risking exposure to these.

See: Pandas cannot open an Excel (.xlsx) file

Florian Fasmeyer
  • 795
  • 5
  • 18
  • Florian, thank you. I agree and didn't think the display should be an issue. I just don't understand why the extend/resave process would fix the problem! – Dan Offenbacker Mar 15 '21 at 22:01
  • extend/resave? I never asked nor advised for that. I want a real file from you. With one line that fails at least. I also want to know the extension type "myfile.xlsx" or "myfile.xls". Also, what is your version of pandas? – Florian Fasmeyer Mar 15 '21 at 22:04
  • It seems like a bug with openpyxl which tries to read the column with... but we want a date instead. Changing the pandas.read_excel engine= ... may help, I'm on it! :) – Florian Fasmeyer Mar 15 '21 at 23:31
  • 1
    Using xlrd worked perfectly. I also hadn't used venv before, so I was able to learn two new things at once! Thanks again Florian. – Dan Offenbacker Mar 16 '21 at 14:57