Viable but unsafe solution
You could use an older version of xlrd
.
#In your cmd:
pip install xlrd=1.2.0
Then...
pandas.read_excel('your_file_path', engine='xlrd')
But please read the warning below first, you must TRUST your data to be safe. If you know the data you use comes from a trustworthy source you may us it so long as your version of xlrd
is in a virtual environment (see anaconda), so that you are not using it to open unsafe files by mistake 2 years down the line.
Observations
I replicated the bug of the op. Here is what I found:
I and op have both followed the instructions and installed openpyxl
# Welcome inside of the openpyxl code base! Let's find that bug!
def _convert(expected_type, value):
"""
Check value is of or can be converted to expected type.
"""
if not isinstance(value, expected_type):
try:
# ERROR TRIGGERED HERE!
value = expected_type(value)
except:
raise TypeError('expected ' + str(expected_type))
return value
Here expected_type() could be a float or datetime, such as
value = expected_type(value) # Before
value = float(value) # After
But what we get instead is <class 'openpyxl.styles.fills.Fill'>
So it tries to cast the value like this:
value = <class 'openpyxl.styles.fills.Fill'>(value)
which makes no sense! >:(
It seems like openpyxl
is reading the column width and found that the column is too small to display a valu. As such it uses the FILL method to fill the space with "XXXX" to indicate the lack of space...
What does not work:
- setting the
usecols
parameter to avoid loading the date column
- setting the
dtype
parameter to force the type to str
- trying to convert the file with openpyxl
The following conversion fails:
workbook = openpyxl.load_workbook(r'X:...\Cycle120.xlsx')
workbook.save(r'X:...\OUTPUT.xls')
# Same error... value = expected_type(value)
# TypeError: Fill() takes no arguments
Warning
Do not switch the old xlrd
(engine) version to read your xlsx files. The old engine has security issues!
The reason xlsx support was removed is because it had potential
security vulnerabilities and no-one was maintaining it. If you choose
this approach, rather than the trivial switch to openpyxl, you are
risking exposure to these.
See: Pandas cannot open an Excel (.xlsx) file