1

I have a repetetive task, where I download multiple excel files (I'm forced to download in xlsx format), I then take column G from each excel file and concatenate them into "final.xlsx" Then "final.xlsx" is compared to another excel workbook to see if all number instances are matched in each workbook.

I'm now working on making a cross platform python app to solve this. However, pandas won't allow xlsx files anymore, and manually opening and saving them as xls files just adds more repetitive manual labour.

Is there a cross-platform way for python to convert xlsx files to xls? Or should I abandon pandas and go with openpyxl since I'm forced to handle xlsx format?

I tried using this without success ;

from pathlib import Path
import openpyxl
import os
 
# get files
os.chdir(os.path.abspath(os.path.dirname(__file__)))
pdir = Path('.')
filelist = [filename for filename in pdir.iterdir() if filename.suffix == '.xlsx']
 
for filename in filelist:
    print(filename.name)
 
for infile in filelist:
    workbook = openpyxl.load_workbook(infile)
    outfile = f"{infile.name.split('.')[0]}.xls"
    workbook.save(outfile)
rovaut
  • 13
  • 3
  • what is your version of pandas? – antoine Jan 12 '21 at 13:11
  • This is an XY problem... instead of attempting to convert to `xls`, use `openpyxl` as the `engine` with `pandas`. See the linked question for a thorough discussion. – BigBen Jan 12 '21 at 13:29

2 Answers2

2

You can still use pandas, but you would need openpyxl. As you have it in your code, I suppose it is ok for you. Otherwise, you can install it via: pip install openpyxl.

The following illustrates how this can work. Kr.

import pandas as pd
fpath = r".\test.xlsx"
df = pd.read_excel (fpath, engine='openpyxl')
print(df)
   A  B
0  1  2
1  1  2
antoine
  • 662
  • 4
  • 10
  • Solved, thanks. I tried openpyxl engine, but it threw an error. I suspect my issue was having an outdated pandas/openpyxl/python. After upgrading everything, openpyxl engine works. – rovaut Jan 12 '21 at 14:23
0

Previously, the default argument engine=None to read_excel() would result in using the xlrd engine in many cases, including new Excel 2007+ (.xlsx) files. If openpyxl is installed, many of these cases will now default to using the openpyxl engine. See the read_excel() documentation for more details.

Thus, it is strongly encouraged to install openpyxl to read Excel 2007+ (.xlsx) files. Please do not report issues when using xlrd to read .xlsx files. This is no longer supported, switch to using openpyxl instead.

https://pandas.pydata.org/docs/whatsnew/v1.2.0.html

Suhas Mucherla
  • 1,383
  • 1
  • 5
  • 17