1

I connected to SharePoint to my Cloud Environment Directory and downloaded it to my current directory.

file_url = """/sites/XXXXXX/Shared%20Documents/General/new_name/train_data.xlsx"""

response = File.open_binary(ctx, file_url)
response.raise_for_status()

import pathlib

ROOT_DIR = pathlib.Path().resolve()
download_file_path = ROOT_DIR / "download_test.xlsb"


response.raise_for_status() # 200 is the response code, so it downloaded successfully


with open(download_file_path, "wb") as local_file: # This is the part where I think needs to be changed
    local_file.write(response.content)


I cannot open the downloaded file on Excel as it says it "format or extension is not valid".

I used the following to read the downloaded Excel file ending with .xlsb extension for analysis in Pandas:


import pandas as pd

pd.read_excel("download_test.xlsb", engine='pyxlsb')

I get the following error:

usr/lib/python3.6/zipfile.py in open(self, name, mode, pwd, force_zip64)
   1373         else:
   1374             # Get info object for name
-> 1375             zinfo = self.getinfo(name)
   1376 
   1377         if mode == 'w':

/usr/lib/python3.6/zipfile.py in getinfo(self, name)
   1302         if info is None:
   1303             raise KeyError(
-> 1304                 'There is no item named %r in the archive' % name)
   1305 
   1306         return info

KeyError: "There is no item named 'xl/_rels/workbook.bin.rels' in the archive"

I already searched for similar questions Read XLSB File in Pandas Python Convert .xlsb to .xlsx - Multi-sheet Microsoft Excel File

Without success. Any help is very much appreciated.

matt.aurelio
  • 381
  • 2
  • 9

1 Answers1

0

Try the latest xlsb2xlsx package on PyPI:

pip install xlsb2xlsx
python -m xlsb2xlsx /directory_with_xlsb_file

Then you can use pandas with something like:

import pandas as pd
df = pd.read_excel('your_filepath.xlsx')

And work with the df object from there.

See https://pypi.org/project/xlsb2xlsx/ for more info.