How do I add xlsb files to the catalog in Kedro?

Question

I am using this code in my catalog.yml file:

equipment_data:
  type: pandas.ExcelDataSet
  filepath: data\01_raw\Equipment Profile.xlsb
  layer: raw

I get the following error after running the kedro run command:

` kedro.io.core.DataSetError: Failed while loading data from data set ExcelDataSet(filepath=C:/Users/Akshay Salvi/Desktop/Bizmetrics/kedro-environment/petrocaeRepo/data/01_raw/2. Cycle data (per trip)-20210113T042557Z-001/2. Cycle data (per trip)/CycleData,2020.xlsb, load_args={'engine': xlrd}, protocol=file, save_args={'index': False}, writer_args={'engine': xlsxwriter}).

Excel 2007 xlsb file; not supported `

I don't think the ExcelDataSet supports xlsb. Maybe write a custom dataset? — Lim H., Feb 18 '21 at 13:14

score 1 · Answer 1 · answered Feb 18 '21 at 17:45

pandas.ExcelDataSet simply calls pandas under the hood, so hopefully you can follow the example from another thread: install the pyxlsb package (pip install pyxlsb), which provides the engine pandas needs to parse xlsb files, and then pass that engine as the engine parameter under load_args in your YAML catalog.
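As a rough sketch of what that catalog entry might look like, here is the entry from the question with a load_args block added. This assumes pyxlsb is installed in the same environment as Kedro and that your pandas version is 1.0 or later (the first release that accepts engine="pyxlsb" in read_excel). The forward slashes in filepath are my own change for portability, not something the error requires.

```yaml
equipment_data:
  type: pandas.ExcelDataSet
  # Forward slashes work on Windows too (assumed change for portability)
  filepath: data/01_raw/Equipment Profile.xlsb
  layer: raw
  load_args:
    # Overrides the default xlrd engine shown in the error message;
    # requires `pip install pyxlsb`
    engine: pyxlsb
```

Whatever you put under load_args should be forwarded to pandas.read_excel, so other read_excel options such as sheet_name can be added in the same block if you need them.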
