XLSB to CSV with pandas, python

Question

Trying to convert multiple XLSB files to CSV. Not sure what is the problem here

import os

import pandas as pd

path = r'C://Users//greencolor//Autoreport//Load_attachments//'
for filename in os.listdir(path):
    if filename.startswith("PB orders"):
        print(filename)                         #until here its working
        month = pd.read_excel(filename, sheet_name="Raw data ", engine="pyxlsb")
        print(month)                            # I get the error here
        month = month[month['Sales Manager'] == 'DEVON, JOHN'] #filtering by manager
        month.to_csv (path + filename + ".csv", index = None, header=True)

Error

FileNotFoundError: [Errno 2] No such file or directory: 'PB orders Dec.xlsb'

Why I get this error? print(filename) is printing all the XLSB files that name starts with PB orders

That means the file path is wrong. `filename` is just the file name, not the full path. You need to combine it with the root path to get the actual full path — Panagiotis Kanavos, Feb 04 '22 at 09:38

Panagiotis Kanavos · Accepted Answer · 2022-02-04T09:46:15.327

filename is just the file's name, not the full path. You need to combine it with path to get the full path to the file. You can do that in a safe manner with os.path.join :

import os
...
for filename in os.listdir(path):
    if filename.startswith("PB orders"):
        full_path = os.path.join(path, filename)
        print(full_path )                         
        month = pd.read_excel(full_path , sheet_name="Raw data ", engine="pyxlsb")

Searching with a pattern

An alternative is to use glob to search for files that match a pattern. You still need to generate the full path:

import glob

...

for filename in glob.glob("PB orders*.xlsb", root_dir=path):
    full_path = os.path.join(path, filename)
    print(full_path )                         
    month = pd.read_excel(full_path , sheet_name="Raw data ", engine="pyxlsb")

Avoiding temp files

You still need to check the file name to avoid the temporary files generated when someone opens an Excel file (the files that start with ~) :

for filename in glob.glob("PB orders*.xlsb", root_dir=path):
    if not os.path.basename(filename).startswith("~"):
        full_path = os.path.join(path, filename)
            print(full_path )                         
            month = pd.read_excel(full_path , sheet_name="Raw data ", engine="pyxlsb")

Thanks it works perfectly, but how should I change the last part where it converts to CSV? — Greencolor, Feb 04 '22 at 09:54
basically i want to keep the original name ( PB orders December) but the extension should be csv — Greencolor, Feb 04 '22 at 09:55
@Greencolor there are several answers for this, eg [in this question](https://stackoverflow.com/questions/3548673/how-can-i-replace-or-strip-an-extension-from-a-filename-in-python). Most of the answers work if the filename contains only one dot. pathlib's `with_suffix` can be used to change the extension no matter what the rest of the filename contains — Panagiotis Kanavos, Feb 04 '22 at 10:04

score 1 · Answer 2 · answered Feb 04 '22 at 09:40

When you do month = pd.read_excel(filename, sheet_name="Raw data ", engine="pyxlsb") try to replace it with this:

month = pd.read_excel(path + filename, sheet_name="Raw data ", engine="pyxlsb")

This will prepend the path to your filenames in the given directory.

XLSB to CSV with pandas, python

2 Answers2