0

Trying to convert multiple XLSB files to CSV. Not sure what is the problem here

import os

import pandas as pd

path = r'C://Users//greencolor//Autoreport//Load_attachments//'
for filename in os.listdir(path):
    if filename.startswith("PB orders"):
        print(filename)                         #until here its working
        month = pd.read_excel(filename, sheet_name="Raw data ", engine="pyxlsb")
        print(month)                            # I get the error here
        month = month[month['Sales Manager'] == 'DEVON, JOHN'] #filtering by manager
        month.to_csv (path + filename + ".csv", index = None, header=True)

Error

FileNotFoundError: [Errno 2] No such file or directory: 'PB orders Dec.xlsb'

Why I get this error? print(filename) is printing all the XLSB files that name starts with PB orders

Greencolor
  • 501
  • 1
  • 5
  • 16
  • 2
    That means the file path is wrong. `filename` is just the file name, not the full path. You need to combine it with the root path to get the actual full path – Panagiotis Kanavos Feb 04 '22 at 09:38

2 Answers2

3

filename is just the file's name, not the full path. You need to combine it with path to get the full path to the file. You can do that in a safe manner with os.path.join :

import os
...
for filename in os.listdir(path):
    if filename.startswith("PB orders"):
        full_path = os.path.join(path, filename)
        print(full_path )                         
        month = pd.read_excel(full_path , sheet_name="Raw data ", engine="pyxlsb")

Searching with a pattern

An alternative is to use glob to search for files that match a pattern. You still need to generate the full path:

import glob

...

for filename in glob.glob("PB orders*.xlsb", root_dir=path):
    full_path = os.path.join(path, filename)
    print(full_path )                         
    month = pd.read_excel(full_path , sheet_name="Raw data ", engine="pyxlsb")

Avoiding temp files

You still need to check the file name to avoid the temporary files generated when someone opens an Excel file (the files that start with ~) :

for filename in glob.glob("PB orders*.xlsb", root_dir=path):
    if not os.path.basename(filename).startswith("~"):
        full_path = os.path.join(path, filename)
            print(full_path )                         
            month = pd.read_excel(full_path , sheet_name="Raw data ", engine="pyxlsb")
Panagiotis Kanavos
  • 120,703
  • 13
  • 188
  • 236
  • Thanks it works perfectly, but how should I change the last part where it converts to CSV? – Greencolor Feb 04 '22 at 09:54
  • basically i want to keep the original name ( PB orders December) but the extension should be csv – Greencolor Feb 04 '22 at 09:55
  • @Greencolor there are several answers for this, eg [in this question](https://stackoverflow.com/questions/3548673/how-can-i-replace-or-strip-an-extension-from-a-filename-in-python). Most of the answers work if the filename contains only one dot. pathlib's `with_suffix` can be used to change the extension no matter what the rest of the filename contains – Panagiotis Kanavos Feb 04 '22 at 10:04
  • is not it possible to do with `df.to_csv`? – Greencolor Feb 04 '22 at 10:17
1

When you do month = pd.read_excel(filename, sheet_name="Raw data ", engine="pyxlsb") try to replace it with this:

month = pd.read_excel(path + filename, sheet_name="Raw data ", engine="pyxlsb")

This will prepend the path to your filenames in the given directory.

neisor
  • 384
  • 4
  • 15