1

I have a folder of numerous .xslx files on my desktop and I am trying to iterate through them one by one to collect the respective sheet names automatically from each workbook.

import openpyxl
import glob


# specifying the path to csv files

path = "C:/Users/X/Desktop/Test"
  
# csv files in the path

files = glob.glob(path + "/*.xlsx")

sheet_names = []

for x in files:
    openpyxl.load_workbook(files)
    sheet_names.append(files.sheetnames)

I am getting an error code:

TypeError: expected str, bytes or os.PathLike object, not list

Is there any way to do this iteratively versus one by one if I have all of the workbook names in a list?

Thank you.

I am looking for the sheet names in each respective Excel workbook file.

Jessica
  • 23
  • 2

2 Answers2

2

Try:

sheet_names = []

for x in files:
    sheet_names.append(openpyxl.load_workbook(x).sheetnames)

Or, using a list comprehension:

sheet_names = [openpyxl.load_workbook(x).sheetnames for x in files]

Update:

If you encounter an error message like:

BadZipFile: File is not a zip file

This probably means that the file you are trying to access is corrupted. I.e. openpyxl is simply unable to open the file, and the remedy to this situation, i.e. fixing the file, I mean, lies outside the code; you will have to take a closer look at that particular file.

However, you can use try ... except to avoid having such an event interrupt the rest of your procedure. E.g.:

sheet_names = []

for x in files:
    try:
        sheet_names.append(openpyxl.load_workbook(x).sheetnames)
    except Exception as e:
            print(f'Unable to process {x}. Error raised', end=' ')
            print(e)

E.g. whenever the code encounters a BadZipFile, now you'll see a print: Unable to process {path_to/some_file.xlsx}. Error raised File is not a zip file.


Explanation error

The error occurs because you are referencing files (i.e. a list), rather than the items inside the list (i.e. x) in this line:

    openpyxl.load_workbook(files) # we need `x` here instead of `files`

Next, in order to actually use the result from the function call, you should have assigned it to a variable, e.g. wb = openpyxl.load_workbook(x) and then pass it to the list.append function. I.e.:

    sheet_names.append(files.sheetnames) # we would need `wb.sheetnames` here

However, as shown above, you can just put all of this is a one-liner (so, skipping the intermediate assignment to wb).

ouroboros1
  • 9,113
  • 3
  • 7
  • 26
  • Thank you for the assistance. I am getting an error where my files do not have the full path appended. I would need to tack on "C:/Users/X/Desktop/" to every string element in the list for this to function. Thank you a million. – Jessica Oct 30 '22 at 16:20
  • Do you mean that your list `files` contains entries without the preceding `path` (i.e. `"C:/Users/X/Desktop/Test"`)? That would seem very curious to me giving your code. – ouroboros1 Oct 30 '22 at 16:24
  • I had the names of the Excel files in my list *files*, but now I have appended the full path to the beginning of each element x in *files*, that is all cleaned up. Now, the issue I am having is: "error code: BadZipFile: File is not a zip file" for the code provided. Thanks for the help. – Jessica Oct 30 '22 at 16:29
  • Updated the answer to handle the error in question (and potential others). – ouroboros1 Oct 30 '22 at 16:48
0

See: Pandas: Looking up the list of sheets in an excel file:

import pandas as pd
import os

excelfolder = r'/home/bera/Downloads/'

for root, folder, files in os.walk(excelfolder):
    for file in files:
        if file.endswith('.xlsx'):
            print(file)
            fullname = os.path.join(root, file)
            print(pd.ExcelFile(fullname).sheet_names)
            
            #Excelfile123.xlsx
            #['Sheet1', 'Sheet2', 'Sheet3']
            
            #excelfile2.xlsx
            #['A', 'B']
            
BERA
  • 1,345
  • 3
  • 16
  • 36
  • This functioned perfectly for some time, I collected about 50 excel file names with their respective sheet names. I have over 200+ excel files in this folder, is there a limit to the number of excel files? I got this error message: "XLRDError: Can't find workbook in OLE2 compound document". – Jessica Oct 30 '22 at 16:33
  • I dont know if there's a limit. I just tried it on a folder with two files in it. Can it be this problem?: https://stackoverflow.com/questions/45725645/pandas-unable-to-open-this-excel-file – BERA Oct 30 '22 at 16:54
  • 1
    Hey, I located the two files that were giving issues. I will test the protections and another article I saw mentioned that there was a version issue too. Thanks for teaching me this code, I will be using it quite a bit. I was able to extract the rest of the files and create a dictionary with filename:sheetnames being the key:value pairs. – Jessica Oct 30 '22 at 17:10