I need to import all Excel files in my directory, including subdirectories, but I keep getting an error message because the same directory contains other types of files, such as PDF and Word documents. How do I change my current code below so that it ignores all other file types and only imports Excel files with the .xlsx extension?

for subdir, dirs, files in os.walk(data_path):
    for file in files:
        df_file = pd.read_excel(subdir + '/' + file)
        df_file.set_index(df_file.columns[0], inplace=True)

        df_total = pd.concat([df_file, df_total], ignore_index=True)

I tried something like if file.endswith("xlsx"): but it didn't work.

Thank you in advance.

  • What do you mean by "didn't work"? What was the error when using `if file.endswith(".xlsx"):`? – susenj Oct 28 '20 at 05:18
  • @susenj I got this message: Unsupported format, or corrupt file: Expected BOF record; found b'\rAdminis' –  Oct 28 '20 at 05:45
  • This problem has already been answered; there seems to be nothing wrong with your code. Please check: https://stackoverflow.com/questions/16504975/error-unsupported-format-or-corrupt-file-expected-bof-record – susenj Oct 28 '20 at 06:26

2 Answers

The problem you are struggling with already has an answer here.

You don't even need an additional module. All you need is the path of the directory that contains the Excel files.
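The linked answer is not shown here, so the following is only a minimal sketch of one way to do this with the standard library, not necessarily what that answer proposes. It assumes `data_path` is the top-level directory from the question; the helper names `find_xlsx_files` and `load_all` are made up for illustration.

```python
from pathlib import Path


def find_xlsx_files(data_path):
    """Return every *.xlsx file under data_path, including subdirectories."""
    # rglob walks the tree recursively; the pattern leaves out pdf, docx, etc.
    return sorted(p for p in Path(data_path).rglob("*.xlsx") if p.is_file())


def load_all(data_path):
    """Concatenate all workbooks found by find_xlsx_files into one DataFrame."""
    import pandas as pd  # imported here so find_xlsx_files works without pandas

    frames = [pd.read_excel(path) for path in find_xlsx_files(data_path)]
    # pd.concat raises on an empty list, so return an empty frame in that case.
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

Calling `load_all(data_path)` would then replace the nested loop. Collecting the frames in a list and concatenating once also avoids re-copying `df_total` on every iteration.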

Ayush

You can use an if statement to check whether the file name ends with .xlsx (I know you said you tried this, but this version should work):

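import os
import pandas as pd

df_total = pd.DataFrame()  # assumed to start empty; the question does not show its definition
for subdir, dirs, files in os.walk(data_path):
    for file in files:
        # Only read files with the .xlsx extension; skip pdf, docx, etc.
        if file.endswith(".xlsx"):
            df_file = pd.read_excel(os.path.join(subdir, file))
            df_file.set_index(df_file.columns[0], inplace=True)
            df_total = pd.concat([df_file, df_total], ignore_index=True)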
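If the extension check alone still produces the "Expected BOF record" error from the comments, one possible cause (a guess, not confirmed by the asker) is that some file ending in .xlsx is not a real workbook. While a workbook is open, Excel creates a temporary owner file named `~$<workbook>.xlsx` next to it, and the `b'\rAdminis'` bytes in the message look like the start of a user name, which would be consistent with such a file. The sketch below skips those files and also checks that each candidate is a ZIP archive, which every valid .xlsx file is. The function name `iter_xlsx_paths` is hypothetical.

```python
import os
import zipfile


def iter_xlsx_paths(data_path):
    """Yield paths under data_path that look like real .xlsx workbooks."""
    for subdir, dirs, files in os.walk(data_path):
        for file in files:
            # Skip other file types such as pdf and docx.
            if not file.endswith(".xlsx"):
                continue
            # Skip Excel's "~$" owner files, which share the extension.
            if file.startswith("~$"):
                continue
            path = os.path.join(subdir, file)
            # A valid .xlsx file is a ZIP archive; anything else is skipped.
            if zipfile.is_zipfile(path):
                yield path
```

Each yielded path can be passed to `pd.read_excel` exactly as in the loop above. A file that was renamed to .xlsx but is really an old .xls or CSV file is skipped silently here; logging the skipped names would make such cases easier to spot.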
abhigyanj