0

I'm pretty new to Python so bear with me. I have a list of 30+ .xlsx files that I want to be read into a list of data frames. After that, I want them to be concatenated into a single data frame.

I have a function that creates a list of .xlsx files from a given directory:

from os import listdir

working_directory = 'N:/path to files'

def find_xlsx_filenames(path_to_dir, suffix=".xlsx"):
        """
        The function find_xlsx_filenames() returns a list of filenames as strings, that reside in the directory path_to_dir with the 
        given suffix (by default,".xlsx").
        """
        filenames = listdir(path_to_dir)
        return[filename for filename in filenames if filename.endswith(suffix)]
    
# Creates a list of all validation xlsx files in the folder:
filenames = find_xlsx_filenames(working_directory) 
for file_list in filenames:
    print(file_list)

And this is what the output looks like:

enter image description here

I want to take this list of files and apply a function that reads them into separate data frames, then combines them into a single data frame. Thank you!

miquiztli_
  • 131
  • 1
  • 8

1 Answers1

1

You can add all parameters you need to pd.read_excel:

import glob
import os

data = [pd.read_excel(filename)
            for filename in glob.glob(os.path.join(working_directory, "*.xlsx")]
df = pd.concat(data).reset_index(drop=True)

If your files are large, I have a multiprocessing version here

Edit: implement the code by defining a function?

In this version, I do not use glob anymore. You are responsible to set the file_path directory and the list of filenames returned by your function find_xlsx_filenames.

import os

def excel_to_df(file_path, filenames):
    data = [pd.read_excel(os.path.join(file_path, filename))
                for filename in filenames]
    return pd.concat(data).reset_index(drop=True)
Corralien
  • 109,409
  • 8
  • 28
  • 52