2

I have code that merges the files in one folder. The task has since expanded to merging files from two folders, so I edited the code to first build a combined list of the files in both folders. That part works, but the for loop that actually reads the files fails. I think this could be because of the current working directory, but I do not know how to change that. Here is my code so far:

files1 = os.listdir(folder1) #i just replaced the path with folder1
files2 = os.listdir(folder2) #i just replaced the path with folder2
files = files1 + files2

df = pd.DataFrame()         #creating an empty dataframe
for f in files:             #for loop in extracting and merging the files
    data = pd.read_excel(f)  
    df = df.append(data, sort=False).reset_index(drop=True) 
ambmil

3 Answers

2

I suggest you change your code design a little. Make a list of all the folders, iterate over them, and load each folder's files inside the loop. This approach works for as many folders as you want:

import os
from pathlib import Path

import pandas as pd

def merge_files_in_folder(folder):
    # read every file in the folder, then stack them into one DataFrame
    frames = [pd.read_excel(folder / excel_file) for excel_file in os.listdir(folder)]
    # DataFrame.append was removed in pandas 2.0, so pd.concat is used instead
    return pd.concat(frames, sort=False, ignore_index=True) if frames else pd.DataFrame()


folders = ['folder1', 'folder2']

# merge each folder, then combine the per-folder results
df = pd.concat(
    [merge_files_in_folder(Path(folder)) for folder in folders],
    sort=False,
    ignore_index=True,
)
yondu_udanta
2

We can fix this code by adding folder names to file names:

import os
import pandas as pd

folder1 = 'folder1/'
folder2 = 'folder2/'

files1 = ["{}{}".format(folder1,file) for file in os.listdir(folder1)]
files2 = ["{}{}".format(folder2,file) for file in os.listdir(folder2)]
files = files1 + files2

print(files)

frames = []                 #collecting one dataframe per file
for f in files:             #for loop in extracting the files
    frames.append(pd.read_excel(f))
df = pd.concat(frames, sort=False, ignore_index=True)  #merging; DataFrame.append was removed in pandas 2.0

You can also move the path handling into a helper function, as Alex suggested, to make the code cleaner.

EDIT: Code with a helper function:

import os
import pandas as pd

def listdir_fullpath(d):
    return [os.path.join(d, f) for f in os.listdir(d)]

folder1 = 'folder1/'
folder2 = 'folder2/'

files1 = listdir_fullpath(folder1)
files2 = listdir_fullpath(folder2)
files = files1 + files2

print(files)

frames = []                 #collecting one dataframe per file
for f in files:             #for loop in extracting the files
    frames.append(pd.read_excel(f))
df = pd.concat(frames, sort=False, ignore_index=True)  #merging; DataFrame.append was removed in pandas 2.0
Stepan Novikov
1

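You're right: "os.listdir" returns only the names of the entries in the given folder, not their full paths (Python 3 Documentation). A bare file name is then resolved against the current working directory, which is why the read fails.

Since os.listdir returns a list, you could use a helper function (like this) to prepend the path to every list item.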
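
To illustrate the difference, here is a minimal sketch. The helper name listdir_fullpath and the file name report.xlsx are only placeholders for this example, and a temporary folder stands in for folder1 so the snippet can run anywhere:

```python
import os
import tempfile


def listdir_fullpath(folder):
    # os.listdir yields bare names, so join each one with its folder
    return [os.path.join(folder, name) for name in sorted(os.listdir(folder))]


# Throwaway folder with one placeholder file, just to show the difference
with tempfile.TemporaryDirectory() as folder:
    open(os.path.join(folder, "report.xlsx"), "w").close()

    bare_names = os.listdir(folder)
    full_paths = listdir_fullpath(folder)

    print(bare_names)                     # ['report.xlsx']
    print(os.path.exists(bare_names[0]))  # False, unless the working directory also has a report.xlsx
    print(os.path.exists(full_paths[0]))  # True
```

Passing the joined paths to pd.read_excel avoids having to change the working directory at all.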

Alex