1

How to merge all csv files in a specific folder using os.

So the code bellow does the work but it concat all the files in the same directory where the script lives.

How do utilized it in different folders?

My code:

import os
import pandas as pd
import numpy as np



def get_df():
    df=pd.DataFrame()
    for file in os.listdir():
        if file.endswith('.csv'):
            aux=pd.read_csv(file, error_bad_lines=False)
            df=df.append(aux)
    return df


df=get_df()


df.to_csv(f"file_name.csv")

I have tried adding folder name, but doesn't find the files.

for file in os.listdir('My_folder_name\'):

Gonçalo Peres
  • 11,752
  • 3
  • 54
  • 83
Peter
  • 544
  • 5
  • 20
  • Does it have to be in python? Or would bash work: `cat *.csv > all_files.csv` – JD Frias Jul 09 '20 at 23:30
  • @JD Fries, needs to be python because all the code base is in a jupyter notebook, merging the files is the icing on the cake – Peter Jul 09 '20 at 23:32
  • Your script is probably not running in the directory where your files are, try to print out `os.getcwd()` to see where it is running from, or use a full path in `listdir()` – JD Frias Jul 09 '20 at 23:36
  • @JDFrias, exactly. so the code works if the files are in the same directory. So the goal is access files in different directory/folder – Peter Jul 09 '20 at 23:54
  • I see you're using a back-slash in your example, probably a window path problem? Try `os.path.join()` or `pathlib` to build the path, or use forward-slash `/` not the best way though. – JD Frias Jul 10 '20 at 00:03
  • @JDFrias, How should I use ```os.path.join()``` not sure what line of the code I should replace. I tries replacing like ```file in os.path.join('folder_name'):``` it returns a blank file – Peter Jul 10 '20 at 00:35
  • is this a windows or linux machine? – marphlap Jul 10 '20 at 01:45
  • @JDFrias Windowns 10, I am coding python 3.7 at jupyter notebook – Peter Jul 10 '20 at 13:40

2 Answers2

1

I found at os module documentation a code that changes the directory os.chdir("C:\\Users\\Desktop\\my_folder_name\\my_new_folder_name").

https://www.tutorialsteacher.com/python/os-module

So I just add this line before the loop and now it works!!

def get_df():
    df=pd.DataFrame()
    os.chdir("C:\\Users\\Desktop\\my_folder_name\\my_new_folder_name")
    for file in os.listdir():
        if file.endswith('.csv'):
            aux=pd.read_csv(file, error_bad_lines=False)
            df=df.append(aux)
    return df


df=get_df()

df.to_csv(f"file_name.csv")
Peter
  • 544
  • 5
  • 20
1

There are various ways to solve it, depending on the type of merge one wants to do.

Considering your specific requirements (Python and os), and assuming one wants to concat the files, the following will do the work (including for files with the same header)

import os

os.system("awk '(NR == 1) || (FNR > 1)' file*.csv > merged.csv")

Where NR and FNR represent the number of the line being processed.

FNR is the current line within each file.

NR == 1 includes the first line of the first file (the header), while (FNR > 1) skips the first line of each subsequent file.

Gonçalo Peres
  • 11,752
  • 3
  • 54
  • 83