
I am trying to concatenate the CSV files in a folder on my desktop:

C:\\Users\\Vincentc\\Desktop\\W1 

and output the final CSV to:

C:\\Users\\Vincentc\\Desktop\\W2\\conca.csv

The CSV files don't have a header. However, nothing comes out when I run my script, and there is no error message. I'm a beginner; can someone have a look at my code below? Thanks a lot!

```python
import os
import glob
import pandas

def concatenate(indir="C:\\Users\\Vincentc\\Desktop\\W1",outfile="C:\\Users\\Vincentc\\Desktop\\W2\\conca.csv"):
    os.chdir(indir)
    fileList=glob.glob("indir")
    dfList=[]
    for filename in fileList:
        print(filename)
        df=pandas.read_csv(filename,header=None)
        dfList.append(df)
    concaDf=pandas.concat(dfList,axis=0)
    concaDf.to_csv(outfile,index=None)
```
Martin Evans
VinC
    you are not passing the `indir` variable but the literal string `"indir"`; it should be `fileList=glob.glob(indir)` – PRMoureu May 16 '18 at 06:07
  • Thank you! I changed it to `fileList=glob.glob(indir)`, but when I do `print(filename)`, still no filename is printed. – VinC May 16 '18 at 06:21
  • Try adding `\\*` to the end of `indir` – Martin Evans May 16 '18 at 11:42

2 Answers


Loading CSV files into pandas only to concatenate them is inefficient. See this answer for a more direct alternative.
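The linked answer is not reproduced here. As a minimal sketch of the same idea, assuming (as in the question) headerless files that all share the same column layout, the files can simply be appended byte for byte with the standard library. `concat_csv_files` is a hypothetical helper name, and the demo folder is a made-up stand-in for `W1`/`W2`:

```python
import glob
import os
import shutil
import tempfile


def concat_csv_files(indir, outfile):
    """Append every *.csv file in indir to outfile, byte for byte (no pandas).

    Assumption: the files have no header row and share the same columns.
    Returns the list of input paths, in the order they were appended.
    """
    paths = sorted(glob.glob(os.path.join(indir, "*.csv")))
    with open(outfile, "wb") as out:
        for path in paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
                # If the file does not end with a newline, add one so the next
                # file's first row is not glued onto this file's last row.
                if src.tell() > 0:
                    src.seek(-1, os.SEEK_END)
                    if src.read(1) != b"\n":
                        out.write(b"\n")
    return paths


# Demo in a throwaway folder (stands in for W1 and W2 from the question).
with tempfile.TemporaryDirectory() as tmp:
    indir = os.path.join(tmp, "W1")
    os.mkdir(indir)
    with open(os.path.join(indir, "a.csv"), "w", newline="") as f:
        f.write("1,2\n3,4\n")
    with open(os.path.join(indir, "b.csv"), "w", newline="") as f:
        f.write("5,6")  # no trailing newline, on purpose
    outfile = os.path.join(tmp, "conca.csv")  # kept outside indir so it is never globbed
    concat_csv_files(indir, outfile)
    with open(outfile) as f:
        result = f.read()

print(result)
```

Keeping the output file in a different folder from the inputs (as the question already does with `W2`) avoids accidentally picking up the output on a second run.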

If you insist on using pandas, the third-party library dask provides an intuitive interface:

```python
import dask.dataframe as dd

# read all CSV files in the current directory lazily; the question's files have no header row
df = dd.read_csv('*.csv', header=None)
df.compute().to_csv('out.csv', index=False, header=False)  # convert to pandas and save as CSV
```
jpp
  • I'm getting this error: `ValueError: Length mismatch: Expected axis has 4 elements, new values have 3 elements` – 大朱雀 Feb 23 '21 at 13:27

`glob.glob()` needs a wildcard such as `*` (or `*.csv`) to match the files in the folder you have given. Without one, it only returns the folder path itself (if it exists), not the files inside it. Try the following:

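```python
import os
import glob
import pandas

def concatenate(indir=r"C:\Users\Vincentc\Desktop\W1", outfile=r"C:\Users\Vincentc\Desktop\W2\conca.csv"):
    # Add the wildcard only when globbing. indir stays a real folder path;
    # os.chdir() would fail on a path ending in \*, and it is not needed here.
    fileList = glob.glob(os.path.join(indir, "*.csv"))
    dfList = []

    for filename in fileList:
        print(filename)
        df = pandas.read_csv(filename, header=None)  # the files have no header row
        dfList.append(df)

    concaDf = pandas.concat(dfList, axis=0)
    # header=False so the numeric column labels 0, 1, 2, ... are not written as a first row
    concaDf.to_csv(outfile, index=False, header=False)

concatenate()  # defining the function does not run it; it has to be called
```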
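A quick way to see the difference the wildcard makes, using a temporary folder as a stand-in for `W1` (the folder and file names are made up for the demo):

```python
import glob
import os
import tempfile

# A throwaway folder with two CSV files stands in for W1 from the question.
with tempfile.TemporaryDirectory() as folder:
    for name in ("a.csv", "b.csv"):
        with open(os.path.join(folder, name), "w") as f:
            f.write("1,2\n")

    # No wildcard: glob just echoes the folder path back (because it exists).
    no_wildcard = glob.glob(folder)
    # With a wildcard: glob returns the matching files inside the folder.
    with_wildcard = sorted(glob.glob(os.path.join(folder, "*.csv")))

print(no_wildcard == [folder])                       # True
print([os.path.basename(p) for p in with_wildcard])  # ['a.csv', 'b.csv']
```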

You can also avoid doubling every backslash (`\\`) by either using forward slashes (`/`) or prefixing the string with `r`. The `r` prefix makes it a raw string, which disables backslash escape processing.
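To illustrate, a small sketch comparing three spellings of the question's input path. `PureWindowsPath` from the standard library is used only so the comparison runs on any operating system without touching the file system:

```python
from pathlib import PureWindowsPath

escaped = "C:\\Users\\Vincentc\\Desktop\\W1"  # every backslash doubled
raw = r"C:\Users\Vincentc\Desktop\W1"          # raw string: backslashes kept as-is
forward = "C:/Users/Vincentc/Desktop/W1"       # forward slashes

# Without r or doubling, "C:\Users..." is a SyntaxError in Python 3,
# because \U starts a unicode escape sequence.
print(escaped == raw)  # True: the two literals produce identical strings

# Windows paths accept "/" as a separator too.
print(PureWindowsPath(forward) == PureWindowsPath(raw))  # True
print(str(PureWindowsPath(forward)))  # C:\Users\Vincentc\Desktop\W1

# Caveat: a raw string cannot end in a single backslash, e.g. r"C:\W1\" does not parse.
```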

Martin Evans