
I am trying to combine over 100,000 CSV files (all the same format) in a folder using the script below. Each CSV file is 3-6 KB on average. When I run this script, it opens and combines exactly 47 .csv files. When I re-run it, it combines the same 47 files, not all of them. I don't understand why it is doing that.

import os
import glob

os.chdir(r"D:\Users\Bop\csv")  # raw string: in a plain literal, "\U" is treated as an escape sequence

want_header = True
out_filename = "combined.files.csv"          

if os.path.exists(out_filename):
    os.remove(out_filename)

read_files = glob.glob("*.csv")

with open(out_filename, "w") as outfile:
    for filename in read_files:
        with open(filename) as infile:
            if want_header:
                outfile.write('{},Filename\n'.format(next(infile).strip()))
                want_header = False
            else:
                next(infile)
            for line in infile:
                outfile.write('{},{}\n'.format(line.strip(), filename))
Cameroon P
    Does `read_files` actually contain all 100000 files? (which is an awful lot of files in the same directory.) –  Apr 14 '16 at 05:37
  • You don't need to check for and remove the output file first, if you open it with [mode `"w"`](https://docs.python.org/3.5/library/functions.html#open). –  Apr 14 '16 at 05:39
  • Do some filenames start with a dot '.'? Are there some where the 'csv' extension is in a different case? I would just check (even in the Python CLI) what the result of the glob.glob() call is and go from there. – Matthieu Apr 14 '16 at 05:42
  • How can I check read_files actually contains all the files? Files are named like this: "file0000001, file000002, etc..." –  Cameroon P Apr 14 '16 at 10:02

2 Answers


First, check the length of read_files:

read_files = glob.glob("*.csv")
print(len(read_files))

Note that glob isn't necessarily recursive as described in this SO question.
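If the files live in subdirectories, a sketch of recursive matching (Python 3.5+; the directory layout is assumed):

```python
import glob

# "**" matches any number of nested directories when recursive=True;
# a plain "*.csv" only looks at the current directory.
read_files = glob.glob("**/*.csv", recursive=True)
print(len(read_files))
```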

Otherwise your code looks fine. You may want to consider using the `csv` library, but note that you may need to raise the field size limit for really large fields.
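A sketch of the same merge using the `csv` module (the `combine` helper and its arguments are illustrative, not from the original post; `field_size_limit` only matters if individual fields are very large):

```python
import csv
import glob
import sys

# Raise the per-field cap; min() avoids OverflowError on platforms
# where sys.maxsize does not fit in a C long.
csv.field_size_limit(min(sys.maxsize, 2**31 - 1))

def combine(pattern, out_filename):
    read_files = sorted(glob.glob(pattern))
    wrote_header = False
    with open(out_filename, "w", newline="") as outfile:
        writer = csv.writer(outfile)
        for filename in read_files:
            with open(filename, newline="") as infile:
                reader = csv.reader(infile)
                header = next(reader, None)
                if header is None:
                    continue  # skip empty files
                if not wrote_header:
                    writer.writerow(header + ["Filename"])
                    wrote_header = True
                for row in reader:
                    writer.writerow(row + [filename])
```

Using `csv.writer` also takes care of quoting fields that themselves contain commas, which the string-formatting approach above does not.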

dodell
  • As the combined file grows while merging all those 100,000 CSV files, it will reach hundreds of megabytes in size. Will that matter? –  Cameroon P Apr 14 '16 at 11:12

Are you sure all your filenames end with .csv? If every file in this directory contains what you need, then open all of them without filtering:

glob.glob('*') 
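For example, a sketch of checking for extension-case mismatches (on case-sensitive filesystems `"*.csv"` will not match a name like `FILE0000002.CSV`):

```python
import glob

all_files = glob.glob("*")
# Case-insensitive filter: anything caught here but missed by
# glob.glob("*.csv") points to an upper-case extension.
csv_files = [f for f in all_files if f.lower().endswith(".csv")]
print(len(all_files), len(csv_files))
```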
dey