
I am trying to combine multiple csv files into one, and have tried a number of methods but I am struggling.

I import the data from multiple csv files, and when I compile them together into one csv file, the first few rows get filled out nicely, but then blank rows of varying size start appearing randomly between the data rows. The combined csv file also never finishes filling out; it just keeps having information added to it, which does not make sense to me because I am trying to compile a finite amount of data.

I have already tried writing close statements for the file, and I still get the same result: my designated combined csv file never stops getting data, and the data is randomly spaced throughout the file. I just want a normally compiled csv.

Is there an error in my code? Is there any explanation as to why my csv file is behaving this way?

import csv
import glob

csv_file_list = glob.glob(Dir + '/*.csv') #returns the file list
print (csv_file_list)
with open(Avg_Dir + '.csv','w') as f:
    wf = csv.writer(f, delimiter = ',')
    print (f)
    for files in csv_file_list:
        rd = csv.reader(open(files,'r'),delimiter = ',')
        for row in rd:
            print (row)
            wf.writerow(row)
Nqsir
  • Are you sure the csv files you're trying to combine don't have empty space at the end of them? Also if you have a lot of files with a lot of lines maybe it's just taking a long time to run, which is why it seems like it never stops getting data. – iamchoosinganame May 24 '19 at 14:45
  • @iamchoosinganame the files don't have empty space at the end, and the combined file should be about 46,000 KB, however when I run the program it sometimes even hits 5,000,000 KB before I terminate it because I know something is wrong – Juliette van Heerden May 24 '19 at 15:06

2 Answers


Your code works for me.

Alternatively, you can merge files as follows:

csv_file_list = glob.glob(Dir + '/*.csv')
with open(Avg_Dir + '.csv','w') as wf:
    for file in csv_file_list:
        with open(file) as rf:
            for line in rf:
                if line.strip(): # if line is not empty
                    if not line.endswith("\n"):
                        line+="\n"
                    wf.write(line)

Or, if the files are not too large, you can read each file at once. But in this case all empty lines and headers will be copied:

csv_file_list = glob.glob(Dir + '/*.csv')
with open(Avg_Dir + '.csv','w') as wf:
    for file in csv_file_list:
        with open(file) as rf:
            wf.write(rf.read().strip()+"\n")
Aray Karjauv
  • This resolves the issue I had with the spacing, thank you! Now the data is all nicely formatted together, but one issue still remains: it never seems to finish filling the combined csv file. It is only supposed to be about 46,000 kilobytes, but it never stops growing as a file, so I'm confused. Do you happen to have any idea why this is? – Juliette van Heerden May 24 '19 at 15:18
  • How do you limit the combined file? Without your data it is quite complicated to answer your question – Aray Karjauv May 24 '19 at 15:23
  • This is the only block of code, there are no other processes you can't see other than what I link to Dir and Avg_Dir. Dir is linked to a folder containing multiple csv files (that's it) and Avg_Dir links to a folder with a single csv file that is being used as the csv file with all of the combined data @Parfait – Juliette van Heerden May 24 '19 at 15:40
  • ^I also responded to your question I think @Aray – Juliette van Heerden May 24 '19 at 15:40
  • Next question: *how* are you running the Python script? Through an IDE, at command line, web notebook? Try running at command line to avoid environment issues: `python myscript.py` or `"C:\path\to\bin\python.exe" "C:\path\to\myscript.py"`. – Parfait May 24 '19 at 15:45
  • The only idea I have is that you did not reach the end of your data. Just try counting the lines of all your files manually and compare that against a counter in the code – Aray Karjauv May 24 '19 at 15:46
  • @Parfait I am running it through Pycharm – Juliette van Heerden May 24 '19 at 15:52
  • You also can read each file at once. I've updated my answer. If it helps don't forget to accept the answer. Cheers! – Aray Karjauv May 24 '19 at 15:59
  • If you're combining csv files with the same header, use the first approach to enumerate over `csv_file_list` with `for i, file in enumerate(csv_file_list): ...` and skip the header after the first csv file in the list by putting `if i != 0: next(rf)` before the second loop (see the sketch below). – hlzl Apr 14 '21 at 10:01
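
A minimal sketch of the header-skipping variant hlzl describes above, built on the first approach in this answer. It assumes `Dir` and `Avg_Dir` are defined as in the question and that every input csv shares the same header row:

import glob

csv_file_list = glob.glob(Dir + '/*.csv')
with open(Avg_Dir + '.csv', 'w') as wf:
    for i, file in enumerate(csv_file_list):
        with open(file) as rf:
            if i != 0:
                next(rf)  # skip the header of every file after the first
            for line in rf:
                if line.strip():  # if line is not empty
                    if not line.endswith("\n"):
                        line += "\n"
                    wf.write(line)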

Consider several adjustments:

  1. Use a context manager, `with`, for both the read and the write process. This avoids the need to `close()` file objects, which you do not do on the read objects.
  2. For the skipped-lines issue: use either the `newline=''` argument in `open()` or the `lineterminator="\n"` argument in `csv.writer()`. See SO answers for the former and the latter.
  3. Use `os.path.join()` to properly concatenate folder and file paths. This method is OS-agnostic, so it accounts for Windows or Unix machines using backslashes or forward slashes.

Adjusted script:

import os
import csv, glob

Dir = r"C:\Path\To\Source"
Avg_Dir = r"C:\Path\To\Destination\Output"

csv_file_list = glob.glob(os.path.join(Dir, '*.csv')) # returns the file list
print (csv_file_list)

with open(os.path.join(Avg_Dir, 'Output.csv'), 'w', newline='') as f:
    wf = csv.writer(f, lineterminator='\n')

    for files in csv_file_list:
        with open(files, 'r') as r: 
            next(r)                   # SKIP HEADERS
            rr = csv.reader(r)
            for row in rr:
                wf.writerow(row)
Parfait