
I am trying to combine multiple csv files into one, and have tried a number of methods but I am struggling.

I import the data from multiple csv files, and when I compile them together into one csv file, the first few rows get filled out nicely, but then blank rows of varying size start appearing randomly between the data rows. The combined csv file also never finishes filling out; it just keeps having information added to it, which does not make sense to me because I am trying to compile a finite amount of data.

I have already tried writing close statements for the file, and I still get the same result: my designated combined csv file never stops getting data, and the data is randomly spaced throughout the file. I just want a normally compiled csv.

Is there an error in my code? Is there any explanation as to why my csv file is behaving this way?

import csv
import glob

csv_file_list = glob.glob(Dir + '/*.csv') #returns the file list
print (csv_file_list)
with open(Avg_Dir + '.csv','w') as f:
    wf = csv.writer(f, delimiter = ',')
    print (f)
    for files in csv_file_list:
        rd = csv.reader(open(files,'r'),delimiter = ',')
        for row in rd:
            print (row)
            wf.writerow(row)
Nqsir
  • Are you sure the csv files you're trying to combine don't have empty space at the end of them? Also if you have a lot of files with a lot of lines maybe it's just taking a long time to run, which is why it seems like it never stops getting data. – iamchoosinganame May 24 '19 at 14:45
  • @iamchoosinganame the files don't have empty space at the end, and the combined file should be about 46,000 KB, however when I run the program it sometimes even hits 5,000,000 KB before I terminate it because I know something is wrong – Juliette van Heerden May 24 '19 at 15:06

2 Answers


Your code works for me.

Alternatively, you can merge files as follows:

csv_file_list = glob.glob(Dir + '/*.csv')
with open(Avg_Dir + '.csv','w') as wf:
    for file in csv_file_list:
        with open(file) as rf:
            for line in rf:
                if line.strip(): # if line is not empty
                    if not line.endswith("\n"):
                        line+="\n"
                    wf.write(line)

Or, if the files are not too large, you can read each file at once. But in this case all empty lines and headers will be copied:

csv_file_list = glob.glob(Dir + '/*.csv')
with open(Avg_Dir + '.csv','w') as wf:
    for file in csv_file_list:
        with open(file) as rf:
            wf.write(rf.read().strip()+"\n")
Aray Karjauv
  • This resolves the issue I had with the spacing, thank you! Now the data is all nicely formatted together, but one issue still remains: it never seems to finish filling the combined csv file. It is only supposed to be about 46,000 kilobytes, but it never stops growing as a file, so I'm confused. Do you happen to have any idea why this is? – Juliette van Heerden May 24 '19 at 15:18
  • How do you limit the combined file? Without your data it is quite complicated to answer your question – Aray Karjauv May 24 '19 at 15:23
  • This is the only block of code, there are no other processes you can't see other than what I link to Dir and Avg_Dir. Dir is linked to a folder containing multiple csv files (that's it) and Avg_Dir links to a folder with a single csv file that is being used as the csv file with all of the combined data @Parfait – Juliette van Heerden May 24 '19 at 15:40
  • ^I also responded to your question I think @Aray – Juliette van Heerden May 24 '19 at 15:40
  • Next question: *how* are you running the Python script? Through an IDE, at command line, web notebook? Try running at command line to avoid environment issues: `python myscript.py` or `"C:\path\to\bin\python.exe" "C:\path\to\myscript.py"`. – Parfait May 24 '19 at 15:45
  • The only idea I have is that you did not reach the end of your data. Just try counting the lines of all your files manually and compare that against a counter in the code – Aray Karjauv May 24 '19 at 15:46
  • @Parfait I am running it through Pycharm – Juliette van Heerden May 24 '19 at 15:52
  • You also can read each file at once. I've updated my answer. If it helps don't forget to accept the answer. Cheers! – Aray Karjauv May 24 '19 at 15:59
  • If you're combining csv files with the same header, use the first approach to enumerate over `csv_file_list` with `for i, file in enumerate(csv_file_list): ...` and skip the header after the first csv file in the list by putting `if i != 0: next(rf)` before the second loop (see the sketch below). – hlzl Apr 14 '21 at 10:01
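
A minimal sketch of the header-skipping variant hlzl describes above, built on the first approach in this answer. It assumes `Dir` and `Avg_Dir` are defined as in the question and that every input csv shares the same header row:

import glob

csv_file_list = glob.glob(Dir + '/*.csv')
with open(Avg_Dir + '.csv', 'w') as wf:
    for i, file in enumerate(csv_file_list):
        with open(file) as rf:
            if i != 0:
                next(rf)  # skip the header of every file after the first
            for line in rf:
                if line.strip():  # if line is not empty
                    if not line.endswith("\n"):
                        line += "\n"
                    wf.write(line)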

Consider several adjustments:

  1. Use a context manager, `with`, for both the read and the write process. This avoids the need to `close()` file objects, which you do not do on the read objects.
  2. For the skipped-lines issue: use either the `newline=''` argument in `open()` or the `lineterminator="\n"` argument in `csv.writer()`. See SO answers for the former and the latter.
  3. Use `os.path.join()` to properly concatenate folder and file paths. This method is OS-agnostic, so it accounts for Windows or Unix machines using backslashes or forward slashes.

Adjusted script:

import os
import csv, glob

Dir = r"C:\Path\To\Source"
Avg_Dir = r"C:\Path\To\Destination\Output"

csv_file_list = glob.glob(os.path.join(Dir, '*.csv')) # returns the file list
print (csv_file_list)

with open(os.path.join(Avg_Dir, 'Output.csv'), 'w', newline='') as f:
    wf = csv.writer(f, lineterminator='\n')

    for files in csv_file_list:
        with open(files, 'r') as r: 
            next(r)                   # SKIP HEADERS
            rr = csv.reader(r)
            for row in rr:
                wf.writerow(row)
Parfait