
I have a dataset of about 10 CSV files. I want to combine those files row-wise into a single CSV file.

What I tried:

import csv
fout = open("claaassA.csv","a")
# first file:
writer = csv.writer(fout)
for line in open("a01.ihr.60.ann.csv"):
     print line
     writer.writerow(line)
# now the rest:    
for num in range(2, 10):
    print num
    f = open("a0"+str(num)+".ihr.60.ann.csv")
#f.next() # skip the header
for line in f:
     print line
     writer.writerow(line)
#f.close() # not really needed
fout.close()
Tonechas
Dhara

2 Answers


The question definitely needs more details (ideally examples of the inputs and the expected output).

Given the little information provided, I will assume that all files are valid CSV and that they all have the same number of lines (rows). I'll also assume that memory is not a concern (i.e. the files are small enough to fit in memory together). Furthermore, I assume that lines end with a newline character (\n).
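The equal-line-count assumption is easy to check before merging. A minimal sketch, where count_lines and same_line_count are hypothetical helper names rather than anything from a library:

```python
def count_lines(path):
    """Return the number of lines in a text file."""
    with open(path, 'r') as fh:
        return sum(1 for _ in fh)


def same_line_count(paths):
    """Return True if every file in paths has the same number of lines."""
    # A set collapses equal counts, so more than one entry means a mismatch.
    counts = {count_lines(p) for p in paths}
    return len(counts) <= 1
```

For example, same_line_count(['file1.csv', 'file2.csv', 'file3.csv']) could be called first, and the merge skipped if it returns False.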

If all these assumptions are valid, then you can do something like this:

input_files = ['file1.csv', 'file2.csv', 'file3.csv']
output_file = 'output.csv'

output = None
for infile in input_files:
    with open(infile, 'r') as fh:
        lines = fh.readlines()
    if output is None:
        # first file: its lines are the starting point
        output = lines
    else:
        # later files: append line i to the end of output line i
        for i, l in enumerate(lines):
            output[i] = "{},{}".format(output[i].rstrip('\n'), l)

with open(output_file, 'w') as fh:
    for line in output:
        fh.write(line)

There are probably more efficient ways, but this is a quick and dirty way to achieve what I think you are asking for.
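As one lower-memory variant, the files can be read in lockstep instead of being loaded whole. This is only a sketch under the same assumptions as above. paste_files is a made-up name, and because zip stops at the shortest file, unequal line counts would be truncated silently.

```python
from contextlib import ExitStack


def paste_files(input_files, output_file):
    """Join line i of every input file with commas and write it as line i."""
    with ExitStack() as stack:
        # ExitStack closes every opened input file when the block exits.
        handles = [stack.enter_context(open(p, 'r')) for p in input_files]
        with open(output_file, 'w') as out:
            # zip yields one tuple of lines per row and stops at the
            # shortest file, so equal line counts are assumed.
            for lines in zip(*handles):
                out.write(','.join(l.rstrip('\n') for l in lines) + '\n')
```

Only one line per input file is held in memory at a time, which matters if the files are large.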


The solution above implicitly assumes this has to be done in Python. If Bash is an option, you could use the paste command instead. For example:

paste -d, file1.csv file2.csv file3.csv > output.csv
jorgeh

I don't fully understand why you use the csv library. It's enough to copy the lines from the given files into the output file (if they all have the same column names in the same order).

input_path_list = [
    "a01.ihr.60.ann.csv",
    "a02.ihr.60.ann.csv",
    "a03.ihr.60.ann.csv",
    "a04.ihr.60.ann.csv",
    "a05.ihr.60.ann.csv",
    "a06.ihr.60.ann.csv",
    "a07.ihr.60.ann.csv",
    "a08.ihr.60.ann.csv",
    "a09.ihr.60.ann.csv",
]
output_path = "claaassA.csv"

with open(output_path, "w") as fout:
    header_written = False

    for input_path in input_path_list:
        with open(input_path) as fin:
            # next(fin) works in Python 2.6+ and 3; fin.next() is Python 2 only
            header = next(fin)

            # it adds the header at the beginning and skips other headers
            if not header_written:
                fout.write(header)
                header_written = True

            # it adds all rows
            for line in fin:
                fout.write(line)
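If you do want to keep the csv module, note that writerow expects a sequence of fields, so passing it a raw line string writes every character as its own column. A sketch of the same idea with csv.reader and csv.writer follows. It assumes every file starts with a header row, concat_csv is a hypothetical name, and lineterminator='\n' is my choice, not something the question asks for. Unlike the plain line copy, it also copes with an input file that has no trailing newline.

```python
import csv


def concat_csv(input_paths, output_path):
    """Append the rows of every input file to output_path, keeping one header."""
    # newline='' lets the csv module control line endings itself.
    with open(output_path, 'w', newline='') as fout:
        writer = csv.writer(fout, lineterminator='\n')
        header_written = False
        for path in input_paths:
            with open(path, newline='') as fin:
                reader = csv.reader(fin)
                header = next(reader, None)  # None for an empty file
                if header is None:
                    continue
                # write the first header and skip the others
                if not header_written:
                    writer.writerow(header)
                    header_written = True
                # each row is a list of fields, which is what writerows expects
                writer.writerows(reader)
```

With the list from above, this would be called as concat_csv(input_path_list, output_path).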
Fomalhaut