0

I'm writing a script in Python. I have a bunch of csv files that each contain 1 column. These are what files might look like this:

FirstFile.csv

First
a
b
c

SecondFile.csv

Second
a2
b2
c2

I want some resultant file (let's call it result.csv) to be created that looks like:

First    Second
a        a2
b        b2
c        c2

How can I append all the csv's in a directory in python and append all the columns so I have a result.csv that looks like this (but, of course, with many more columns)?

yalpsid eman
  • 3,064
  • 6
  • 45
  • 71
  • 1
    Have you used pandas before? – mad_ Oct 23 '18 at 18:32
  • 2
    Have you tried anything at all? – roganjosh Oct 23 '18 at 18:33
  • 2
    If you do not have size problems (like million of rows per files or very big values per columns) ... simply load all and write them in a new file. Look at the module `csv` for ease of writing. Or use pandas. – Patrick Artner Oct 23 '18 at 18:33
  • 2
    @mad_ there is no reason to default back onto pandas for this when it can be done with the `csv` module and lists alone just as fast – roganjosh Oct 23 '18 at 18:34
  • @roganjosh are you monster? why u don't love pandas. they're cute! – deadvoid Oct 23 '18 at 18:41
  • @cryptonome absolutely I am. A monster with a pet hate of people recommending a big dependency for reading CSV files and then making no further use of it. – roganjosh Oct 23 '18 at 18:43
  • @roganjosh well i guess they got paid for recommending it – deadvoid Oct 23 '18 at 18:44
  • @roganjosh Yes it can be achieved by csv module as well but with pandas you may get code readability and elegance with less amount of code. Performance is not everything – mad_ Oct 23 '18 at 18:49
  • @mad_ readability for people that know pandas. If the question lacks that tag, I do really dislike suggestions to install another library. The highest voted answer currently is unusable if another reader can't install the library. It's fine to add as an extension to the answer, but not on its own. – roganjosh Oct 23 '18 at 18:53

4 Answers4

3

You can try using Pandas.

import pandas as pd
result = pd.concat([ pd.read_csv(f) for f in filenames ],axis=1)
result.to_csv("result.csv",index=False)
  1. Create a list of your file names (e.g. filenames)
  2. Import Pandas
  3. Use the concat function with list comprehension
1

You can use the csv module:

Create 10 files:

filenames = []
for i in range(10):
    filenames.append(f"file_{i}.txt")
    with open(filenames[-1],"w") as f:
        f.write(f"Header{i}\n")
        for row in range(5):
            f.write(f"text_{i}_{row}\n")

Read in all files:

data = []
for f in filenames:       # filled when creating files, you can use os.walk to fill yours
    with open(f) as r:
        data.append([x.strip() for x in r])

# data is a list of columns, we need a list of list of columns, so we transpose the data:
transpose = zip(*data)

# write the joined file
import csv
with open("joined.txt","w", newline="") as j:
    w = csv.writer(j)
    w.writerows(transpose)

Check if it is ok:

with open("joined.txt") as j:
    print(j.read())

Output:

Header0,Header1,Header2,Header3,Header4,Header5,Header6,Header7,Header8,Header9
text_0_0,text_1_0,text_2_0,text_3_0,text_4_0,text_5_0,text_6_0,text_7_0,text_8_0,text_9_0
text_0_1,text_1_1,text_2_1,text_3_1,text_4_1,text_5_1,text_6_1,text_7_1,text_8_1,text_9_1
text_0_2,text_1_2,text_2_2,text_3_2,text_4_2,text_5_2,text_6_2,text_7_2,text_8_2,text_9_2
text_0_3,text_1_3,text_2_3,text_3_3,text_4_3,text_5_3,text_6_3,text_7_3,text_8_3,text_9_3
text_0_4,text_1_4,text_2_4,text_3_4,text_4_4,text_5_4,text_6_4,text_7_4,text_8_4,text_9_4

data looks like this:

[['Header0', 'text_0_0', 'text_0_1', 'text_0_2', 'text_0_3', 'text_0_4'], # one files data
 ['Header1', 'text_1_0', 'text_1_1', 'text_1_2', 'text_1_3', 'text_1_4'], 
 ['Header2', 'text_2_0', 'text_2_1', 'text_2_2', 'text_2_3', 'text_2_4'], 
 ['Header3', 'text_3_0', 'text_3_1', 'text_3_2', 'text_3_3', 'text_3_4'], 
 ['Header4', 'text_4_0', 'text_4_1', 'text_4_2', 'text_4_3', 'text_4_4'], 
 ['Header5', 'text_5_0', 'text_5_1', 'text_5_2', 'text_5_3', 'text_5_4'], 
 ['Header6', 'text_6_0', 'text_6_1', 'text_6_2', 'text_6_3', 'text_6_4'], 
 ['Header7', 'text_7_0', 'text_7_1', 'text_7_2', 'text_7_3', 'text_7_4'], 
 ['Header8', 'text_8_0', 'text_8_1', 'text_8_2', 'text_8_3', 'text_8_4'], 
 ['Header9', 'text_9_0', 'text_9_1', 'text_9_2', 'text_9_3', 'text_9_4']]

Transposed it looks like:

[('Header0', 'Header1', 'Header2', 'Header3', 'Header4', 'Header5', 'Header6', 'Header7', 'Header8', 'Header9'), 
 ('text_0_0', 'text_1_0', 'text_2_0', 'text_3_0', 'text_4_0', 'text_5_0', 'text_6_0', 'text_7_0', 'text_8_0', 'text_9_0'), 
 ('text_0_1', 'text_1_1', 'text_2_1', 'text_3_1', 'text_4_1', 'text_5_1', 'text_6_1', 'text_7_1', 'text_8_1', 'text_9_1'), 
 ('text_0_2', 'text_1_2', 'text_2_2', 'text_3_2', 'text_4_2', 'text_5_2', 'text_6_2', 'text_7_2', 'text_8_2', 'text_9_2'), 
 ('text_0_3', 'text_1_3', 'text_2_3', 'text_3_3', 'text_4_3', 'text_5_3', 'text_6_3', 'text_7_3', 'text_8_3', 'text_9_3'), 
 ('text_0_4', 'text_1_4', 'text_2_4', 'text_3_4', 'text_4_4', 'text_5_4', 'text_6_4', 'text_7_4', 'text_8_4', 'text_9_4')]
Patrick Artner
  • 50,409
  • 9
  • 43
  • 69
  • I'm doing this with csv's, and the transposition (transpose = zip(*data)) doesn't seem to work. – yalpsid eman Oct 23 '18 at 22:23
  • @yalpsideman It works in the code I posted. zip() produces a generator that can only be consumed once - if you print it first its empty afterwards, when you want to write the file. Use `transpose = list( zip(*data))` to create a list from the transposed values so you can use them multiple timesfor printing and file writing – Patrick Artner Oct 23 '18 at 22:26
0

I'm sure there are more pythonic ways, but this will work (so long as all files have an identical number of lines).

input_files = ['FirstFile.csv', 'SecondFile.csv']
csv_separator = '\t'

data = []

for file in input_files:
    partial_data = []
    with open(file, 'r') as f:
        for line in f:
            partial_data.append(line.strip('\n'))
        data.append(partial_data)

with open('output.csv','w') as output:
    for item in range(len(data[0])):
        line = []
        for part in range(len(data)):
            line.append(data[part][item])
        output.write(csv_separator.join(line)+'\n')
0

If you're looking for a pure python solution it's probably best to csv.DictReader and csv.DictWriter so you have more control over how the data is formatted. Also, everything is 'generated' on the fly so it will be more memory efficient with very large files.

import csv

with open('csv1.csv') as csv1, open('csv2.csv') as csv2:
    r1 = csv.DictReader(csv1)
    r2 = csv.DictReader(csv2)
    with open('csv3.csv', 'w') as csv3:
        writer = csv.DictWriter(csv3, 
            fieldnames=["First", "Second"],
            lineterminator='\n'
        )
        writer.writeheader()
        writer.writerows({**x, **y} for x, y in zip(r1, r2))
nicholishen
  • 2,602
  • 2
  • 9
  • 13