0

What I am essentially looking for is the equivalent of the shell's `paste` command, but in Python 2. Suppose I have a CSV file:

a1,b1,c1,d1
a2,b2,c2,d2
a3,b3,c3,d3

And another such:

e1,f1
e2,f2
e3,f3

I want to pull them together into this:

a1,b1,c1,d1,e1,f1
a2,b2,c2,d2,e2,f2
a3,b3,c3,d3,e3,f3

This is the simplest case, where the number of files is known and there are only two. What if I wanted to do this with an arbitrary number of files, without knowing in advance how many I have?

I am thinking along the lines of using zip with a list of csv.reader iterables. There will be some unpacking involved, but it seems this much Python-fu is beyond me at the moment. Can someone suggest how to implement this idea, or something completely different?

I suspect this should be doable with a short snippet. Thanks.

asb

3 Answers

2
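# Open both files together so they are closed automatically (Python 2.7+)
with open("file1.csv", "r") as file1, open("file2.csv", "r") as file2:
    for line in file1:
        # Drop the newline and any stray commas, then append the matching line of file2.
        # print already adds a newline, so none is appended here.
        print(line.strip().strip(",") + "," + file2.readline().strip())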

This can be extended to as many files as you wish: just keep adding to the print statement. Instead of printing, you can also append to a list or do whatever else you need. You may have to handle files of different lengths; I did not, since you did not specify.
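On the point about file lengths: one way to cope with inputs of unequal length is to let the shorter side come out empty, which is what `paste` itself does. A minimal sketch, assuming the same comma-joined layout as above; `paste_lines` is a hypothetical helper name, and the import fallback is only there so the snippet runs on both Python 2 and Python 3.

```python
try:
    from itertools import zip_longest  # Python 3
except ImportError:
    from itertools import izip_longest as zip_longest  # Python 2


def paste_lines(lines1, lines2, delimiter=","):
    """Join two iterables of lines side by side, like `paste -d,`.

    When one input runs out first, its side is left empty.
    (Hypothetical helper, not part of the answer above.)
    """
    merged = []
    for left, right in zip_longest(lines1, lines2, fillvalue=""):
        # Strip only the line endings so the field contents are untouched
        merged.append(left.rstrip("\r\n") + delimiter + right.rstrip("\r\n"))
    return merged
```

It accepts any iterables of lines, so two open file objects can be passed in directly.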

1478963
    You will want to add a comma to this, or check for commas in the first file. You could use `line.strip().strip(',') + ',' + file2.readline()` – John Haberstroh May 15 '14 at 01:31
1

Assuming the number of files is unknown, and that all the files are properly formatted as CSV and have the same number of lines:

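files = ['csv1', 'csv2', 'csv3']
# A list (not a lazy map object) so the files can be iterated repeatedly
fs = [open(name) for name in files]

done = False

while not done:
    chunks = []
    for f in fs:
        try:
            # Take the next line from each file in turn
            line = next(f).strip()
            chunks.append(line)
        except StopIteration:
            # Stop as soon as any file runs out of lines
            done = True
            break
    if not done:
        print(','.join(chunks))

for f in fs:
    f.close()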

There seems to be no easy way of using context managers with a variable list of files, at least in Python 2 (see a comment in the accepted answer here), so the files have to be closed manually as above.
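For what it's worth, on Python 3.3+ `contextlib.ExitStack` can manage a variable number of open files, which also makes the zip-over-csv.reader idea from the question fairly short. A rough sketch, assuming Python 3 and inputs with the same number of lines; `paste_csv` and its parameters are hypothetical names used for illustration only.

```python
import csv
from contextlib import ExitStack


def paste_csv(in_paths, out_path):
    """Write the rows of all input CSV files side by side into out_path.

    Hypothetical helper. Like zip(), it stops at the shortest input.
    """
    with ExitStack() as stack:
        # ExitStack closes every file registered with it, even on error
        readers = [
            csv.reader(stack.enter_context(open(path, newline="")))
            for path in in_paths
        ]
        out_file = stack.enter_context(open(out_path, "w", newline=""))
        writer = csv.writer(out_file)
        for rows in zip(*readers):
            # rows holds one row (a list of fields) per input file
            writer.writerow([field for row in rows for field in row])
```

Called as `paste_csv(['file1.csv', 'file2.csv'], 'merged.csv')`, this should write the combined rows from the question's example, and the list of input paths can be of any length.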

YS-L
  • Yep, this is useful. +1. I am just gonna see if there are any other answers and if not, accept yours. Thanks a lot. – asb May 15 '14 at 01:58
0

You could try pandas.

In your case, the groups [a,b,c,d] and [e,f] can each be treated as a pandas DataFrame, and joining them is easy because pandas has a function called concat.

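import pandas as pd

# define group [a-d] as df1 (header=None because the files have no header row)
df1 = pd.read_csv('1.csv', header=None)
# define group [e-f] as df2
df2 = pd.read_csv('2.csv', header=None)

# concat takes a list of DataFrames; axis=1 places them side by side
pd.concat([df1, df2], axis=1)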
linpingta