Combining files in python

Question

I was wondering if someone would be able to help me in combining two files.

The first file looks like

AAAA

BBBB

CCCC

DDDD

EEEE

And the second is like

aaaa

bbbb

cccc

dddd

eeee

and I'm looking for something that ends up as

AAAAaaaa

BBBBbbbb

CCCCcccc

DDDDdddd

EEEEeeee

So far I can only copy the first file to the other, but it always ends up deleting what was originally contained in the file.

Reading your last line, I'd suggest you open a third, combined file instead of trying to write the first file into the second. In any case, you should show us what you tried so we can help you fix it. – yuvi, Nov 02 '13 at 18:00
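The asker's code is not shown, so this is only a guess at the cause: opening a file with mode `'w'` truncates it immediately, which would explain the lost content. A minimal sketch of the third-file approach suggested in the comment follows. The helper name `merge_into_third` and the temporary file names are hypothetical, and `read().split()` is used purely for brevity (it loads each input fully and skips blank lines).

```python
import os
import tempfile

def merge_into_third(first_path, second_path, out_path):
    # Only out_path is opened for writing, so neither input is truncated.
    with open(first_path) as f1, open(second_path) as f2:
        pairs = list(zip(f1.read().split(), f2.read().split()))
    with open(out_path, "w") as out:
        for upper, lower in pairs:
            out.write(upper + lower + "\n")

# Hypothetical sample files in a temporary directory.
workdir = tempfile.mkdtemp()
first = os.path.join(workdir, "first.txt")
second = os.path.join(workdir, "second.txt")
merged = os.path.join(workdir, "merged.txt")

with open(first, "w") as f:
    f.write("AAAA\n\nBBBB\n")
with open(second, "w") as f:
    f.write("aaaa\n\nbbbb\n")

merge_into_third(first, second, merged)

with open(merged) as f:
    merged_text = f.read()
with open(second) as f:
    second_text = f.read()  # still intact after the merge
```

Both inputs are left unchanged because they are only ever opened for reading.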

score 2 · Answer 1 · edited May 23 '17 at 12:29

Here's an example that uses:

- `for line in f` and generators, for efficient reading of the files
- `str.strip()` to get rid of whitespace
- the `zip` builtin to merge the two lists of lines
- `str.join()` to join the final list of output lines with newlines

combine.py

def read_lines(f):
    for line in f:
        if line.strip():
            yield line.strip()


def combine(lines):
    for (first, second) in lines:
        yield "%s%s\n" % (first, second)

lines1 = read_lines(open('first.txt'))
lines2 = read_lines(open('second.txt'))

lines = zip(lines1, lines2)

merged = '\n'.join(combine(lines))

with open('merged.txt', 'w') as outfile:
    outfile.write(merged)

This code doesn't assume that every line that matters is on an even line number. Instead, it checks whether each line contains anything other than whitespace: if it does, the line is processed; otherwise it is skipped.
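To illustrate the whitespace check, here is a small sketch that runs the same filtering idea on in-memory data. The `io.StringIO` objects and their contents are stand-ins for `first.txt` and `second.txt`, not the asker's real files, and blank lines are deliberately placed at different positions in each input.

```python
import io

def read_lines(f):
    # Yield each non-blank line with surrounding whitespace removed.
    for line in f:
        stripped = line.strip()
        if stripped:
            yield stripped

# Hypothetical inputs; blank lines do not line up between the two.
first = io.StringIO("AAAA\n\nBBBB\n   \nCCCC\n")
second = io.StringIO("aaaa\nbbbb\n\n\ncccc\n")

pairs = list(zip(read_lines(first), read_lines(second)))
combined = [upper + lower for upper, lower in pairs]
print(combined)  # ['AAAAaaaa', 'BBBBbbbb', 'CCCCcccc']
```

The pairing depends only on the order of the non-blank lines, not on where the blank lines fall.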

edited May 23 '17 at 12:29

Community


answered Nov 02 '13 at 18:07

Lukas Graf


This is a workable solution. But it is best practice in Python to simply say `if line.strip():` rather than `if not line.strip() == '':`. Also, it is best practice to use a `with` statement along with the call to `open()` to ensure the file is always properly closed. Also, with a little bit of work, this could do all the processing one line at a time and not need to read in all the data... that doesn't matter for small input files but would be important for big data files. I definitely like the way `read_lines()` separates the logic of filtering the input away from the rest of the code. – steveha Nov 02 '13 at 18:31
@steveha I *do* use the `open()` context manager for the file I write to. For the files that are just being read, garbage collection will take care of them, so using a `with` statement would just clutter the code and make it less readable. – Lukas Graf Nov 02 '13 at 18:33
Yeah, this code still isn't optimized for memory usage, but doing that would make it much harder to read, and less instructive, I thought. As for `if line.strip()`: point taken, will edit the answer. It's worth noting, though, that it doesn't do exactly the same as `line.strip() == ''`: `None` would also cause that expression to be `False` (though of course `strip()` would never return `None`). – Lukas Graf Nov 02 '13 at 18:35
Garbage collection closes your files in CPython, but in PyPy or Jython or IronPython the files may not be closed in any timely fashion. For input files it probably doesn't matter that much, but I think the community agrees that it is best practice to use the `with` statement. – steveha Nov 02 '13 at 21:54
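The difference between the two blank-line tests discussed in these comments can be checked directly. The sample strings below are made up for illustration. For any string the two tests agree; they differ only for a non-string value such as `None`, which `str.strip()` never returns.

```python
samples = ["AAAA\n", "\n", "   \t\n", ""]

for line in samples:
    by_truthiness = bool(line.strip())         # `if line.strip():`
    by_comparison = not line.strip() == ""     # `if not line.strip() == '':`
    assert by_truthiness == by_comparison      # identical for every string

# Only lines with visible content survive the filter.
kept = [line.strip() for line in samples if line.strip()]

# The two tests diverge for None: it is falsy, yet not equal to ''.
none_by_truthiness = bool(None)
none_by_comparison = not None == ""
```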

steveha · Answer 2 · 2013-11-02T23:29:45.703

0

This is Lukas Graf's answer, rewritten slightly so that it holds only one line at a time from each input file instead of reading in all the lines at once. It also uses `with` for the file I/O.

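try:
    from itertools import izip  # Python 2: lazy version of zip
except ImportError:
    izip = zip  # Python 3: the built-in zip is already lazy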

def read_lines(f):
    for line in f:
        s = line.strip()
        if s:
            yield s

def collect_input(fname0, fname1):
    # Multiple open() on one with only works in Python 2.7 or 3.1+.
    # For Python 2.5, 2.6, or 3.0 you can use two nested with statements
    # or use contextlib.nested().
    with open(fname0, "rt") as f0, open(fname1, "rt") as f1:
        for line0, line1 in izip(read_lines(f0), read_lines(f1)):
            # read_lines() already strips each line, so no second strip() is needed.
            yield "%s%s\n" % (line0, line1)

with open('merged.txt', "wt") as f:
    for line in collect_input('first.txt', 'second.txt'):
        f.write(line)

edited Nov 02 '13 at 23:29

answered Nov 02 '13 at 21:53

steveha


You should probably note that this will only work in Python 2.7 / 3.x. In <2.7 you'd have to use two nested `with` statements (two on one line separated with comma isn't supported yet), adding yet another indentation level. That's another reason I didn't use the context manager for the input files in my answer. – Lukas Graf Nov 02 '13 at 22:08
Okay, I added a comment. I'm not sure how many people are still stuck on Python 2.5, 2.6, or 3.0 though, especially now that Cygwin finally got Python 2.7. I do really like the with statement and I want to encourage its use; I think the code looks good and is easy to understand, and it will work equally well in PyPy. The most important part is that this no longer reads the whole input files into memory, just one line from each input file at a time. For this learning program that doesn't matter, but it's good to learn to do it this way in case you ever need to handle giant files. – steveha Nov 02 '13 at 23:32
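For interpreters that lack the multi-manager `with` form, the nested layout mentioned in these comments could look like the sketch below. The function name `collect_input_nested` and the temporary sample files are hypothetical. The built-in `zip` is used so the block runs on Python 3; on Python 2, `itertools.izip` would keep the pairing lazy.

```python
import os
import tempfile

def read_lines(f):
    # Yield each non-blank line with surrounding whitespace removed.
    for line in f:
        s = line.strip()
        if s:
            yield s

def collect_input_nested(fname0, fname1):
    # Two nested with statements, equivalent to `with A as a, B as b:`,
    # at the cost of one extra indentation level.
    with open(fname0, "rt") as f0:
        with open(fname1, "rt") as f1:
            for line0, line1 in zip(read_lines(f0), read_lines(f1)):
                yield "%s%s\n" % (line0, line1)

# Hypothetical sample inputs in a temporary directory.
workdir = tempfile.mkdtemp()
path0 = os.path.join(workdir, "first.txt")
path1 = os.path.join(workdir, "second.txt")
with open(path0, "wt") as f:
    f.write("AAAA\n\nBBBB\n")
with open(path1, "wt") as f:
    f.write("aaaa\n\nbbbb\n")

result = list(collect_input_nested(path0, path1))
```

Both files are closed when the generator is exhausted, because leaving the inner and outer `with` blocks closes `f1` and then `f0`.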
