
I have a huge data file (~2 GB) that needs to be split into odd and even lines, which are then processed separately and written to two files. I don't want to read the whole file into RAM, so I think a generator should be a suitable choice. In short, I want to do something like this:

lines = (l.strip() for l in open(inputfn))
oddlines = somefunction(getodds(lines))
evenlines = somefunction(getevens(lines))
outodds.write(oddlines)
outevens.write(evenlines)

Is this possible? Apparently indexing will not work:

In [75]: lines[::2]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/home/kaiyin/Phased/build37/chr22/segments/segment_1/<ipython-input-75-97be680d00e3> in <module>()
----> 1 lines[::2]

TypeError: 'generator' object is not subscriptable
qed

3 Answers

def oddlines(fileobj):
    return (line for index,line in enumerate(fileobj) if index % 2)

def evenlines(fileobj):
    return (line for index,line in enumerate(fileobj) if not index % 2)

Note that this will require scanning the file twice, since these aren't designed to run in parallel. It does, however, lead to much less complex code. (Also note that an 'odd' line here is one with an index of 1,3,5 - which means that the first line is an 'even' line due to zero-indexing.)
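As a rough sketch of how the two passes might look in practice: the helper `split_two_pass` and the in-memory `io.StringIO` objects below are illustrative assumptions standing in for real files, not part of the answer above. The key step is rewinding the input with `seek(0)` between passes.

```python
import io

def oddlines(fileobj):
    # Lines at zero-based indexes 1, 3, 5, ...
    return (line for index, line in enumerate(fileobj) if index % 2)

def evenlines(fileobj):
    # Lines at zero-based indexes 0, 2, 4, ...
    return (line for index, line in enumerate(fileobj) if not index % 2)

def split_two_pass(source, even_out, odd_out):
    # Hypothetical helper: first pass writes the even-indexed lines.
    even_out.writelines(evenlines(source))
    # Rewind so the second pass starts from the beginning again.
    source.seek(0)
    odd_out.writelines(oddlines(source))

# StringIO objects stand in for files opened with open(...).
source = io.StringIO("a\nb\nc\nd\ne\n")
even_out, odd_out = io.StringIO(), io.StringIO()
split_two_pass(source, even_out, odd_out)
```

Only one line is held in memory at a time, at the cost of reading the input twice.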

As Ashwini notes, you could also use itertools.islice to do this.

Amber
  • This is nice, very easy to understand. Only the function names should be switched, since Python indexes count from 0. :D, thanks! – qed Aug 04 '13 at 20:51
Use itertools.islice to slice an iterator:

from itertools import islice
with open('filename') as f1, open('evens.txt', 'w') as f2:
    for line in islice(f1, 0, None, 2):
        f2.write(line)

with open('filename') as f1, open('odds.txt', 'w') as f2:
    for line in islice(f1, 1, None, 2):
        f2.write(line)
Ashwini Chaudhary

If you want to read the file just once, write a generator that wraps the file and yields each line together with a flag indicating whether that line is even or odd.

def oddeven(f, even=True):
    for line in f:
        yield even, line
        even = not even

Usage:

with open("infile.txt") as infile, \
     open("odd.txt", "w") as oddfile, \
     open("even.txt", "w") as evenfile:
    for even, line in oddeven(infile):
        if even:
            evenfile.write(line)
        else:
            oddfile.write(line)

This can be further simplified by storing the output file objects in an indexable container. Because bool is a subclass of int in Python, False selects index 0 and True selects index 1:

with open("infile.txt") as infile, \
     open("odd.txt", "w") as oddfile, \
     open("even.txt", "w") as evenfile:
    outfiles = (oddfile, evenfile)
    for even, line in oddeven(infile):
        outfiles[even].write(line)
kindall
  • I don't see any real benefit over using the `enumerate()` built-in straight, for example `for i, line in enumerate(infile): if i % 2 == 0: ...` – Ben Hoyt Aug 04 '13 at 22:04
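For comparison, the `enumerate()` approach mentioned in the comment above could be sketched roughly as follows. The function name `split_one_pass` and the `io.StringIO` stand-ins for real files are assumptions made for illustration only; lines are counted from zero, so the first line goes to the "even" output.

```python
import io

def split_one_pass(infile, evenfile, oddfile):
    # Hypothetical helper: index 0 holds the even output, index 1 the odd output.
    outfiles = (evenfile, oddfile)
    for i, line in enumerate(infile):
        # i % 2 is 0 for even-indexed lines and 1 for odd-indexed lines.
        outfiles[i % 2].write(line)

# StringIO objects stand in for files opened with open(...).
infile = io.StringIO("l0\nl1\nl2\nl3\n")
evens, odds = io.StringIO(), io.StringIO()
split_one_pass(infile, evens, odds)
```

This reads the input once and keeps a single line in memory at a time, without needing a separate generator.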