0

I've got a data file where each "row" is delimited by \n\n\n. My solution is to isolate those rows by first slurping the file, and then splitting rows:

for row in slurped_file.split('\n\n\n'):
    ...

Is there an "awk-like" approach I could take to parse the file as a stream in Python 2.7.9 and split records on a given string value? Thanks.

user2105469

1 Answer

3

The standard library has no equivalent of awk's record separator for iterating over a file. But we can write a custom generator that iterates over such records:

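def chunk_iterator(iterable):
    # Yield the records of a line iterable, where records are separated
    # by two consecutive blank lines (i.e. '\n\n\n' in the raw text).
    chunk = []
    empty_lines = 0  # consecutive blank lines seen so far
    for line in iterable:
        chunk.append(line)
        if line == '\n':
            empty_lines += 1
            if empty_lines == 2:
                # Drop the two blank separator lines. Unlike str.split, the
                # record keeps the newline that ends its last line.
                yield ''.join(chunk[:-2])
                empty_lines, chunk = 0, []
        else:
            empty_lines = 0

    # Whatever follows the last separator (possibly an empty string).
    yield ''.join(chunk)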

Use as:

with open('filename') as f:
    for chunk in chunk_iterator(f):
        ...

This relies on the file object's per-line iteration, which is implemented in C in CPython, so it should be faster than a general record-separator solution written in pure Python.
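
For comparison, a general record-separator reader could look roughly like the sketch below. It is only an illustration: the name iter_records and the separator and block_size parameters are made up here, not part of any library. It reads fixed-size blocks, splits the buffer on the separator, and keeps the unfinished tail for the next round, so a separator that straddles two blocks is still found.

```python
import io


def iter_records(stream, separator='\n\n\n', block_size=8192):
    # Hypothetical helper: yield records from a text stream, split on an
    # arbitrary separator string, without reading the whole file at once.
    buf = ''
    while True:
        block = stream.read(block_size)
        if not block:
            break
        buf += block
        # Every part except the last is a complete record. The last part
        # may still grow (or contain half a separator), so keep it buffered.
        parts = buf.split(separator)
        buf = parts.pop()
        for record in parts:
            yield record
    # Whatever follows the final separator (possibly an empty string).
    yield buf


# Small demo with an in-memory stream and a deliberately tiny block size.
sample = io.StringIO(u'a\nb\n\n\nc\n\n\nd')
print(list(iter_records(sample, block_size=4)))
```

With a real file you would pass the open file object instead of the StringIO. The results should match slurped_file.split(separator), but a very long single record makes the repeated buffer split expensive, which is one reason the line-based generator above is likely the better fit for a '\n\n\n' delimiter.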