Read multiple blocks of a file between start and stop flags

Question

I am trying to read sections of a file into numpy arrays. The sections are delimited by similar start and stop flags. I have found a method that works, but it handles only one section of the input file before I need to reopen the file.

My code at the moment is:

    with open("myFile.txt") as f:
        array = []
        parsing = False
        for line in f:
            if line.startswith('stop flag'):
            parsing = False
        if parsing:
            #do things to the data
        if line.startswith('start flag'):
            parsing = True

I found the code from this question

With this code I need to reopen the file and read through it again for every section.

Is there a way to read all sections without having to open the file for each section to be read?

how big is your file/how comfortable are you with generators? — NightShadeQueen, Jul 19 '15 at 23:35

Padraic Cunningham · Accepted Answer · 2015-07-19T23:45:12.680

You can use itertools.takewhile each time you reach the start flag to take until the stop:

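from itertools import takewhile

with open("myFile.txt") as f:
    array = []
    for line in f:
        if line.startswith('start flag'):
            # takewhile is lazy: it reads from f only when data is consumed,
            # and it also reads (and drops) the stop flag line itself
            data = takewhile(lambda x: not x.startswith("stop flag"), f)
            # use data here, before the outer loop continues, and repeat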
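
As a minimal sketch of how this could be used end to end, the snippet below collects every section into its own list. The sample content and the `sections` name are made up for illustration, and `io.StringIO` only stands in for the real file so the snippet runs on its own.

```python
from io import StringIO
from itertools import takewhile

# Hypothetical sample content standing in for myFile.txt.
sample = StringIO(
    "header\n"
    "start flag\n"
    "1 2\n"
    "3 4\n"
    "stop flag\n"
    "noise\n"
    "start flag\n"
    "5 6\n"
    "stop flag\n"
)

sections = []
for line in sample:
    if line.startswith("start flag"):
        # Build the lazy iterator over the same file-like object...
        body = takewhile(lambda x: not x.startswith("stop flag"), sample)
        # ...and consume it immediately, so the outer loop resumes
        # right after the stop flag line.
        sections.append([x.rstrip("\n") for x in body])

print(sections)  # [['1 2', '3 4'], ['5 6']]
```

The important detail is that the takewhile object is consumed inside the `if` block. If it were stored and read later, the outer loop would keep reading the section lines itself.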

Or just use an inner loop:

with open("myFile.txt") as f:
    array = []
    for line in f:
        if line.startswith('start flag'):
            # beginning of a section; use the first line here if needed
            for line in f:
                # check for the end of the section, breaking if we find the stop line
                if line.startswith("stop flag"):
                    break
                # else process lines from the section

A file object is its own iterator, so the file position keeps advancing as you iterate over f. When you reach the start flag, process the section until you hit the stop flag. There is no reason to reopen the file at all; just handle the sections as you iterate once over the lines of the file. If the start and stop flag lines are considered part of the section, make sure to process those too.
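
One way to package the inner-loop idea is a small generator. This is only a sketch: the name `iter_sections` and its `start`/`stop` parameters are hypothetical, and it assumes the flag lines themselves are not part of the section.

```python
def iter_sections(lines, start="start flag", stop="stop flag"):
    """Yield each block of lines found between a start and a stop line."""
    # A single shared iterator lets the inner loop pick up exactly
    # where the outer loop left off (a file object already behaves this way).
    it = iter(lines)
    for line in it:
        if line.startswith(start):
            section = []
            for line in it:
                if line.startswith(stop):
                    break
                section.append(line.rstrip("\n"))
            # A section cut off by the end of input is still yielded.
            yield section


# Illustrative input; an open file could be passed in the same way.
lines = [
    "start flag\n", "a\n", "b\n", "stop flag\n",
    "skip\n",
    "start flag\n", "c\n", "stop flag\n",
]
print(list(iter_sections(lines)))  # [['a', 'b'], ['c']]
```

With a real file this would be called as `iter_sections(f)` inside the `with open(...)` block, so the file is opened and read only once.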

score 1 · Answer 2 · answered Jul 19 '15 at 23:39

You have an indentation problem; your code should look like this:

with open("myFile.txt") as f:
    array = []
    parsing = False
    for line in f:
        if line.startswith('stop flag'):
            parsing = False
        if parsing:
            pass  # do things to the data
        if line.startswith('start flag'):
            parsing = True

score 0 · Answer 3 · answered Jul 19 '15 at 23:47

A solution similar to yours would be:

result = []
parse = False
with open("myFile.txt") as f:
    for line in f:
        if line.startswith('stop flag'):
            parse = False
        elif line.startswith('start flag'):
            parse = True
        elif parse:
            result.append(line)
        else:  # not needed, but I like to always add else clause
            continue
print(result)

But you could also use an inner loop or itertools.takewhile, as the other answers suggest. takewhile in particular may be faster for really big files.
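
The flag-based loop puts the lines of all sections into one list. If each section should end up as its own array, a variation along these lines might help. It is a sketch under assumptions: the function name `parse_numeric_sections` is made up, and the data lines are assumed to hold whitespace-separated numbers, which the question does not state.

```python
def parse_numeric_sections(lines, start="start flag", stop="stop flag"):
    """Return one list of float rows per section (hypothetical helper)."""
    sections = []
    current = None  # None means we are not inside a section
    for line in lines:
        if line.startswith(stop):
            if current is not None:
                sections.append(current)
            current = None
        elif line.startswith(start):
            current = []
        elif current is not None and line.strip():
            # Assumption: each data line is whitespace-separated numbers.
            current.append([float(x) for x in line.split()])
    # A section with no stop flag before the end of input is dropped.
    return sections


rows = parse_numeric_sections([
    "start flag", "1 2", "3 4", "stop flag",
    "junk outside a section",
    "start flag", "5.5 6", "stop flag",
])
print(rows)  # [[[1.0, 2.0], [3.0, 4.0]], [[5.5, 6.0]]]
```

Each inner list of rows could then be passed to `numpy.array` if numpy arrays are needed.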

score -1 · Answer 4 · answered Jul 19 '15 at 23:43

Let's say this is your file to read:

**starting**
blabla
blabla
**starting**
bleble
bleble
**starting**
bumbum
bumbum

This is the code of the program:

file = open("testfile.txt", "r")
data = file.read()
file.close()
data = data.split("**starting**")
print(data)

And this is the output:

['', '\nblabla\nblabla\n', '\nbleble\nbleble\n', '\nbumbum\nbumbum']

Afterwards you can delete the empty element or do other operations on your data. split is a built-in method of string objects and can take more complicated strings as arguments.
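
As a rough sketch of that clean-up step, the empty element can be filtered out and each block broken into lines. The `text` string below mirrors the sample file, and the `blocks` name is only illustrative. Note that this approach holds the whole file in memory.

```python
# Same content as the sample file above, as a string.
text = (
    "**starting**\nblabla\nblabla\n"
    "**starting**\nbleble\nbleble\n"
    "**starting**\nbumbum\nbumbum"
)

blocks = [
    chunk.strip().splitlines()        # one list of lines per block
    for chunk in text.split("**starting**")
    if chunk.strip()                  # skip the empty leading element
]
print(blocks)
# [['blabla', 'blabla'], ['bleble', 'bleble'], ['bumbum', 'bumbum']]
```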
