2

I have a file which contains blocks of lines that I would like to separate. Each block contains a number identifier in the block's header: "Block X" is the header line for the X-th block of lines. Like this:

Block X
#L E  C  A  F  X  M  N 
11.2145 15 27 29.444444 7.6025229 1539742 29.419783
11.21451 13 28 24.607143 6.8247935 1596787 24.586264
...
Block Y
#L E  C  A  F  X  M  N 
11.2145 15 27 29.444444 7.6025229 1539742 29.419783
11.21451 13 28 24.607143 6.8247935 1596787 24.586264
...

I can use "enumerate" to find the header line of the block as follows:

with open(filename, 'r') as indata:
    for num, line in enumerate(indata):
        if 'Block X' in line:
            startblock = num
            print(startblock)

This will yield the line number of the first line of block #X.
However, my problem is identifying the last line of the block. To do that, I could find the next occurrence of a header line (i.e., the next block) and subtract a few numbers.

My question: how can I find the line number of the next occurrence of a condition (i.e., the first match after a certain condition has already been met)?

I tried using enumerate again, this time indicating the starting value, like this:

with open(filename, 'r') as indata:
    for num, line in enumerate(indata, startblock):
        if 'Block Y' in line:
            endscan = num
            break
print(endscan)

That doesn't work: the file is still read from its first line, NOT from line number "startblock". The second argument to "enumerate" only changes the value the counter starts at, so the resulting "endscan" is offset from the true line number by the amount "startblock".
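To make the behaviour concrete, here is a minimal sketch. It assumes a small made-up list in place of the file, and it uses itertools.islice as one possible way to skip the leading lines before counting resumes; "lines" and the sample values are illustrative only.

```python
from itertools import islice

# Made-up stand-in for the file contents (a file object iterates the same way)
lines = ["Block X\n", "a\n", "b\n", "Block Y\n", "c\n"]
startblock = 0  # line number of the "Block X" header

# The second argument of enumerate only relabels the counter;
# iteration still begins at the first item.
relabelled = next(iter(enumerate(lines, 3)))

# islice skips everything up to and including the header, and the
# enumerate start value keeps the counter equal to the true line number.
endscan = None
for num, line in enumerate(islice(lines, startblock + 1, None), startblock + 1):
    if line.startswith("Block "):
        endscan = num
        break

print(relabelled, endscan)
```

With a real file, the same islice call can be applied to the open file object, since it is also an iterator over lines.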

Please help! How can I tell Python to disregard the lines before "startblock"?

alexfp
    just keep all the lines _until_ you find a block header in a list. when you find a header, dig up what you need from the stored lines and clear the list – pvg Dec 23 '15 at 00:05
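The suggestion in the comment above could be sketched roughly as follows. The function name iter_blocks and the sample data are hypothetical, and the sketch assumes every header line starts with "Block ".

```python
def iter_blocks(lines, prefix="Block "):
    """Yield each block as a list of lines, header line first.

    Any lines that appear before the first header are yielded
    as a block of their own.
    """
    current = []
    for line in lines:
        # A new header closes the block collected so far
        if line.startswith(prefix) and current:
            yield current
            current = []
        current.append(line)
    # Emit the final block, which has no header after it
    if current:
        yield current


# Made-up sample in the shape of the file from the question
sample = ["Block X\n", "#L E C\n", "1 2 3\n",
          "Block Y\n", "#L E C\n", "4 5 6\n"]
blocks = list(iter_blocks(sample))
print(len(blocks))
```

Because the generator only holds one block in memory at a time, it can be fed an open file object directly instead of a list.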

3 Answers

2

If you want the groups using Block as the delimiter for each section, you can use itertools.groupby:

from itertools import groupby

with open('test.txt') as f:
    grp = groupby(f,key=lambda x: x.startswith("Block "))
    for k,v in grp:
        if k:
            print(list(v) + list(next(grp, ("", ""))[1]))

Output:

['Block X\n', '#L E  C  A  F  X  M  N \n', '11.2145 15 27 29.444444 7.6025229 1539742 29.419783\n', '11.21451 13 28 24.607143 6.8247935 1596787 24.586264\n']
['Block Y\n', '#L E  C  A  F  X  M  N \n', '11.2145 15 27 29.444444 7.6025229 1539742 29.419783\n', '11.21451 13 28 24.607143 6.8247935 1596787 24.586264']

If Block can appear elsewhere but you want it only when followed by a space and a single char:

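import re
from itertools import groupby

with open('test.txt') as f:
    # raw string, so \w reaches the regex engine untouched
    r = re.compile(r"^Block \w$")
    # bool() gives a plain True/False key, as in the startswith version
    grp = groupby(f, key=lambda x: bool(r.search(x)))
    for k, v in grp:
        if k:
            print(list(v) + list(next(grp, ("", ""))[1]))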
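
As a quick check of which lines that pattern accepts, here is a small sketch. The candidate strings are made up for illustration; only the pattern itself comes from the code above.

```python
import re

header = re.compile(r"^Block \w$")

# Made-up candidate lines, each with the trailing newline a file would give
candidates = ["Block X\n", "Block XY\n", "Blocked 1\n", "see Block X\n"]

# $ also matches just before a trailing newline, so "Block X\n" is accepted
matches = [bool(header.search(s)) for s in candidates]
print(matches)
```

Only the first candidate should match: the others have extra characters after the single word character, lack the space after "Block", or do not start with "Block".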
Padraic Cunningham
0

You can use the .tell() and .seek() methods of file objects to move around. So for example:

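with open(filename, 'rb') as infile:    # binary mode, so offsets are plain byte counts
    start = None                        # offset of the current block's header
    while True:
        pos = infile.tell()             # offset of the line about to be read
        # readline() instead of a for loop: in Python 3, tell() raises
        # OSError while a text-mode file is being iterated
        line = infile.readline()
        if line and not line.startswith(b'Block'):
            continue
        if start is not None:
            # a new header (or end of file) closes the block that began at start
            infile.seek(start)
            # print all the bytes in the block
            print(infile.read(pos - start).decode())
            # now go back to where we were and step past the header again
            infile.seek(pos)
            infile.readline()
        if not line:
            break
        # we found a header, mark the start of its block
        start = pos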
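
Building on the same tell()/seek() idea, one could record the offsets of every block once and then jump straight to any block later. This is only a sketch under assumptions: index_blocks is a hypothetical helper, the stream is opened in binary mode, and io.BytesIO stands in for a real file.

```python
import io


def index_blocks(stream, prefix=b"Block "):
    """Map each header line to the (start, end) byte offsets of its block."""
    offsets = {}
    name, start = None, None
    while True:
        pos = stream.tell()          # offset of the line about to be read
        line = stream.readline()
        if line and not line.startswith(prefix):
            continue
        # A header or end of file closes the block currently open
        if name is not None:
            offsets[name] = (start, pos)
        if not line:
            return offsets
        name, start = line.strip().decode(), pos


# Made-up in-memory data in place of open(filename, 'rb')
data = io.BytesIO(b"Block X\n1 2\n3 4\nBlock Y\n5 6\n")
index = index_blocks(data)

# Jump directly to one block and read exactly its bytes
start, end = index["Block Y"]
data.seek(start)
block_y = data.read(end - start).decode()
print(index, repr(block_y))
```

The index costs one pass over the file; after that, each block can be read without rescanning the lines before it.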
0

If the number of lines between consecutive header lines is the same throughout the file, you can measure that distance once and use it to step the index.

    with open('file_name', 'r') as file1:
        lines = file1.readlines()
    numlines = len(lines)
    line_num1 = line_num2 = None
    for i, line in enumerate(lines):
        # strip() drops the trailing newline so the comparison can succeed
        if line.strip() == 'specific header 1':
            line_num1 = i
        if line.strip() == 'specific header 2':
            line_num2 = i
    # number of lines from one header to the next
    diff = line_num2 - line_num1

Now that we know the distance between the header lines, we can step through the file and collect the data.

    blocks = []
    for i in range(line_num1, numlines):
        # a header sits every diff lines after the first one
        if (i - line_num1) % diff == 0:
            blocks.append([l.rstrip('\n') for l in lines[i:i + diff]])

% is the modulo operator: (i - line_num1) % diff is 0 only when line i lies a whole multiple of diff lines past the first header, which is exactly where each header line sits. For each such line, the slice lines[i:i + diff] picks up the header and the rows that follow it, so blocks ends up as a list with one entry per block, each holding diff lines (the last entry may be shorter if the file ends early). The entries are kept as strings because the header lines are not numeric.

I have not tried this out, I am just writing off the top of my head. Hopefully it helps!

Cauchy