
I am trying to follow an answer given here:

How to only read lines in a text file after a certain string using python?

to read only the lines that come after a certain phrase. I went the boolean route, i.e. the second answer.

I need to get just the numbers between the opening and closing tags of a section in a file:

<type>
1 
2
3
</type>

However when I used this code:

found_type = False
t_ype = [] 
with open('test.xml', 'r') as f:
    for line in f:
        if '<type>' in line:
            found_type = True
        if found_type:
            if '</type>' in line:
               found_type = False               
            else:    
                t_line = str(line).rstrip('\n')
                t_ype.append(t_line)

I can't skip the first line, and I get:

'<type>', '1','2','3'

Where I just want

'1','2','3'

and I want the appending to stop when I hit </type>, as I don't need that in my list.

I'm not sure what I'm doing wrong and can't ask on the page because my rep isn't high enough.

l33tHax0r
  • Why not use xml with python? – Joel Feb 15 '16 at 19:13
  • This may look like xml script I am handling but it is for a molecular dynamics simulations script that has over 50000 lines that are separated by these headers. I need a quick way to grab certain sections and then append them to new files – l33tHax0r Feb 15 '16 at 19:27
  • http://stackoverflow.com/questions/34571288/print-first-paragraph-in-python/34571405#34571405 http://stackoverflow.com/questions/31507045/read-multiple-block-of-file-between-start-and-stop-flags/31507083#31507083 – Padraic Cunningham Feb 15 '16 at 19:49
  • @PadraicCunningham I saw a similar one using that module. I will take a look at that one in more depth later. Thanks for your response – l33tHax0r Feb 15 '16 at 20:00

2 Answers

You have to skip the rest of the loop body after detecting the "header". In your code, you set found_type to True and then the if found_type: check passes on that same line, so the <type> line itself gets appended.

found_type = False
t_ype = [] 
with open('test.xml', 'r') as f:
    for line in f:
        if '<type>' in line:
            found_type = True
            continue                    # This is the only change to your code.
                                        # When the header is found, immediately go to the next line
        if found_type:
            if '</type>' in line:
               found_type = False               
            else:    
                t_line = str(line).rstrip('\n')
                t_ype.append(t_line)
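
As a minimal sketch, the same flag logic can be wrapped in a function that takes any iterable of lines, which makes it easy to try without a file on disk. The name lines_between and the sample data are hypothetical, not part of the original code. Note that this sketch uses strip(), so it also drops the trailing space after "1", unlike rstrip('\n').

```python
def lines_between(lines, start_tag='<type>', end_tag='</type>'):
    """Collect the lines strictly between start_tag and end_tag.

    Hypothetical helper: the name and defaults are assumptions for illustration.
    """
    collected = []
    inside = False
    for line in lines:
        if start_tag in line:
            inside = True
            continue  # skip the header line itself
        if end_tag in line:
            inside = False
            continue  # skip the closing line as well
        if inside:
            collected.append(line.strip())
    return collected


# Made-up sample mirroring the section from the question.
sample = ['<type>\n', '1 \n', '2\n', '3\n', '</type>\n', 'ignored\n']
print(lines_between(sample))
```

With a real file, passing the open file object (for example the f from the with block above) in place of sample should behave the same way, since a file object iterates line by line.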
Jasper

A simple approach is a generator with a double loop over the same file object. The inner loop picks up from the line where the outer loop stopped:

def section(fle, begin, end):
    with open(fle) as f:
        for line in f:
            # found start of section so start iterating from next line
            if line.startswith(begin):
                for line in f: 
                    # found end so end function
                    if line.startswith(end):
                        return
                    # yield every line in the section
                    yield line.rstrip()     

Then either call list(section('test.xml','<type>','</type>')) or iterate with for line in section('test.xml','<type>','</type>'): and use the lines as they come. If you have repeating sections, swap the return for a break. You also don't need to call str on the lines, as they are already strings. If you have a large file, the groupby approach in the comments might be a better alternative.
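
For the repeating-sections case, here is a rough sketch of the break variant. It assumes the input is any iterable of lines rather than a filename, and the name sections and the sample data are made up for illustration. The explicit iter() call matters when the input is a list: both loops must share one iterator, otherwise the inner loop would start again from the first line.

```python
def sections(lines, begin, end):
    """Yield the lines inside every begin/end block (hypothetical helper)."""
    it = iter(lines)  # one shared iterator, so the inner loop resumes where the outer stopped
    for line in it:
        if line.startswith(begin):
            for line in it:
                if line.startswith(end):
                    break  # leave this section and keep scanning for the next one
                yield line.rstrip()


# Made-up sample with two sections and one line in between.
sample = ['<type>\n', '1\n', '2\n', '</type>\n', 'skip\n', '<type>\n', '3\n', '</type>\n']
print(list(sections(sample, '<type>', '</type>')))
```

A file object is already its own iterator, so passing an open file here works the same way as in the function above.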

Padraic Cunningham