Read all lines between two strings

Question

I want to extract the lines of an XML file that lie between two given strings. Here is an example:

<userData code="viPartListRailML" value="1">
            <partRailML s="0.0000000000000000e+00" id="0"/>
            <partRailML s="2.0000000000000000e+01" id="1"/>
            <partRailML s="9.4137883373059267e+01" id="2"/>
        </userData>

Here is the code I tried:

import re

shakes = open("N:\SAJAT_MAPPAK\IGYULAVICS\/adhoc\pythonXMLread\probaxml\github_minta.xml", "r")
for x in shakes:
    if "userData" in x:
        print x
        continue
    if "/userData" in x:
        break

The problem is that it still returns only the lines that contain <userData or </userData>. How can I modify it to get the lines between these two "words"?

You only have `if` statements for lines that contain `'userData'` and `'/userData'`, so you'll need to add another `if` or `else` branch, or some default code, to handle the lines in between. — D Malan, Mar 11 '20 at 09:58
Check out https://stackoverflow.com/questions/1912434/how-do-i-parse-xml-in-python for info about using Python's XML parsing library. — D Malan, Mar 11 '20 at 10:00

Chang Ye · Answer 1 · 2020-03-11T10:32:21.877

Assuming that there is only one <userData> block in your file, you can extract the lines within the block with:

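with open("./file.xml", "r") as shakes:
    inblock = False  # True while we are between the opening and closing tags
    for x in shakes:
        if "</userData" in x:
            inblock = False
        if inblock:
            print(x)
        # Test for "<userData", not the bare name, which also matches "</userData"
        if "<userData" in x:
            inblock = True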

But reading your file with an XML parser is more robust, for example:

import xml.etree.ElementTree as ET

tree = ET.parse('file.xml')

for data in tree.getroot().iter('userData'):
    for child in data:
        print(ET.tostring(child))
        # or something else, eg:
        # print(child.tag)
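
As a small illustration of the parser approach, the sketch below parses the sample from the question out of a string and reads the `s` and `id` attributes of each `partRailML` child. The wrapping `<root>` element and the names `SAMPLE` and `read_parts` are assumptions made for this example; they are not part of the original file.

```python
import xml.etree.ElementTree as ET

# Sample taken from the question; the <root> wrapper is hypothetical.
SAMPLE = """<root>
    <userData code="viPartListRailML" value="1">
        <partRailML s="0.0000000000000000e+00" id="0"/>
        <partRailML s="2.0000000000000000e+01" id="1"/>
        <partRailML s="9.4137883373059267e+01" id="2"/>
    </userData>
</root>"""


def read_parts(xml_text):
    """Return (id, s) pairs for every partRailML inside a userData element."""
    root = ET.fromstring(xml_text)
    parts = []
    for data in root.iter("userData"):
        for child in data.iter("partRailML"):
            # Attribute values are strings, so convert them explicitly.
            parts.append((int(child.get("id")), float(child.get("s"))))
    return parts


for part_id, s in read_parts(SAMPLE):
    print(part_id, s)
```

Working with attributes this way avoids depending on how the file happens to be split into lines.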

BTW, use Python 3 whenever possible; Python 2 is retired.

Note that a plain "userData" test also matches "/userData", so inblock would be True again after the closing tag. I didn't think of that at first either... — juha, Mar 13 '20 at 06:05

juha · Answer 2 · 2020-03-12T08:50:20.470

An easy way is to add a flag variable that tells whether you are between those two words:

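# Raw string, so the backslashes in the Windows path are not treated as escapes
shakes = open(r"N:\SAJAT_MAPPAK\IGYULAVICS\/adhoc\pythonXMLread\probaxml\github_minta.xml", "r")
t = False  # True while we are inside the userData block
for x in shakes:
    if t:
        print(x)  # the closing /userData line is printed as well
    if "/userData" in x:
        t = False
    elif "userData" in x:  # "userData" also matches "/userData", hence elif
        t = True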
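
The flag logic above can also be wrapped in a reusable generator. The following is only a sketch: the function name `lines_between`, its default markers and the `SAMPLE_LINES` list are assumptions for illustration. Unlike the loop above, it skips both marker lines and it keeps working if the input contains several blocks.

```python
def lines_between(lines, start="<userData", end="</userData"):
    """Yield the lines strictly between a start-marker line and an end-marker line."""
    inside = False
    for line in lines:
        if end in line:
            inside = False       # closing marker: stop yielding
        elif start in line:
            inside = True        # opening marker: start yielding from the next line
        elif inside:
            yield line.rstrip("\n")


# Hypothetical input: any iterable of lines works, including an open file.
SAMPLE_LINES = [
    "<before/>\n",
    '<userData code="viPartListRailML" value="1">\n',
    '    <partRailML s="0.0000000000000000e+00" id="0"/>\n',
    '    <partRailML s="2.0000000000000000e+01" id="1"/>\n',
    "</userData>\n",
    "<after/>\n",
]

for line in lines_between(SAMPLE_LINES):
    print(line)
```

Because the start marker includes the leading `<`, it cannot match the closing tag, so the order of the two marker checks does not matter here.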

edited Mar 12 '20 at 08:50

answered Mar 11 '20 at 10:08

You could move the `if t` to the end, or change the order, as in @Chang Ye's answer – Pablo Mar 11 '20 at 10:11
Correct; however, I assumed that printing the /userData line was intended (and in case it is not, I added a comment to the code) – juha Mar 11 '20 at 10:50
Changed the order of the "userData" check and turned the other if into an elif – juha Mar 12 '20 at 08:53

score 0 · Answer 3 · answered Mar 11 '20 at 10:16

You can use itertools.dropwhile to reach the <userData part and then use itertools.takewhile to read until </userData:

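import itertools as it

# `text` is assumed to hold the whole XML file as one string,
# e.g. text = open("file.xml").read()
result = it.takewhile(
    lambda x: '</userData' not in x,
    it.dropwhile(
        lambda x: '<userData' not in x,
        text.splitlines()
    )
)
print('\n'.join(result))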

If you want to skip the <userData element you can add itertools.islice:

result = it.takewhile(
    lambda x: '</userData' not in x,
    it.islice(it.dropwhile(
        lambda x: '<userData' not in x,
        text.splitlines()
    ), 1, None)
)
print('\n'.join(result))
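
To try the `itertools` pipeline end to end, here is a self-contained sketch that runs it on a string. The `TEXT` sample (including its `<header/>` and `<footer/>` lines) and the helper name `between` are assumptions for illustration only.

```python
import itertools as it

# Hypothetical sample: the userData block from the question with extra lines around it.
TEXT = """<header/>
<userData code="viPartListRailML" value="1">
    <partRailML s="0.0000000000000000e+00" id="0"/>
    <partRailML s="2.0000000000000000e+01" id="1"/>
</userData>
<footer/>"""


def between(text, start="<userData", end="</userData"):
    """Return the lines after the first `start` line and before the next `end` line."""
    # Drop everything before the line that contains the start marker.
    from_start = it.dropwhile(lambda x: start not in x, text.splitlines())
    # Skip the start line itself.
    after_start = it.islice(from_start, 1, None)
    # Keep lines until the end marker shows up.
    return list(it.takewhile(lambda x: end not in x, after_start))


print("\n".join(between(TEXT)))
```

This version stops at the first closing marker, so it only covers the single-block case.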
