0

I want to extract the lines that lie between two tags in my XML file. Here is an example:

<userData code="viPartListRailML" value="1">
            <partRailML s="0.0000000000000000e+00" id="0"/>
            <partRailML s="2.0000000000000000e+01" id="1"/>
            <partRailML s="9.4137883373059267e+01" id="2"/>
        </userData>

Here is the code I tried:

import re

shakes = open("N:\SAJAT_MAPPAK\IGYULAVICS\/adhoc\pythonXMLread\probaxml\github_minta.xml", "r")
for x in shakes:
    if "userData" in x:
        print x
        continue
    if "/userData" in x:
        break

The problem is that it still gives back only the lines that contain <userData or </userData>. How can I modify it to get the lines between these two "words"?

  • You only have `if` statements for lines that contain `'userData'` and `'/userData'`, so you'll need to add another `if` or `else` branch, or some default code, to handle the lines in between. – D Malan Mar 11 '20 at 09:58
  • Check out https://stackoverflow.com/questions/1912434/how-do-i-parse-xml-in-python for info about using Python's XML parsing library. – D Malan Mar 11 '20 at 10:00

3 Answers

1

Assuming that there is only one <userData> block in your file, you can extract the lines within the block with:

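shakes = open("./file.xml", "r")
inblock = False
for x in shakes:
    if "</userData" in x:
        # closing tag reached: stop printing
        inblock = False
    if inblock:
        # x already ends with a newline, so strip it before printing
        print(x.rstrip("\n"))
    if "<userData" in x:
        # opening tag: print from the next line on
        # ("<userData" does not match "</userData")
        inblock = True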
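
The same flag idea can be wrapped in a small generator so that it is easy to try on any iterable of lines. This is only a sketch: the name `lines_between`, the `sample` string and its `<root>` / `<other/>` wrapper elements are made up for illustration, and it still assumes a single <userData> block.

```python
def lines_between(lines, start="<userData", end="</userData"):
    """Yield the lines after the first `start` line, up to the next `end` line."""
    inblock = False
    for line in lines:
        if inblock and end in line:
            return  # closing tag reached: stop
        if inblock:
            yield line
        elif start in line:
            inblock = True  # opening tag found: yield from the next line on


# Hypothetical sample; only the <userData> part comes from the question.
sample = """<root>
<userData code="viPartListRailML" value="1">
    <partRailML s="0.0000000000000000e+00" id="0"/>
    <partRailML s="2.0000000000000000e+01" id="1"/>
</userData>
<other/>
</root>"""

inner = list(lines_between(sample.splitlines()))
for line in inner:
    print(line)
```

With a real file you would pass the open file object instead of `sample.splitlines()`.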

But reading the file with an XML parser is more robust, for example:

import xml.etree.ElementTree as ET

tree = ET.parse('file.xml')

for data in tree.getroot().iter('userData'):
    for child in data:
        print(ET.tostring(child))
        # or something else, eg:
        # print(child.tag)
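
As one possible "something else", the attributes of each child can be read directly. The sketch below parses the sample from the question from a string (`xml_text`, wrapped in a made-up `<root>` element so that it is a complete document) and collects the `id` and `s` attributes; the variable names are only illustrative.

```python
import xml.etree.ElementTree as ET

# The <root> wrapper is an assumption; the <userData> block is from the question.
xml_text = """<root>
    <userData code="viPartListRailML" value="1">
        <partRailML s="0.0000000000000000e+00" id="0"/>
        <partRailML s="2.0000000000000000e+01" id="1"/>
        <partRailML s="9.4137883373059267e+01" id="2"/>
    </userData>
</root>"""

root = ET.fromstring(xml_text)

parts = []
for data in root.iter("userData"):
    for child in data:
        # Element.get() returns the attribute value as a string
        parts.append((child.get("id"), float(child.get("s"))))

print(parts)
```

For a file on disk, `ET.parse('file.xml').getroot()` would take the place of `ET.fromstring(xml_text)`.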

BTW, use Python 3 whenever possible; Python 2 is retired.

Chang Ye
  • Careful: a line containing "/userData" also matches "userData", so a plain "userData" check sets inblock back to True on the closing line. I didn't think of that at first either... – juha Mar 13 '20 at 06:05
1

An easy way is to add a flag variable that tells whether you are between those two tags:

shakes = open(r"N:\SAJAT_MAPPAK\IGYULAVICS\/adhoc\pythonXMLread\probaxml\github_minta.xml", "r")
t = False  # True while we are inside the <userData> block
for x in shakes:
    if t:
        print(x.rstrip("\n"))  # the </userData> line is printed as well
    if "/userData" in x:
        t = False
    elif "userData" in x:  # "userData" also matches "/userData", hence the elif
        t = True
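
Why the order of the checks (and the elif) matters can be seen with plain substring tests. A small sketch using the opening and closing lines from the question's sample; the names `opening` and `closing` are only for illustration:

```python
opening = '<userData code="viPartListRailML" value="1">'
closing = '        </userData>'

# "userData" is a substring of both lines, so it cannot tell them apart.
print("userData" in opening, "userData" in closing)    # True True
# "/userData" only occurs in the closing line.
print("/userData" in opening, "/userData" in closing)  # False True
# Including the angle bracket makes the opening check unambiguous.
print("<userData" in opening, "<userData" in closing)  # True False
```
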
juha
  • You could move the `if t` to the end, or change the order, like in @Chang Ye's answer. – Pablo Mar 11 '20 at 10:11
  • Correct. However, I assumed that printing the /userData line was the intended behaviour (and in case it is not, I added a comment to the code). – juha Mar 11 '20 at 10:50
  • Changed the order of the "userData" checks and turned the second if into an elif. – juha Mar 12 '20 at 08:53
0

You can use itertools.dropwhile to reach the <userData part and then use itertools.takewhile to read until </userData:

import itertools as it

# text is assumed to hold the XML file's contents as one string,
# e.g. text = shakes.read()
result = it.takewhile(
    lambda x: '</userData' not in x,
    it.dropwhile(
        lambda x: '<userData' not in x,
        text.splitlines()
    )
)
print('\n'.join(result))

If you want to skip the opening <userData line itself, you can add itertools.islice:

result = it.takewhile(
    lambda x: '</userData' not in x,
    it.islice(it.dropwhile(
        lambda x: '<userData' not in x,
        text.splitlines()
    ), 1, None)
)
print('\n'.join(result))
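
Since a file object is itself an iterator over its lines, the same chain can also be applied to the file directly, without reading everything into `text` first. A minimal sketch: `io.StringIO` stands in for an open file here, and the `<root>` wrapper and the names `fake_file`, `after_open` and `inner` are assumptions for illustration.

```python
import io
import itertools as it

# io.StringIO stands in for an open file; an object returned by open()
# can be iterated line by line in the same way.
fake_file = io.StringIO(
    '<root>\n'
    '<userData code="viPartListRailML" value="1">\n'
    '    <partRailML s="0.0000000000000000e+00" id="0"/>\n'
    '    <partRailML s="2.0000000000000000e+01" id="1"/>\n'
    '</userData>\n'
    '</root>\n'
)

# Drop everything before the opening tag, then skip the opening line itself.
after_open = it.islice(
    it.dropwhile(lambda x: '<userData' not in x, fake_file), 1, None
)
# Keep lines until the closing tag shows up.
inner = list(it.takewhile(lambda x: '</userData' not in x, after_open))

# Lines read from a file keep their trailing newline, so join with ''.
print(''.join(inner), end='')
```
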
a_guest