I am a beginner in Python and am struggling with the problem described below. I am also sharing my incomplete Python script, which does not work for this problem. I would be grateful for any support or guidance on my script.

The file looks like this:

<Iteration>
  <Iteration_hit>Elememt1 Element1
    abc1 hit 1
  .
  .
</Iteration>
<Iteration>
  <Iteration_hit>Elememt2 Element2
    abc2 hit 1
  .
  .
</Iteration>
<Iteration>
  <Iteration_hit>Elememt3 Element3
    abc3 hit 1
  .
  .
</Iteration>
<Iteration>
  <Iteration_hit>Elememt4 Element4
    abc4 hit 1
  .
  .
</Iteration>

I need everything from <Iteration> to </Iteration> for each element that matches an entry in my list of elements. For example, if the list contains Element2 and Element4, the output file should look like this:

<Iteration>
  <Iteration_hit>Elememt2 Element2
    abc2 hit 1
  .
  .
</Iteration>
<Iteration>
  <Iteration_hit>Elememt4 Element4
    abc4 hit 1
  .
  .
</Iteration>

Script

#!/usr/bin/python
x = raw_input("Enter your xml file name: ")
xml = open(x)
l = raw_input("Enter your list file name: ")
lst = open(l)
Id = list()
ylist = list()
import re
for line in lst:
        stuff=line.rstrip()
        stuff.split()
        Id.append(stuff)
for ele in Id:
        for line1 in xml:
                if line1.startswith("  <Iteration_hit>"):
                        y = line1.split()
#                       print y[1]
                        if y[1] == ele: break
Mr Lister
kashiff007

2 Answers

Using regex to parse XML isn't recommended; you should use a library such as lxml, which you can install with pip install lxml. You can then select the appropriate elements with lxml and XPath as follows (I have taken the liberty of closing the <Iteration_hit> tags in your XML and wrapping everything in a single <root> element so that it is well-formed):

content = '''
<root>
<Iteration>
  <Iteration_hit>Elememt1 Element1
    abc1 hit 1
  </Iteration_hit>
</Iteration>
<Iteration>
  <Iteration_hit>Elememt2 Element2
    abc2 hit 1
  </Iteration_hit>
</Iteration>
<Iteration>
  <Iteration_hit>Elememt3 Element3
    abc3 hit 1
  </Iteration_hit>
</Iteration>
<Iteration>
  <Iteration_hit>Elememt4 Element4
    abc4 hit 1
  </Iteration_hit>
</Iteration>
</root>
'''

from lxml import etree

tree = etree.XML(content)
target_elements = tree.xpath('//Iteration_hit[contains(., "Element2") or contains(., "Element4")]')

for element in target_elements:
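    # encoding="unicode" returns str, so Python 3 prints the markup itself, not a bytes literal
    print(etree.tostring(element, encoding="unicode"))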

Output

<Iteration_hit>Elememt2 Element2
    abc2 hit 1
  </Iteration_hit>

<Iteration_hit>Elememt4 Element4
    abc4 hit 1
  </Iteration_hit>
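
Note that the XPath above returns only the <Iteration_hit> elements, whereas the question asks for the whole <Iteration> ... </Iteration> block. As a rough sketch of one way to get the enclosing blocks, the following swaps in the standard-library xml.etree.ElementTree module instead of lxml and does the filtering in Python rather than in XPath. The function name select_iterations is hypothetical, and treating the second token of the hit text as the ID is an assumption based on the sample data.

```python
import xml.etree.ElementTree as ET

# Sample data copied from above: <Iteration_hit> tags closed and a <root>
# wrapper added so that the text is well-formed XML.
content = """<root>
<Iteration>
  <Iteration_hit>Elememt1 Element1
    abc1 hit 1
  </Iteration_hit>
</Iteration>
<Iteration>
  <Iteration_hit>Elememt2 Element2
    abc2 hit 1
  </Iteration_hit>
</Iteration>
<Iteration>
  <Iteration_hit>Elememt3 Element3
    abc3 hit 1
  </Iteration_hit>
</Iteration>
<Iteration>
  <Iteration_hit>Elememt4 Element4
    abc4 hit 1
  </Iteration_hit>
</Iteration>
</root>"""


def select_iterations(xml_text, wanted_ids):
    """Return the <Iteration> elements whose hit text names a wanted ID.

    Hypothetical helper: the ID is assumed to be the second
    whitespace-separated token inside <Iteration_hit>.
    """
    wanted = set(wanted_ids)
    root = ET.fromstring(xml_text)
    selected = []
    for iteration in root.iter("Iteration"):
        hit = iteration.find("Iteration_hit")
        if hit is None or not hit.text:
            continue
        tokens = hit.text.split()
        if len(tokens) > 1 and tokens[1] in wanted:
            selected.append(iteration)
    return selected


for iteration in select_iterations(content, ["Element2", "Element4"]):
    # encoding="unicode" makes tostring return str instead of bytes
    print(ET.tostring(iteration, encoding="unicode"))
```

Comparing whole tokens against a set, rather than using a substring test, should also avoid an ID such as Element1 accidentally matching Element10.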
gtlambert
  • Happy to help, and welcome to Stack Overflow. If this answer or any other one solved your issue, please mark it as accepted. – gtlambert Jan 14 '16 at 18:21

Here is the complete script for filtering the XML with Python and lxml:

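#!/usr/bin/python
from lxml import etree

# Read the whole XML document as bytes; it must be well-formed
# (a single root element and closed tags) for lxml to parse it.
with open('input.xml', 'rb') as myfile:
    content = myfile.read()

# Read one ID per line, skipping blank lines.
with open('ID.list') as lst:
    ids = [line.strip() for line in lst if line.strip()]

# Parse once, then query once per ID.
tree = etree.XML(content)
for ele in ids:
    # Pass the ID in as an XPath variable ($ele). A bare `ele` inside the
    # expression would be read as a child element name, not the Python variable.
    target_elements = tree.xpath('//Iteration[contains(., $ele)]', ele=ele)
    # Print inside the loop so the matches for every ID are output, not just the last.
    for element in target_elements:
        print(etree.tostring(element, encoding='unicode'))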
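
The file in the question, as posted, appears to have no single root element and no closing </Iteration_hit> tags, so an XML parser would probably reject it unless the file is edited first. If editing it is not practical, a plain line-based filter is one possible alternative. The sketch below is not XML parsing at all: it assumes that <Iteration> and </Iteration> each sit on their own line and that the ID is the second whitespace-separated token on the <Iteration_hit> line, as in the sample. The name filter_iterations is hypothetical.

```python
def filter_iterations(lines, wanted_ids):
    """Yield the lines of each <Iteration>...</Iteration> block whose
    <Iteration_hit> line has a wanted ID as its second token."""
    wanted = set(wanted_ids)
    block = []      # lines of the block currently being collected
    inside = False  # True between <Iteration> and </Iteration>
    keep = False    # True once the current block's hit line matched
    for line in lines:
        stripped = line.strip()
        if stripped == "<Iteration>":
            block, inside, keep = [line], True, False
            continue
        if not inside:
            continue
        block.append(line)
        if stripped.startswith("<Iteration_hit>"):
            tokens = stripped.split()
            keep = len(tokens) > 1 and tokens[1] in wanted
        elif stripped == "</Iteration>":
            if keep:
                for kept in block:
                    yield kept
            inside = False


# Small sample in the same shape as the file in the question.
sample = """<Iteration>
  <Iteration_hit>Elememt1 Element1
    abc1 hit 1
</Iteration>
<Iteration>
  <Iteration_hit>Elememt2 Element2
    abc2 hit 1
</Iteration>
""".splitlines(keepends=True)

print("".join(filter_iterations(sample, ["Element2"])), end="")
```

Because the function accepts any iterable of lines, an open file object can be passed in directly and the result written out with writelines, so the whole input never has to be held in memory.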
kashiff007