
I'm a newbie with Python, and I'd like to remove the openingHours element and its child elements from the XML.

I have this input:

<Root>
   <stations>
      <station id= "1">
          <name>whatever</name>
          <openingHours>
               <openingHour>
                    <entrance>main</entrance>
                       <timeInterval>
                         <from>05:30</from>
                         <to>21:30</to>
                       </timeInterval>
                <openingHour/>
          <openingHours>
      <station/>
      <station id= "2">
          <name>foo</name>
          <openingHours>
               <openingHour>
                    <entrance>main</entrance>
                       <timeInterval>
                         <from>06:30</from>
                         <to>21:30</to>
                       </timeInterval>
                <openingHour/>
          <openingHours>
       <station/>
   <stations/>
  <Root/>

I'd like this output:

  <Root>
   <stations>
      <station id= "1">
          <name>whatever</name>
      <station/>
      <station id= "2">
          <name>foo</name>
      <station/>
   <stations/>
  <Root/>

So far I've tried this, adapted from another thread, "How to remove elements from XML using Python":

from lxml import etree

doc=etree.parse('stations.xml')
for elem in doc.xpath('//*[attribute::openingHour]'):
   parent = elem.getparent()
   parent.remove(elem)
print(etree.tostring(doc))

However, it doesn't seem to work. Thanks!

AldousLem

2 Answers


I took your code for a spin, but at first Python refused to parse your XML: a closing tag needs the / at the beginning (like </...>), not at the end (like <.../>). A tag such as <station/> is a self-closing empty element, so the <station> you opened earlier is never actually closed.
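To see this in isolation, here is a minimal sketch using a trimmed-down, hypothetical snippet rather than the full stations.xml: lxml rejects the version with <station/> as the "closing" tag by raising etree.XMLSyntaxError, while the corrected version parses. The helper name parses() is made up for illustration.

```python
from lxml import etree

# Hypothetical trimmed-down snippet: "<station/>" opens AND closes an empty
# element, so the earlier '<station id="1">' is never closed.
broken = b'<stations><station id="1"><name>whatever</name><station/></stations>'
fixed = b'<stations><station id="1"><name>whatever</name></station></stations>'


def parses(xml_bytes):
    """Return True if lxml accepts the bytes as well-formed XML (hypothetical helper)."""
    try:
        etree.fromstring(xml_bytes)
        return True
    except etree.XMLSyntaxError:
        return False


print(parses(broken))  # False
print(parses(fixed))   # True
```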

That aside, your code isn't working because the XPath expression //*[attribute::openingHour] selects elements that have an attribute named openingHour, whereas you want the elements named openingHours. I got it to work by changing the expression to //openingHours, which makes the entire code:

from lxml import etree

doc=etree.parse('stations.xml')
for elem in doc.xpath('//openingHours'):
    parent = elem.getparent()
    parent.remove(elem)
print(etree.tostring(doc))
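One quick way to confirm the diagnosis is to run both expressions against a well-formed copy of the input. This sketch assumes the XML has been fixed as described above, and it parses an inline string instead of stations.xml. The original expression matches nothing, while //openingHours finds one element per station.

```python
from lxml import etree

# Well-formed version of the question's input (closing tags fixed).
XML = b"""<Root>
  <stations>
    <station id="1">
      <name>whatever</name>
      <openingHours>
        <openingHour>
          <entrance>main</entrance>
          <timeInterval><from>05:30</from><to>21:30</to></timeInterval>
        </openingHour>
      </openingHours>
    </station>
    <station id="2">
      <name>foo</name>
      <openingHours>
        <openingHour>
          <entrance>main</entrance>
          <timeInterval><from>06:30</from><to>21:30</to></timeInterval>
        </openingHour>
      </openingHours>
    </station>
  </stations>
</Root>"""

root = etree.fromstring(XML)

# Original expression: elements carrying an *attribute* named openingHour (none here).
by_attribute = root.xpath('//*[attribute::openingHour]')
# Corrected expression: *elements* named openingHours.
by_element = root.xpath('//openingHours')
print(len(by_attribute), len(by_element))  # 0 2

# Remove each match via its parent; its children go with it.
for elem in by_element:
    elem.getparent().remove(elem)

print(etree.tostring(root, pretty_print=True).decode())
```

Note that etree.tostring() returns bytes, which is why the sketch calls .decode() before printing.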
yarwest

You want to remove the <openingHours> elements, not elements that have an attribute named openingHour:

from lxml import etree

doc = etree.parse('stations.xml')
for elem in doc.findall('.//openingHours'):
    parent = elem.getparent()
    parent.remove(elem)
print(etree.tostring(doc))
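Two alternatives, offered as sketches rather than as part of the answer above. First, lxml's etree.strip_elements() removes every element with a given tag, children included, in a single call. Second, the standard library's xml.etree.ElementTree has no getparent(), so there you would remove matching children from each parent instead. Both assume a well-formed document, and the inline XML string stands in for stations.xml.

```python
import xml.etree.ElementTree as ET

from lxml import etree

# Compact, well-formed stand-in for stations.xml.
XML = b"""<Root><stations>
<station id="1"><name>whatever</name><openingHours><openingHour><entrance>main</entrance></openingHour></openingHours></station>
<station id="2"><name>foo</name><openingHours><openingHour><entrance>main</entrance></openingHour></openingHours></station>
</stations></Root>"""

# Option 1 (lxml): strip every <openingHours> element and its subtree.
# with_tail=False keeps any text that follows each removed element.
lxml_root = etree.fromstring(XML)
etree.strip_elements(lxml_root, 'openingHours', with_tail=False)

# Option 2 (standard library only): ElementTree elements have no getparent(),
# so visit every element and remove its direct <openingHours> children.
et_root = ET.fromstring(XML)
for parent in list(et_root.iter()):  # snapshot the elements before mutating the tree
    for child in parent.findall('openingHours'):  # findall returns a list, safe to mutate
        parent.remove(child)

print(etree.tostring(lxml_root).decode())
print(ET.tostring(et_root, encoding='unicode'))
```

The list() snapshot in option 2 avoids modifying the tree while iter() is still walking it.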
Daniel