4

I'm very new to Python and I need to change

<test name="test02"></xmpp> to <test name="test03"></xmpp> 

<temp-config>QA</temp-config> to <temp-config>Prod</temp-config> 

for all 5 occurrences using Python. I'm not sure which library to use. Any help with this is really appreciated.

<config>
<logging></logging>
<test-mode>false</test-mode>
<test name="test02"></xmpp>
<mail></mail>
<test-system>0</test-system>
<system id="0" name="suite1" type="regression">
    <temp-config>QA</temp-config>
    <rpm>0.5</rpm>
    <cycles>3</cycles>
</system>
<system id="1" name="suite2" type="regression">
    <temp-config>QA</temp-config>
    <rpm>0.5</rpm>
    <cycles>3</cycles>
</system>
<system id="2" name="suite3" type="regression">
    <temp-config>QA</temp-config>
    <rpm>0.5</rpm>
    <cycles>3</cycles>
</system>
<system id="3" name="suite4" type="regression">
    <temp-config>QA</temp-config>
    <rpm>0.5</rpm>
    <cycles>3</cycles>
</system>
<system id="4" name="suite5" type="regression">
    <temp-config>QA</temp-config>
    <rpm>0.5</rpm>
    <cycles>3</cycles>
</system>
</config>
DigitalDyn

4 Answers

2

ElementTree is a great choice: it is pure Python and included in the standard library, so it's the most portable option. However, I always go straight to lxml. Its etree module offers a largely compatible API, it is faster, and it can do a lot more (since it is effectively a wrapper around libxml2).

from lxml import etree
tree = etree.parse(path_to_my_xml)

for elem in tree.findall('.//test'):
    assert elem.attrib['name'] == 'test02'
    elem.attrib['name'] = 'test03'

for elem in tree.findall('.//temp-config'):
    assert elem.text == 'QA'
    elem.text = 'Prod'

# etree.tostring(..., encoding='utf8') returns bytes, so open the file in binary mode
with open(path_to_output_file, 'wb') as file_handle:
    file_handle.write(etree.tostring(tree, pretty_print=True, encoding='utf8'))
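
For comparison, here is a minimal sketch of the same edit using only the standard library's ElementTree, which this answer mentions as the portable option. The XML_TEXT sample is a hypothetical, shortened excerpt of the question's config, with the mismatched </xmpp> replaced by </test> so that it parses; it is not the asker's real file.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed excerpt of the question's config
# (the stray </xmpp> closing tag is replaced with </test>).
XML_TEXT = """<config>
<test name="test02"></test>
<system id="0" name="suite1" type="regression">
    <temp-config>QA</temp-config>
</system>
<system id="1" name="suite2" type="regression">
    <temp-config>QA</temp-config>
</system>
</config>"""

root = ET.fromstring(XML_TEXT)

# Change the "name" attribute of every <test> element.
for elem in root.findall('.//test'):
    elem.set('name', 'test03')

# Switch every <temp-config> value from QA to Prod.
for elem in root.findall('.//temp-config'):
    elem.text = 'Prod'

# encoding='unicode' makes tostring() return a str instead of bytes.
result = ET.tostring(root, encoding='unicode')
print(result)
```

Unlike lxml, the standard library's tostring() has no pretty_print argument, so the original whitespace is kept as-is.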
simon
  • Sorry I get the following error: Traceback (most recent call last): File "testrun.py", line 13, in with open(tree, 'w') as file_handle: TypeError: coercing to Unicode: need string or buffer, lxml.etree._ElementTree found – DigitalDyn Jan 29 '13 at 14:53
  • @DigitalDyn - `with open(tree, 'w') as file_handle:` is not part of what I posted. I think you may have made a copy-and-paste error :) – simon Jan 29 '13 at 16:53
  • @DigitalDyn `with open(file_out, 'wb') as file_handle:` – artamonovdev Oct 04 '19 at 19:42
2

Use lxml. This example uses lxml.etree and would actually fail on your example XML, because the <test> element is closed by a mismatched </xmpp> tag. If your real-world data has the same problem, use lxml.html, which can handle broken XML (instructions are added to the code as comments).

In [14]: import lxml.etree as et  # for broken xml add an import:
                                  # import lxml.html as lh

In [15]: doc = et.fromstring(xmlstr)  # for broken xml replace this line with:
                                      # doc = lh.fromstring(xmlstr)

                                      # if you read xml from a file:
                                      # doc = et.parse('file_path')

In [16]: for elem in doc.xpath('.//temp-config'):
    ...:     elem.text = 'Prod'
    ...:     

In [17]: print(et.tostring(doc, pretty_print=True).decode())
<config>
  <logging/>
  <test-mode>false</test-mode>
  <test name="test02">
    <mail/>
    <test-system>0</test-system>
    <system id="0" name="suite1" type="regression">
      <temp-config>Prod</temp-config>
      <rpm>0.5</rpm>
      <cycles>3</cycles>
    </system>
    <system id="1" name="suite2" type="regression">
      <temp-config>Prod</temp-config>
      <rpm>0.5</rpm>
      <cycles>3</cycles>
    </system>
    <system id="2" name="suite3" type="regression">
      <temp-config>Prod</temp-config>
      <rpm>0.5</rpm>
      <cycles>3</cycles>
    </system>
    <system id="3" name="suite4" type="regression">
      <temp-config>Prod</temp-config>
      <rpm>0.5</rpm>
      <cycles>3</cycles>
    </system>
    <system id="4" name="suite5" type="regression">
      <temp-config>Prod</temp-config>
      <rpm>0.5</rpm>
      <cycles>3</cycles>
    </system>
  </test>
</config>

Note: As pointed out by others, the standard library offers some less powerful alternatives. They may be well suited to simple tasks like this one. However, if your XML files are broken, the standard library parsers will reject them, so trying to parse them that way is a waste of time.
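
To illustrate the point about broken input, here is a small sketch showing how the standard library's ElementTree reacts to the mismatched tag from the question. BROKEN_XML is a shortened, hypothetical excerpt that keeps only the problematic pair.

```python
import xml.etree.ElementTree as ET

# The mismatched pair from the question: <test> is closed by </xmpp>.
BROKEN_XML = '<config><test name="test02"></xmpp></config>'

try:
    ET.fromstring(BROKEN_XML)
    parsed = True
except ET.ParseError as exc:
    # The standard library parser is strict and refuses malformed XML.
    parsed = False
    error_message = str(exc)
    print("could not parse:", error_message)
```

The parser raises ParseError instead of attempting a repair, so the input has to be fixed first or handed to a more lenient parser.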

root
0

I suggest using ElementTree: http://docs.python.org/2/library/xml.etree.elementtree.html

example:

# e is an already parsed element, e.g. the root of the tree
for atype in e.findall('type'):
    print(atype.get('foobar'))

or see this thread: How do I parse XML in Python?
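
Applied to the question's data, the findall() and get() calls could look like the sketch below. XML_TEXT is a hypothetical, shortened excerpt of the config, and 'foobar' is just a placeholder for an attribute that does not exist.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-system excerpt of the question's config.
XML_TEXT = """<config>
<system id="0" name="suite1" type="regression"><temp-config>QA</temp-config></system>
<system id="1" name="suite2" type="regression"><temp-config>QA</temp-config></system>
</config>"""

root = ET.fromstring(XML_TEXT)

# findall('system') matches only direct children of the root element.
suite_names = [system.get('name') for system in root.findall('system')]

# get() returns None when the attribute is absent (a default can be passed instead).
missing = root.find('system').get('foobar')

print(suite_names, missing)
```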

doniyor
0

To complete the answers above, using lxml, here's how you'd change the 'name' attribute value:

from lxml import etree
tree = etree.parse(path_to_my_xml)
for elem in tree.xpath('.//temp-config'):
    elem.text = 'Prod'
for elem in tree.xpath(".//test[@name='test02']"):
    elem.attrib['name'] = 'test03'
isedev