1

I have a xml file with some data.

<Emp>
<Name>Raja</Name>
<Location>
     <city>ABC</city>
     <geocode>123</geocode>
     <state>XYZ</state> 
</Location>
<sal>100</sal>
<type>temp</type> 
</Emp>

so the location information are wrong in the xml file, I have to replace it.

I have constructed the location information with corrected vales in python.

variable = '''
    <Location isupdated=1>
         <city>MyCity</city>
         <geocode>10.12</geocode>
         <state>MyState</state> 
    </Location>'''

So, the location tag should be replaced with the new information. Is there any simple way to update this in python.

I want the final result data like,

<Emp>
<Name>Raja</Name>
<Location isupdated=1>
         <city>MyCity</city>
         <geocode>10.12</geocode>
         <state>MyState</state>
</Location>
<sal>100</sal>
<type>temp</type> 
</Emp>

Any thoughts ??

Thanks.

Magendran V
  • 1,411
  • 3
  • 19
  • 33
  • I think the `regex` tag should be removed, it is no task for regex. Looks like a classic example for a node replacement with XML parser. – Wiktor Stribiżew Oct 13 '15 at 09:50
  • 1
    Possible duplicate of [Find and Replace Values in XML using Python](http://stackoverflow.com/questions/6523886/find-and-replace-values-in-xml-using-python) – Saket Mittal Oct 13 '15 at 09:54
  • @SaketMittal, But here I want to replace the Location tags itself? may be additionally one more tag added to variable template. I just want to remove the tag, and add the variable content in it. – Magendran V Oct 13 '15 at 09:58

1 Answers1

4

UPDATE - XML PARSER IMPLEMENTATION : since replace a specific <Location> tag require to modify the regex i'm providing a more general and safer alternative implementation based upon ElementTree parser (as stated above by @stribizhev and @Saket Mittal).

I've to add a root element <Emps> (to make a valid xml doc, requiring root element), i've also chosen to filter the location to edit by the <city> tag (but may be everyfield):

#!/usr/bin/python
# Alternative Implementation with ElementTree XML Parser

xml = '''\
<Emps>
    <Emp>
        <Name>Raja</Name>
        <Location>
            <city>ABC</city>
            <geocode>123</geocode>
            <state>XYZ</state>
        </Location>
        <sal>100</sal>
        <type>temp</type>
    </Emp>
    <Emp>
        <Name>GsusRecovery</Name>
        <Location>
            <city>Torino</city>
            <geocode>456</geocode>
            <state>UVW</state>
        </Location>
        <sal>120</sal>
        <type>perm</type>
    </Emp>
</Emps>
'''

from xml.etree import ElementTree as ET
# tree = ET.parse('input.xml')  # decomment to parse xml from file
tree = ET.ElementTree(ET.fromstring(xml))
root = tree.getroot()

for location in root.iter('Location'):
    if location.find('city').text == 'Torino':
        location.set("isupdated", "1")
        location.find('city').text = 'MyCity'
        location.find('geocode').text = '10.12'
        location.find('state').text = 'MyState'

print ET.tostring(root, encoding='utf8', method='xml')
# tree.write('output.xml') # decomment if you want to write to file

Online runnable version of the code here

PREVIOUS REGEX IMPLEMENTATION

This is a possible implementation using the lazy modifier .*? and dot all (?s):

#!/usr/bin/python

import re

xml = '''\
<Emp>
<Name>Raja</Name>
<Location>
     <city>ABC</city>
     <geocode>123</geocode>
     <state>XYZ</state>
</Location>
</Emp>'''

locUpdate = '''\
    <Location isupdated=1>
         <city>MyCity</city>
         <geocode>10.12</geocode>
         <state>MyState</state>
    </Location>'''

output = re.sub(r"(?s)<Location>.*?</Location>", r"%s" % locUpdate, xml)

print output

You can test the code online here

Caveat: if there are more than one <Location> tag in the xml input the regex replace them all with locUpdate. You have to use:

# (note the last ``1`` at the end to limit the substitution only to the first occurrence)
output = re.sub(r"(?s)<Location>.*?</Location>", r"%s" % locUpdate, xml, 1)
Giuseppe Ricupero
  • 6,134
  • 3
  • 23
  • 32
  • I have some tag after the tag, how to preserve that too in the output? – Magendran V Oct 13 '15 at 10:34
  • @MagendranV If you test the online [implementation](http://ideone.com/dB7oVK) the tag after is preserved. Anyway if you provide your xml input i'll check – Giuseppe Ricupero Oct 13 '15 at 10:40
  • I have slightly modified the input. please take a look – Magendran V Oct 13 '15 at 10:58
  • @MagendranV: nothing changes for me, everything after ``''`` is preserved (you can check the stdout of the modified code [online](http://ideone.com/dB7oVK). I believe there is something different in the way xml in provided to the regex or similar. – Giuseppe Ricupero Oct 13 '15 at 12:19
  • @MagendranV: i've provided an alternative implementation using ElementTree as XML parser with finer control over the substitution. This implementation (parsing from string or file provided) can solve the other problems you mentioned above as well. – Giuseppe Ricupero Oct 13 '15 at 13:35