Search and replace multiple lines in xml/text files using python

Question

---Update 3: I have got the script that writes the required data into the XML files working, but the following header is being dropped from the written file. Why is this, and how can I restore it?

<?xml version="1.0" encoding="utf-8"?><?xml-stylesheet type='text/xsl' href='ANZMeta.xsl'?>

Current working code (except for the issue mentioned above):

import os, xml, arcpy, shutil
from xml.etree import ElementTree as et 

path=os.getcwd()
arcpy.env.workspace = path

FileList = arcpy.ListFeatureClasses()
FileCount = len(FileList)
zone="_Zone"

for File in FileList:
    FileDesc_obj = arcpy.Describe(File)
    FileNm=FileDesc_obj.file
    newMetaFile=FileNm+"_BaseMetadata.xml"

    check_meta=os.listdir(path)
    if FileNm+'.xml' in check_meta:
        shutil.copy2(FileNm+'.xml', newMetaFile)
    else:
        # Raw string so the backslashes in the Windows path are never read as escapes
        shutil.copy2(r'L:\Data_Admin\QA\Metadata_python_toolset\Master_Metadata.xml', newMetaFile)
    tree=et.parse(newMetaFile)

    print "Processing: "+str(File)

    for node in tree.findall('.//title'):
        node.text = str(FileNm)
    for node in tree.findall('.//northbc'):
        node.text = str(FileDesc_obj.extent.YMax)
    for node in tree.findall('.//southbc'):
        node.text = str(FileDesc_obj.extent.YMin)
    for node in tree.findall('.//westbc'):
        node.text = str(FileDesc_obj.extent.XMin)
    for node in tree.findall('.//eastbc'):
        node.text = str(FileDesc_obj.extent.XMax)        
    for node in tree.findall('.//native/nondig/formname'):
        node.text = str(os.getcwd()+"\\"+File)
    for node in tree.findall('.//native/digform/formname'):
        node.text = str(FileDesc_obj.featureType)
    for node in tree.findall('.//avlform/nondig/formname'):
        node.text = str(FileDesc_obj.extension)
    for node in tree.findall('.//avlform/digform/formname'):
        node.text = str(os.path.getsize(File) / 1024.0) + " KB"
    for node in tree.findall('.//theme'):
        node.text = str(FileDesc_obj.spatialReference.name +" ; EPSG: "+str(FileDesc_obj.spatialReference.factoryCode))
    print node.text
    projection_info=[]
    Zone=FileDesc_obj.spatialReference.name

    if "GCS" in str(FileDesc_obj.spatialReference.name):
        projection_info=[FileDesc_obj.spatialReference.GCSName, FileDesc_obj.spatialReference.angularUnitName, FileDesc_obj.spatialReference.datumName, FileDesc_obj.spatialReference.spheroidName]
        print "Geographic Coordinate system"
    else:
        projection_info=[FileDesc_obj.spatialReference.datumName, FileDesc_obj.spatialReference.spheroidName, FileDesc_obj.spatialReference.angularUnitName, Zone[Zone.rfind(zone)-3:]]
        print "Projected Coordinate system"
    x=0
    for node in tree.findall('.//spdom'):
        for node2 in node.findall('.//keyword'):
            print node2.text
            node2.text = str(projection_info[x])
            print node2.text
            x=x+1


    tree.write(newMetaFile)
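The header disappears because ElementTree's default parser discards processing instructions that sit outside the root element (such as the `<?xml-stylesheet ...?>` line), and `tree.write()` does not reproduce the original XML declaration. One workaround, sketched below under stated assumptions, is to read the text before the root element yourself and write it back in front of the serialized tree. This is a Python 3 sketch (the question's code is Python 2); `read_prolog` and `write_with_prolog` are hypothetical helper names, and the regex deliberately ignores `<!DOCTYPE ...>`.

```python
import re
import xml.etree.ElementTree as et

# Everything before the root start tag: whitespace, <?...?> processing
# instructions (the XML declaration included) and <!-- --> comments.
# A <!DOCTYPE ...> is not handled by this sketch.
PROLOG_RE = re.compile(r'(?:\s+|<\?.*?\?>|<!--.*?-->)*', re.DOTALL)


def read_prolog(path):
    """Return the header text that precedes the root element of *path*."""
    with open(path, encoding='utf-8') as f:
        return PROLOG_RE.match(f.read()).group(0).strip()


def write_with_prolog(tree, path, prolog):
    """Write *tree* to *path* with *prolog* re-emitted in front of the root.

    Assumes the prolog declares UTF-8, as the ANZMeta header does.
    """
    body = et.tostring(tree.getroot(), encoding='unicode')
    with open(path, 'w', encoding='utf-8') as f:
        f.write(prolog + '\n' + body)
```

In the loop above, `prolog = read_prolog(newMetaFile)` would go right after the `shutil.copy2` step, and `write_with_prolog(tree, newMetaFile, prolog)` would replace `tree.write(newMetaFile)`.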

---Update 1 & 2: Thanks to Aleyna, I have the following basic code that works:

import os, xml, arcpy, shutil
from xml.etree import ElementTree as et 

CodeString=['northbc','southbc', '<nondig><formname>']

nondig='nondigital'
path=os.getcwd()
arcpy.env.workspace = path
xmlfile = path+"\\test.xml"

FileList = arcpy.ListFeatureClasses()
FileCount = len(FileList)

for File in FileList:
    FileDesc_obj = arcpy.Describe(File)
    FileNm=FileDesc_obj.file
    newMetaFile=FileNm+"_Metadata.xml"
    # Raw string so the backslashes in the Windows path are never read as escapes
    shutil.copy2(r'L:\Data_Admin\QA\Metadata_python_toolset\Master_Metadata.xml', newMetaFile)
    tree=et.parse(newMetaFile)

    for node in tree.findall('.//northbc'):
        node.text = str(FileDesc_obj.extent.YMax)
    for node in tree.findall('.//southbc'):
        node.text = str(FileDesc_obj.extent.YMin)
    for node in tree.findall('.//westbc'):
        node.text = str(FileDesc_obj.extent.XMin)
    for node in tree.findall('.//eastbc'):
        node.text = str(FileDesc_obj.extent.XMax)        
    for node in tree.findall('.//native/nondig/formname'):
        node.text = nondig

    tree.write(newMetaFile)
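The repeated find-and-set loops can also be driven from a single mapping of ElementTree paths to new values. This is a minimal Python 3 sketch, not the poster's code: `set_texts` is a hypothetical helper, and the numbers in the demo stand in for `FileDesc_obj.extent` values.

```python
import xml.etree.ElementTree as et


def set_texts(tree, updates):
    """Set the text of every element matched by each ElementTree path.

    *updates* maps a path such as './/northbc' to the new value; values
    are converted with str(). Returns how many elements were changed,
    so a mistyped path (zero matches) is easy to spot.
    """
    changed = 0
    for path, value in updates.items():
        for node in tree.findall(path):
            node.text = str(value)
            changed += 1
    return changed


if __name__ == '__main__':
    # Hypothetical extent values standing in for FileDesc_obj.extent
    root = et.fromstring('<metadata><bounding><northbc>0</northbc>'
                         '<southbc>0</southbc></bounding></metadata>')
    tree = et.ElementTree(root)
    n = set_texts(tree, {'.//northbc': 8097970, './/southbc': 8078568})
    print(n, et.tostring(root, encoding='unicode'))
```

With the question's objects the mapping would look like `{'.//northbc': FileDesc_obj.extent.YMax, './/native/nondig/formname': nondig, ...}`, keeping one place to edit when the template changes.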

The remaining issue is dealing with XML such as:

<spdom>
  <keyword thesaurus="">GDA94</keyword>
  <keyword thesaurus="">GRS80</keyword>
  <keyword thesaurus="">Transverse Mercator</keyword>
  <keyword thesaurus="">Zone 55 (144E - 150E)</keyword>
</spdom>

As <keyword thesaurus=""> is not unique within <spdom>, can we update these in order, using the values coming from

FileDesc_obj.spatialReference.name

u'GCS_GDA_1994'

---ORIGINAL POST---

I am building a program to generate XML metadata files from the spatial files in our library. I have already created the scripts to extract the required spatial and attribute data from the files and to create a shapefile- and text-file-based index of them. Now I want to write this information into a base metadata XML file, written to ANZLIC standards, by replacing the values held by its common/static elements.

For example, I want to replace the following XML:

<northbc>8097970</northbc>
<southbc>8078568</southbc>

with

<northbc>GeneratedValue_[desc.extent.YMax]</northbc>
<southbc>GeneratedValue_[desc.extent.YMin]</southbc>

The issue is that the current value between <northbc> and </northbc> will obviously not be the same in every file.

The same goes for XML tags like <title> and <nondig><formname>. In the latter example both tags must be searched for together, as <formname> appears multiple times (it is not unique).
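Rather than matching two tags together with a regular expression, an ElementTree path can name the parents of the element you want. A minimal Python 3 sketch on a hypothetical fragment (the native/avlform nesting mirrors the paths used in the question's later code):

```python
import xml.etree.ElementTree as et

# Hypothetical fragment: <formname> appears under several parents.
SAMPLE = """<metadata>
  <native>
    <nondig><formname>old-1</formname></nondig>
    <digform><formname>old-2</formname></digform>
  </native>
  <avlform>
    <nondig><formname>old-3</formname></nondig>
  </avlform>
</metadata>"""

root = et.fromstring(SAMPLE)

# './/nondig/formname' alone would match two elements (old-1 and old-3).
print(len(root.findall('.//nondig/formname')))

# Naming the grandparent as well narrows it to the one under <native>.
for node in root.findall('.//native/nondig/formname'):
    node.text = 'nondigital'

print([f.text for f in root.iter('formname')])
```

The existing value inside the tag never has to be known; the path selects the element and its text is simply overwritten.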

I am using the Python regular expression manual [here][1].

Thanks. I am not trying to write an XML file from scratch; I just want to replace chunks of text within given elements based on input from the arcpy module. — GeorgeC, Jan 30 '12 at 03:21
So when it produces output that looks like `8097970`, your regex will handle it? — Borealid, Jan 30 '12 at 03:22
Why would it? It is just getting desc.extent.XMax, where desc=arcpy.Describe(shp_file), for example. — GeorgeC, Jan 30 '12 at 03:25
Look, is it really so hard to use a library designed for what you're trying to do instead of one designed for parsing unstructured text? I'm really trying to save you a headache, here. — Borealid, Jan 30 '12 at 03:26
Understood, and thanks, but I just don't know which library to use or how to get it going. I am trying to use the process in http://stackoverflow.com/questions/5993286/python-search-replace-content-of-xml — GeorgeC, Jan 30 '12 at 03:29
let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/7155/discussion-between-georgec-and-borealid) — GeorgeC, Jan 30 '12 at 04:02

score 2 · Accepted Answer · answered Jan 30 '12 at 04:29

Using the given tag(s) above:

from xml.etree import ElementTree as et

path = r"/your/path/to/xml.file"  # must be an XML file, not a directory
tree = et.parse(path)
# './/northbc' matches every <northbc> element at any depth
for node in tree.findall('.//northbc'):
    node.text = "New Value"
tree.write(path)

Here, the XPath expression .//northbc returns all the northbc nodes in the XML document. You can easily tailor the code to your needs.

Aleyna

Thanks but I get the following... >> path=os.getcwd() >> tree=et.parse(path) Traceback (most recent call last): File "C:\Program Files (x86)\Wing IDE 101 4.0\src\debug\tserver\_sandbox.py", line 1, in # Used internally for debug sandbox under external interpreter File "C:\Python26\ArcGIS10.0\Lib\xml\etree\ElementTree.py", line 862, in parse tree.parse(source, parser) File "C:\Python26\ArcGIS10.0\Lib\xml\etree\ElementTree.py", line 579, in parse source = open(source, "rb") IOError: [Errno 13] Permission denied: 'L:\\Data_Admin\\QA\\Metadata_python_toolset\\training' – GeorgeC Jan 30 '12 at 05:19
Please DISREGARD my previous comment. It works fine when path is an actual XML file. What would you do with repeating tags like the third example, <nondig><formname>, where formname is repeated but nondig is unique? – GeorgeC Jan 30 '12 at 05:26
If I am getting it right, you have multiple <formname>s that are direct children of unique <nondig> nodes? Then you can use an XPath such as .//nondig/formname to get the <formname>s. You can either walk up the tree and check the parent before replacing the value or, even better, rewrite your XPath using the parent's unique attribute (perhaps an id?) so that the <formname>s are grouped by their <nondig>s. – Aleyna Jan 30 '12 at 05:54
Not sure if .//spdom/keyword will return the <keyword>s in the order they appear in the document. However, you can just select all <spdom>s and walk through the child <keyword>s in a loop, replacing the values in the order they come from the document. (And of course, the order in the document must match the order in your new data source.) – Aleyna Jan 30 '12 at 17:13
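The suggestions in these comments (check the parent before replacing, select through a unique parent attribute, and fill repeated <keyword>s in order) can be sketched with the standard library alone. This is a Python 3 sketch on a hypothetical document: the `id` attributes, the helper names `replace_if_parent` and `fill_in_order`, and the keyword values are all invented for illustration. ElementTree elements do not store a parent reference, so the sketch builds a child-to-parent map; `findall` on a child path returns matches in document order.

```python
import xml.etree.ElementTree as et

# Hypothetical document; the id attributes are invented for illustration.
SAMPLE = """<metadata>
  <nondig id="native"><formname>a</formname></nondig>
  <digform><formname>b</formname></digform>
  <nondig id="avail"><formname>c</formname></nondig>
  <spdom>
    <keyword thesaurus="">GDA94</keyword>
    <keyword thesaurus="">GRS80</keyword>
  </spdom>
</metadata>"""


def replace_if_parent(root, tag, parent_tag, value):
    """Set the text of each <tag> whose direct parent is <parent_tag>."""
    # ElementTree has no getparent(), so map every child to its parent once.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for node in root.iter(tag):
        parent = parent_of.get(node)
        if parent is not None and parent.tag == parent_tag:
            node.text = value


def fill_in_order(root, container, tag, values):
    """Overwrite the <tag> children of each <container>, in document order."""
    for box in root.iter(container):
        children = box.findall(tag)  # direct children, document order
        if len(children) != len(values):
            raise ValueError('%d <%s> elements but %d values'
                             % (len(children), tag, len(values)))
        for node, value in zip(children, values):
            node.text = str(value)  # attributes such as thesaurus are kept


root = et.fromstring(SAMPLE)
replace_if_parent(root, 'formname', 'nondig', 'nondigital')
# Selecting through a unique parent attribute in the path itself:
for node in root.findall(".//nondig[@id='avail']/formname"):
    node.text = 'available'
# Hypothetical values standing in for the arcpy spatialReference fields
fill_in_order(root, 'spdom', 'keyword', ['GDA_1994', 'GRS_1980'])
print([e.text for e in root.iter('formname')])
print([e.text for e in root.iter('keyword')])
```

Unlike the counter `x` in the question's Update 3 loop, `fill_in_order` restarts for each <spdom> and fails loudly when the number of keywords and values differ.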

score 1 · Answer 2 · answered Jan 30 '12 at 03:27

If you're dealing with valid XML, use XPath to find the nodes of interest and the ElementTree API to manipulate them.

For instance, your XPath might be something like './/northbc' (ElementTree expects a path relative to the element you search from), and you would just replace the text inside it.
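ElementTree implements only a subset of XPath, and its find methods work with relative paths. A small Python 3 check on a hypothetical document shows the difference: './/northbc' matches at any depth, while the absolute form '//northbc' is rejected with a SyntaxError when called on an Element.

```python
import xml.etree.ElementTree as et

root = et.fromstring('<metadata><idinfo><northbc>1</northbc></idinfo></metadata>')

# A relative path ('.' is the element findall is called on) works; prints 1.
print(len(root.findall('.//northbc')))

# An absolute path is rejected when searching from an Element.
try:
    root.findall('//northbc')
except SyntaxError as exc:
    print('rejected:', exc)
```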

See http://docs.python.org/library/xml.etree.elementtree.html as well as http://pypi.python.org/pypi/lxml/2.2.8 for two different libraries that will help you get this done. Search Google for XPath and see the W3C tutorial for a decent introduction to XPath (I apparently can't post more than two links in a post, or I'd link it too).

Thanks. This seems to be on the right track; I am just going through http://www.w3schools.com/xpath/ — GeorgeC, Jan 30 '12 at 03:42

score 0 · Answer 3 · answered Jan 30 '12 at 03:21

I might be stating the obvious here, but did you consider using a DOM tree to parse and manipulate your XML?
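A minimal sketch of the DOM approach with the standard library's `xml.dom.minidom` (Python 3). The sample document and the `set_text` helper are hypothetical. One property that matters for Update 3: minidom keeps processing instructions that precede the root element, so the `<?xml-stylesheet ...?>` line is written back, while `toxml()` regenerates the XML declaration itself.

```python
from xml.dom import minidom

SAMPLE = ('<?xml version="1.0" encoding="utf-8"?>'
          "<?xml-stylesheet type='text/xsl' href='ANZMeta.xsl'?>"
          '<metadata><northbc>8097970</northbc></metadata>')


def set_text(element, value):
    """Replace all children of *element* with a single text node."""
    while element.firstChild is not None:
        element.removeChild(element.firstChild)
    element.appendChild(element.ownerDocument.createTextNode(str(value)))


doc = minidom.parseString(SAMPLE)
for node in doc.getElementsByTagName('northbc'):
    set_text(node, 8100000)

# Top-level processing instructions are part of the DOM, so the
# stylesheet line survives; the declaration is regenerated by toxml().
out = doc.toxml(encoding='utf-8').decode('utf-8')
print(out)
```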

inspectorG4dget