I have some zip files (700+), each containing a file with the following structure (the file looks exactly like this):

<?xml version="1.0" encoding="UTF-8"?>
<Values version="2.0">
<record name="trigger">
    <value name="uniqueId">6xjUCpDlrTVHRsEVmxx0Ews6ni8=</value>
    <value name="processingSuspended">false</value>
    <value name="retrievalSuspended">false</value>
</record>
<record name="trigger">
    <value name="uniqueId">6xjUCpDlrTVHRsEVmxx0Ews6ni8=</value>
    <value name="processingSuspended">false</value>
    <value name="retrievalSuspended">false</value>
</record>
</Values>

What I would like to achieve is to set the fields processingSuspended and retrievalSuspended to false, regardless of whether their current value is true or false, but only for the first occurrence of each.

EDIT:

By request, I'm adding what I have so far. It extracts the fields I want, but I believe there is a simpler way to do this:

import os
import zipfile
import glob
import time
import re

def main():
    rList = []
    for z in glob.glob("*.zip"):
        root = zipfile.ZipFile(z)
        for filename in root.namelist():
            if filename.find("node.ndf") >= 0:
                for line in root.read(filename).split("\n"):
                    if line.find("broker-trigger") >= 0:
                        for iline in root.read(filename).split("\n"):
                            Values = dict()
                            #match Processing state
                            if iline.find("processingSuspended") >= 0:
                                mpr = re.search(r'(.*>)(.*?)(<.*)', 
                                                iline, re.M|re.I)
                            #match Retrieval state
                            if iline.find("retrievalSuspended") >= 0:
                                mr = re.search(r'(.*>)(.*?)(<.*)', 
                                               iline, re.M|re.I)
                                Values['processingSuspended'] = mpr.group(2)
                                Values['retrievalSuspended'] = mr.group(2)
                                #print mr.group(2)
                                rList.append(Values)
    print rList

if __name__== "__main__":
    main()

Thanks in advance.

martineau
thclpr
  • We're happy to help you with specific problems you face, but we aren't going to do your work for you. What have you tried so far? – thegrinner Jul 10 '13 at 18:12
  • @thegrinner, thanks for the reply, but I'm not asking someone to do my work; I'm just struggling with how to do this. At the moment I have an 80-line script that identifies which files on a server have the property processingSuspended or retrievalSuspended. I could paste it here, but it doesn't seem relevant to the question. The main problem is that I really don't know how to do this, even after searching Stack Overflow and the Python docs. – thclpr Jul 10 '13 at 18:18
  • If this is XML, why aren't you using an XML parser? – Waleed Khan Jul 10 '13 at 18:27
  • @WaleedKhan But with an XML parser, would I be able to replace just the first two fields that I want? Could you provide an example if possible? – thclpr Jul 10 '13 at 18:28
  • Take a look at [this question](http://stackoverflow.com/questions/179287/what-is-the-best-way-to-change-text-contained-in-an-xml-file-using-python). You'd write the logic deciding what to replace, but that shouldn't be too terrible. The basic idea is to read in the XML using an XML parser, edit it with DOM methods, then write it back out using the edited document. – thegrinner Jul 10 '13 at 18:34
  • I think you're making it too hard. Why not just see if the `<value>` is one of the ones you're interested in and replace it with the static string `false`? Don't bother parsing it, since you don't care about what's in it anyway... – twalberg Jul 10 '13 at 18:35
  • Let me check that module, @thegrinner. I will reply as soon as I have it working :) – thclpr Jul 10 '13 at 18:45
  • After you replace the `processingSuspended` and `retrievalSuspended` fields' text, what do you want to do with the updated file contents, replace the existing zip file or create a new one? Will each zip file archive contain only one member with the xml data in it? – martineau Jul 11 '13 at 08:33
  • @martineau In short, I'm going to replace the values and create a new zip file, but I have already figured that part out. For now I'm still working out how to replace the first occurrence. – thclpr Jul 11 '13 at 09:08
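
The plain text-replacement idea raised in the comments can be sketched roughly as follows. This is only an illustration, assuming each field is written exactly as `<value name="...">...</value>` with no extra attributes or whitespace inside the tag; the helper name `force_first_false` is hypothetical. The `count=1` argument to `re.sub` is what limits the change to the first occurrence.

```python
import re

def force_first_false(xml_text, field):
    """Set the text of the first <value name="field"> element to 'false'.

    Works on raw text, so it assumes the tag is written exactly in the
    form shown in the question (hypothetical helper, not from the thread).
    """
    pattern = r'(<value name="%s">)[^<]*(</value>)' % re.escape(field)
    # count=1 stops after the first match, leaving later records untouched.
    return re.sub(pattern, r'\g<1>false\g<2>', xml_text, count=1)

sample = (
    '<record name="trigger">\n'
    '    <value name="processingSuspended">true</value>\n'
    '</record>\n'
    '<record name="trigger">\n'
    '    <value name="processingSuspended">true</value>\n'
    '</record>\n'
)
patched = force_first_false(sample, "processingSuspended")
```

A text-level edit like this keeps the rest of the file byte-for-byte identical, but it is more fragile than using a parser if the formatting of the files ever varies.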

2 Answers

Try using lxml:

>>> xml = b'''\
<?xml version="1.0" encoding="UTF-8"?>
<Values version="2.0">
<record name="trigger">
    <value name="uniqueId">6xjUCpDlrTVHRsEVmxx0Ews6ni8=</value>
    <value name="processingSuspended">true</value>
    <value name="retrievalSuspended">true</value>
</record>
<record name="trigger">
    <value name="uniqueId">6xjUCpDlrTVHRsEVmxx0Ews6ni8=</value>
    <value name="processingSuspended">true</value>
    <value name="retrievalSuspended">true</value>
</record>
</Values>\
'''

>>> from lxml import etree
>>> tree = etree.fromstring(xml)
>>> tree.xpath('//value[@name="processingSuspended"]')[0].text = 'false'
>>> tree.xpath('//value[@name="retrievalSuspended"]')[0].text = 'false'

The XPath expression '//value[@name="processingSuspended"]' finds every value element whose name attribute equals "processingSuspended", in document order. We then take the first match with [0] and set its text to 'false'.

Output:

>>> print(etree.tostring(tree, pretty_print=True).decode('utf-8'))
<Values version="2.0">
<record name="trigger">
    <value name="uniqueId">6xjUCpDlrTVHRsEVmxx0Ews6ni8=</value>
    <value name="processingSuspended">false</value>
    <value name="retrievalSuspended">false</value>
</record>
<record name="trigger">
    <value name="uniqueId">6xjUCpDlrTVHRsEVmxx0Ews6ni8=</value>
    <value name="processingSuspended">true</value>
    <value name="retrievalSuspended">true</value>
</record>
</Values>
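
If installing lxml is not an option, a similar first-match lookup can be sketched with the standard library's xml.etree.ElementTree, whose `find()` returns the first element matching a path in document order. The helper name `set_first_false` and the trimmed sample data are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed sample in the same shape as the question's file (uniqueId omitted).
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<Values version="2.0">
<record name="trigger">
    <value name="processingSuspended">true</value>
    <value name="retrievalSuspended">true</value>
</record>
<record name="trigger">
    <value name="processingSuspended">true</value>
    <value name="retrievalSuspended">true</value>
</record>
</Values>"""

def set_first_false(root, names=("processingSuspended", "retrievalSuspended")):
    """Set the first matching <value> for each name to 'false' (hypothetical helper)."""
    for name in names:
        # find() returns only the first match, or None if there is none.
        node = root.find('.//value[@name="%s"]' % name)
        if node is not None:
            node.text = "false"
    return root

root = set_first_false(ET.fromstring(XML))
values = [v.text for v in root.iter("value")]
```

Note that "first" here means the first occurrence of each field anywhere in the document, so if the first record lacked one of the fields, the match would come from a later record.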

ovgolovin

You can read the zip archives and update the XML data in the files they contain using only modules from Python's standard library. The documentation for xml.etree.ElementTree even includes a tutorial.

import glob
import xml.etree.ElementTree as ET
import zipfile

def main():
    for z in glob.glob("*.zip"):
        print 'processing file: {!r}'.format(z)
        zfile = zipfile.ZipFile(z)
        for filename in zfile.namelist():
            print 'processing archive member: {!r} in {}'.format(filename, z)
            contents = zfile.open(filename).read()

            print 'Before changes:'
            print contents

            root = ET.fromstring(contents)
            if root.tag != "Values" or root.attrib["version"] != "2.0":
                print 'unsupported xml file'
                break

            if(root[0][1].tag == "value" and
               root[0][1].attrib["name"] == "processingSuspended"):
                root[0][1].text = "false"
            else:
                print 'expected "processingSuspended" value field not found'
                break

            if(root[0][2].tag == "value" and
               root[0][2].attrib["name"] == "retrievalSuspended"):
                root[0][2].text = "false"
            else:
                print 'expected "retrievalSuspended" value field not found'
                break

            print 'After changes:'
            updated_contents = ET.tostring(root)
            print updated_contents

if __name__== "__main__":
    main()
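
The script above only prints the updated contents. Since the question's comments mention writing a new zip file, here is a minimal Python 3 sketch of that last step, under some assumptions: the function names `patch_xml` and `rewrite_zip` are hypothetical, the member to patch is picked by the "node.ndf" suffix used in the question's code, and every other member is copied unchanged.

```python
import zipfile
import xml.etree.ElementTree as ET

FIELDS = ("processingSuspended", "retrievalSuspended")

def patch_xml(data):
    """Return the XML bytes with the first occurrence of each field set to 'false'."""
    root = ET.fromstring(data)
    for name in FIELDS:
        node = root.find('.//value[@name="%s"]' % name)
        if node is not None:
            node.text = "false"
    return ET.tostring(root, encoding="utf-8")

def rewrite_zip(src_path, dst_path, member_suffix="node.ndf"):
    """Copy src_path to dst_path, patching members whose name ends with member_suffix."""
    with zipfile.ZipFile(src_path) as zin, \
         zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as zout:
        for info in zin.infolist():
            data = zin.read(info.filename)
            if info.filename.endswith(member_suffix):
                data = patch_xml(data)
            # Passing the ZipInfo keeps the member's name and timestamp.
            zout.writestr(info, data)
```

Calling something like `rewrite_zip("in.zip", "out.zip")` (file names chosen for illustration) leaves the original archive untouched, which makes it easy to compare the two before replacing anything.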
martineau