1

I'm writing a script that checks a short xml file with the following data:

Example:

<manager>
    <name="adi" lastName="kant">
        <info ip="10.12.180.107" platform="linux" user="root" password="dangerous"/>
    </name>
    <name="dino" lastName="kant">
        <info ip="10.12.180.108" platform="linux" user="root" password="dangerous"/>
    </name>
</manager>

I'm trying to create a python script that will look into this xml file and remove the selected name information.

Example:

removeData(xmlFile)
print xmlFile
<manager>
    <name="dino" lastName="kant">
        <info ip="10.12.180.108" platform="linux" user="root" password="dangerous"/>
    </name>
</manager>

The only solution I could come with was to read the file up to the name I wish to remove and then append that into one list, then read the file from the name after the name I wish to remove and append that to another list, combine those two lists and print that into my file.

Example:

h = open("/home/service/chimera/array_cert/test.txt", "r")
output = []
lines = h.readlines()
for line in lines:
    if '<name="adi"' in line: 
        break
    output.append(line)
i = 0
for line in lines:
    i+=1
    if '<name="dino"' in line:
        break
for line in lines[i:]:
    output.append(line)
h.close()
h = open("/home/service/chimera/array_cert/test.txt", "w")
h.truncate()
for line in output:
    h.write(line)

But this seems needlessly complex. Is there a simpler way to do this?

Also I'm using python 2.6 on a Linux system.

Cœur
  • 37,241
  • 25
  • 195
  • 267
Adilicious
  • 1,643
  • 4
  • 18
  • 22
  • Do you need to retain the same formatting as in source file? If not, probably you could just parse XML and output a new XML file. – Timothy Ha Nov 18 '14 at 14:18

2 Answers2

1

Use a SAX parser such as xml.sax. This gives you callbacks as it scans the XML file for each of the various xml tags or 'events' (ie opening a tag, closing a tag, seeing an attribute, seeing some data, etc). Keep track of whether you are in part of the XML file you do or do not want to keep (or delete) as you get these callbacks. Stream the data into a new file if you are in "keeping" mode, and don't otherwise.

When dealing with XML, always use a proper parser of some sort. The dangers of trying to use regexes or otherwise trying to do it yourself have been well documented.

Community
  • 1
  • 1
Vic Smith
  • 3,477
  • 1
  • 18
  • 29
1

Do you need to retain the same formatting as in source file? If not, probably you could just parse XML and output a new XML file.

If you can trust the XML in your source to have <name...> and </name> on different lines, you can modify your code just a little:

h = open("test1.txt", "r")
output = []
lines = h.readlines()
foroutput = 1
for line in lines:
    if '<name="adi"' in line:
        foroutput = 0
    if foroutput==1:
        output.append(line)
    elif '</name>' in line:
        foroutput = 1
h.close()
h = open("test2.txt", "w")
h.truncate()
for line in output:
    h.write(line)
h.close()
Timothy Ha
  • 399
  • 3
  • 7
  • 1
    Sorry for the late reply. This does indeed work, it does what I want it to but I ended up just using an XML parser to do what I needed it to do. Thank you for your help! – Adilicious Nov 19 '14 at 12:36
  • Yes, sure it's the safest way to use an XML parser :-) – Timothy Ha Nov 19 '14 at 12:40