I've been playing with XML data in a text file. Just some general stuff.

I played around with xml.etree and its commands, but now I am wondering how to get rid of the tags manually and write all that data into a new file.

I figure it would take a lot of str.splits or a loop to get rid of the tags.

Right now I have this to start with (it doesn't work; it just copies the data):

def summarizeData(fileName):
    readFile = open(fileName, "r").read()
    newFile = input("")
    writeFile = open(newFile, "w")
    with open(fileName, "r") as file:
        for tags in file:
            Xtags = tags.split('>')[1].split('<')[0]
    writeFile.write(readFile)
    writeFile.close

So far it just copies the data, tags included. I figured splitting on the tags would do the trick, but it doesn't seem to do anything. Is it possible to do this manually, or do I have to use xml.etree?

Edited by Zero Piraeus; asked by thatoneguy

1 Answer

The reason you don't see any changes is that you're just writing the data you read from fileName into readFile in this line:

    readFile = open(fileName, "r").read()

... straight back to writeFile in this line:

    writeFile.write(readFile)

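Nothing you do in between (with Xtags etc.) has any effect on readFile: Xtags is reassigned on every pass through the loop and never written anywhere, so readFile gets written back as-is. (Also, writeFile.close without parentheses doesn't actually call the method; it needs to be writeFile.close().)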
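As a rough sketch, here is the question's loop with just that problem fixed, so it writes the extracted text rather than readFile. It keeps the original split-based idea, so it assumes each element's opening tag, text, and closing tag sit together on one line; lines without both a '>' and a '<' are skipped. The name summarize_data and the second parameter (replacing the input() call) are illustrative changes, not part of the original code.

```python
def summarize_data(file_name, new_file):
    """Naive tag stripping: for each line, keep the text between the first
    '>' and the following '<'. Assumes one <tag>text</tag> per line."""
    # 'with' closes both files automatically, even if an error occurs.
    with open(file_name, "r") as read_file, open(new_file, "w") as write_file:
        for line in read_file:
            # Skip lines that don't contain both a '>' and a '<'.
            if ">" not in line or "<" not in line:
                continue
            text = line.split(">", 1)[1].split("<", 1)[0].strip()
            # Write what was extracted, not the original file contents.
            if text:
                write_file.write(text + "\n")
```

For example, summarize_data("data.xml", "data.txt"). It still breaks on the kinds of input described below.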

Apart from that issue, which you could fix with a little work, parsing XML is nowhere near as straightforward as you might think. You have to deal with tags that span multiple lines, '>' characters that can legally appear inside attribute values, comments and CDATA sections, entity references such as &amp;, and a host of other subtle issues.
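To make a few of those pitfalls concrete, here is a sketch of a somewhat smarter manual approach: a regular expression that deletes everything from a '<' to the next '>' (strip_tags_naive is a hypothetical helper name). Run on a small, well-formed document, it gets the attribute, the comment, the CDATA section, and the entity all wrong.

```python
import re


def strip_tags_naive(xml_text):
    # Hypothetical naive stripper: remove every '<' ... next '>' span.
    return re.sub(r"<[^>]*>", "", xml_text)


sample = (
    '<note title="a > b">'            # '>' is legal inside an attribute value
    "<!-- old > new -->"              # comments may contain '>' as well
    "<code><![CDATA[1 < 2]]></code>"  # the CDATA text "1 < 2" should be kept
    "<p>Tom &amp; Jerry</p>"          # &amp; should become '&'
    "</note>"
)

print(repr(strip_tags_naive(sample)))
# prints: ' b"> new -->Tom &amp; Jerry'
```

The output keeps fragments of the attribute and the comment, silently drops the CDATA text, and leaves the entity unresolved.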

In summary: use a real XML parser like xml.etree.
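A minimal sketch of that approach with xml.etree.ElementTree, writing every piece of text content to a new file, one non-empty chunk per line. The function name xml_to_text and the one-chunk-per-line output format are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def xml_to_text(file_name, new_file):
    """Write the text content of the XML document in file_name to new_file."""
    root = ET.parse(file_name).getroot()
    with open(new_file, "w", encoding="utf-8") as out:
        # itertext() yields element text and tail text in document order.
        # The parser has already resolved entities and CDATA sections, and
        # by default it discards comments.
        for chunk in root.itertext():
            chunk = chunk.strip()
            if chunk:  # skip whitespace-only indentation between tags
                out.write(chunk + "\n")
```

For example, xml_to_text("data.xml", "data.txt"). Because a real parser does the tokenizing, multi-line tags, '>' inside attributes, comments, and CDATA need no special handling.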

Edited by Community; answered by Zero Piraeus