0

I'm writing a Python script in which I need to do some search and replace in some XML-like strings.

Let's say I have a string foo that contains some XML of the form <bar>something</bar>. I want to change the something within the <bar>...</bar>, let's say to <bar>something_else</bar>.

How would I go about doing this?

Sorry if this is a newbie question, but I'm pretty new to Python. I read through the Python regular expression docs but that only served to confuse me further. (actually I find regular expressions confusing in other contexts/languages as well, so that may not necessarily be Python's fault :) )

Donald Burr
  • 2,281
  • 2
  • 23
  • 31
  • 4
    XML or XML-like? because regex and HTML/XML are like oil and water - they don't like to mix. If it's HTML or XML, you're far better off using something like `lxml.etree` to make changes to the contexts. – isedev Mar 22 '13 at 21:11
  • I agree with isedev http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html – That1Guy Mar 22 '13 at 21:12
  • 1
    This might be a good starting place: http://docs.python.org/release/2.5.2/lib/module-xml.dom.html Once you have done your XML parsing you can do regex on the fetching strings (inside the tags) to go even deeper if needed. –  Mar 22 '13 at 21:14
  • If you want to replace the text inside XML tags, I'd recommend going with an XML parsing approach such as [`lxml`](http://lxml.de/) – inspectorG4dget Mar 22 '13 at 21:15

1 Answers1

1

Here's an trivial example of how you would do it with lxml.etree, to get you started along the right path (based on the comments above):

text = """<tag><innertag>text</innertag>more text</tag>"""
import lxml.etree
xml = lxml.etree.fromstring(text)
node = xml.find('innertag')
node.text = "new text"
lxml.etree.tostring(xml)
'<tag><innertag>new text</innertag>more text</tag>'

See the lxml documentation.

isedev
  • 18,848
  • 3
  • 60
  • 59