[EDIT]:
I understand that regular expressions are not made to parse XML, but my question is about WHY THAT REGULAR EXPRESSION DOESN'T COMPILE IN PYTHON.
I'm expecting answers about WHAT DOESN'T WORK in that regular expression and NOT WHY IT IS NOT A GOOD IDEA TO USE IT (and I don't understand the downvotes).
[/EDIT]
I'm trying to write a function that escapes the content of an XML tag according to this document, and I think the best solution is to escape every "<" and "&" that is not inside a CDATA section.
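To make the goal concrete, here is a minimal sketch of the behaviour I'm after. It is only an illustration of the expected result, not the single-regex approach the question is about: the names `CDATA`, `BARE_AMP` and `escape_outside_cdata` are made up for this example, and it assumes CDATA sections are well formed and not nested.

```python
import re

# Matches one whole CDATA section; the capturing group makes split() keep it.
CDATA = re.compile(r'(<!\[CDATA\[.*?\]\]>)', re.DOTALL)
# An "&" that does not already start an entity or decimal character reference.
BARE_AMP = re.compile(r'&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#\d+);)')

def escape_outside_cdata(text):
    """Escape bare "&" and every "<" outside CDATA sections (hypothetical helper)."""
    parts = CDATA.split(text)
    # Even indexes hold text outside CDATA; odd indexes hold the CDATA sections.
    for i in range(0, len(parts), 2):
        parts[i] = BARE_AMP.sub('&amp;', parts[i]).replace('<', '&lt;')
    return ''.join(parts)

print(escape_outside_cdata('a & b < c <![CDATA[x & y < z]]> &amp; &'))
```

The text inside the CDATA section should come back untouched, and an existing `&amp;` should not be escaped a second time.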
I only have a basic knowledge of regular expressions, so I looked around and found this page and this one.
Apparently, the regular expression that handles the "&" is:
&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#\d+);)(?!(?>(?:(?!<!\[CDATA\[|\]\]>).)*)\]\]>)
but it doesn't work in Python. When I try to compile it, I get:
In [1]: import re
In [2]: x = re.compile('&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#\d+);)(?!(?>(?:(?!<!\[CDATA\[|\]\]>).)*)\]\]>)')
---------------------------------------------------------------------------
error Traceback (most recent call last)
<ipython-input-2-2884ec1d2f4e> in <module>()
----> 1 x = re.compile('&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#\d+);)(?!(?>(?:(?!<!\[CDATA\[|\]\]>).)*)\]\]>)')
/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/re.pyc in compile(pattern, flags)
188 def compile(pattern, flags=0):
189 "Compile a regular expression pattern, returning a pattern object."
--> 190 return _compile(pattern, flags)
191
192 def purge():
/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/re.pyc in _compile(*key)
243 p = sre_compile.compile(pattern, flags)
244 except error, v:
--> 245 raise error, v # invalid expression
246 if len(_cache) >= _MAXCACHE:
247 _cache.clear()
error: unexpected end of pattern
and this makes me think that this regular expression was not written for Python.
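To narrow it down, here is a small sketch that compiles the pieces of the pattern separately. The helper `compiles` and the variable names are my own, not part of `re`. I suspect the `(?>...)` atomic group: as far as I know, `re` only gained support for it in Python 3.11, so its result is printed rather than assumed. This only checks what compiles, not whether the rewritten pattern matches the same text.

```python
import re

def compiles(pattern):
    """Return True if re accepts the pattern, False if it raises re.error."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# First lookahead only: "&" not followed by an entity or numeric reference.
entity_part = r'&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#\d+);)'
# Second lookahead with the atomic group (?>...) swapped for a plain (?:...).
plain_group = r'(?!(?:(?:(?!<!\[CDATA\[|\]\]>).)*)\]\]>)'
# Second lookahead exactly as it appears in the original pattern.
atomic_group = r'(?!(?>(?:(?!<!\[CDATA\[|\]\]>).)*)\]\]>)'

for name, pattern in [('entity_part', entity_part),
                      ('plain_group', plain_group),
                      ('atomic_group', atomic_group)]:
    # The atomic_group result depends on the interpreter version.
    print(name, compiles(pattern))
```

If only the `atomic_group` line prints False, the problem is that one construct and not the rest of the expression.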
Any help?