1

I have a regex that removes xmlns references from XML. It works fine when there are matching opening and closing tags, but if the xmlns reference is in a self-closing tag it removes the "/" as well.

Here is the regex:

"<(.*?) xmlns[:=].*?>", "<$1>"

When I use the regex on this line of xml:

<ns22:someTagName xmlns:ns22="http://exampledatatypes.com"></ns22:someTagName>

I get what I want:

<ns22:someTagName></ns22:someTagName>

When I use the regex on this line of xml:

<ns22:someTagName xmlns:ns22="http://exampledatatypes.com"/>

I get this invalid XML:

<ns22:someTagName>

It removes the reference fine, but it takes the closing "/" with it.

Thanks for the help, Scott

Scott
    Don't use regex for XML. What programming language are you using? Undoubtedly there is a superior XML API that would allow you remove namespaces easily. – Andrew Hare Feb 24 '11 at 16:28
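As a rough illustration of the parser-based route suggested in the comment above, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The helper name `strip_namespaces` is hypothetical, and note one difference from the regex approach: the parser drops the prefix from the tag name as well (`ns22:someTagName` becomes `someTagName`). Namespaced attributes are not handled here.

```python
import xml.etree.ElementTree as ET


def strip_namespaces(xml_text):
    """Return xml_text with namespace URIs removed from element names.

    Hypothetical helper for illustration; attributes are left untouched.
    """
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # ElementTree stores qualified names as '{uri}localname'.
        if isinstance(element.tag, str) and '}' in element.tag:
            element.tag = element.tag.split('}', 1)[1]
    return ET.tostring(root, encoding='unicode')


self_closed = '<ns22:someTagName xmlns:ns22="http://exampledatatypes.com"/>'
print(strip_namespaces(self_closed))
```

Because the document is parsed and re-serialized, the self-closing form survives without any special handling.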

3 Answers

6

Rather than trying to preserve what you need from the XML it would be better to target what you want to remove.

This expression targets just the namespace itself:

\sxmlns[^"]+"[^"]+"

Unfortunately I don't know LotusScript, so I can't give you a code sample showing how to use this, but what you need to do is something like this pseudocode:

result = regex.replace(yourString, '\sxmlns[^"]+"[^"]+"', '')

This replaces every match with an empty string, effectively removing it. It works for both a paired and a self-closing XML tag, and it leaves the tag unchanged if it doesn't have a namespace at all.

Edit: Here is a fully-functional Python example:

>>> from re import sub
>>> pattern = r'\sxmlns[^"]+"[^"]+"'
>>> closed = r'<ns22:someTagName xmlns:ns22="http://exampledatatypes.com"></ns22:someTagName>'
>>> sub(pattern, '', closed)
'<ns22:someTagName></ns22:someTagName>'
>>> selfclosed = r'<ns22:someTagName xmlns:ns22="http://exampledatatypes.com"/>'
>>> sub(pattern, '', selfclosed)
'<ns22:someTagName/>'
Andrew Hare
  • Hmmm. I tried it and it didn't seem to do anything. In LS you need to escape " with another ". Here is what I tried: ExecuteReplace(sXML, "xmlns[^""]+""[^""]+""", "") – Scott Feb 24 '11 at 17:15
  • Hi Andrew, I got it to work, but it leaves whitespace in the tag where the reference is removed. Is there a way to clear the whitespace out? ExecuteReplace(sXML, "xmlns[^""]+""[^""]+""", "") – Scott Feb 24 '11 at 19:09
  • @Scott - I changed the expression to this: `\sxmlns[^"]+"[^"]+"` to handle the whitespace issue. – Andrew Hare Feb 24 '11 at 19:44
  • I added a space in front of xmlns and it fixed it: ExecuteReplace(sXML, " xmlns[^""]+""[^""]+""", ""). Thanks for the help. – Scott Feb 24 '11 at 19:49
1

Don't use regex on XML if you have access to an XML parser! That being said, I don't know anything about LotusScript's XML parsing capabilities (if it even has them), so if you must use regex, this will get you closer:

<([^>]*?)\bxmlns\b[^"']+('|").*?\2(.*?/?>)

to be replaced with:

<$1$3

The most important change here from your original regex is the /? toward the end. BTW, I haven't escaped the quotes or backslashes since I don't know the LotusScript syntax for that, and I assume you do.

There will always be XML-valid input that cannot be properly understood by this, due to the limitations of regex. However, it should work for most cases. You could double-check manually by searching for the string "xmlns" afterward.
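For readers who want to try this pattern, here is a hedged Python sketch rather than LotusScript. It assumes Python's `re` syntax, where the backreference inside the pattern is written `\2` and the replacement groups are `\1` and `\3`; the function name `remove_xmlns_tagwise` is made up for illustration. It also includes the suggested follow-up check for any leftover "xmlns".

```python
import re

# Group 1: everything in the tag before xmlns; group 2: the quote character;
# group 3: the rest of the tag, including an optional "/" before ">".
PATTERN = re.compile(r'''<([^>]*?)\bxmlns\b[^"']+('|").*?\2(.*?/?>)''')


def remove_xmlns_tagwise(xml_text):
    """Remove one xmlns declaration per tag (hypothetical helper)."""
    return PATTERN.sub(r'<\1\3', xml_text)


paired = '<ns22:someTagName xmlns:ns22="http://exampledatatypes.com"></ns22:someTagName>'
self_closed = '<ns22:someTagName xmlns:ns22="http://exampledatatypes.com"/>'

for sample in (paired, self_closed):
    cleaned = remove_xmlns_tagwise(sample)
    print(cleaned)
    # Manual double-check suggested above: look for anything the regex missed.
    if 'xmlns' in cleaned:
        print('warning: xmlns still present')
```

Note that the space that preceded `xmlns` stays in the tag, which is still valid XML. A tag carrying several declarations would need more than one pass, since each match consumes the whole tag.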

Justin Morgan - On strike
0

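The regex \s*xmlns(:\w+)?="[^"]*" removes both default (unprefixed) and prefixed xmlns declarations.

In Java: xmlString.replaceFirst("\\s*xmlns(:\\w+)?=\"[^\"]*\"", "") (use replaceAll instead to remove every declaration, not just the first).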

https://regexr.com/ is a great tool to use for writing/testing these.
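The same pattern can be sketched in Python, which the accepted answer already uses. This is only an illustration: the function name `strip_xmlns_declarations` and the sample document are assumptions, not part of the original answer. `re.sub` replaces every match, so it behaves like Java's `replaceAll`.

```python
import re

# Matches optional leading whitespace, "xmlns", an optional ":prefix",
# and a double-quoted value (single-quoted values are not covered).
XMLNS_DECL = re.compile(r'\s*xmlns(:\w+)?="[^"]*"')


def strip_xmlns_declarations(xml_text):
    """Remove every default or prefixed xmlns declaration (hypothetical helper)."""
    return XMLNS_DECL.sub('', xml_text)


sample = '<root xmlns="http://a" xmlns:b="http://b"><b:child/></root>'
print(strip_xmlns_declarations(sample))
```

The prefixes on element names (`b:child` here) are left in place, so the result is no longer namespace-well-formed unless the prefixes are removed separately.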

jediz