
I have an XML-based scripting language where I want to use something like <example if="x<5" />

The < character is not allowed inside attribute values, so XML parsers throw errors.

I'd still like to use it without escaping: x<=5 is nicer for humans to read and write than x&lt;=5.

So I'd like to pre-process the XML before it's parsed and replace the invalid < with &lt;.

But I can't figure out a regex. I've been trying for two hours now.

I've got as far as ="(.*?)" to match everything in an attribute, but I just want that damn <, and ="(<)" doesn't do it.
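A quick check of both attempts on the sample element shows why the second one fails: ="(<)" only matches when the entire quoted value is a single <. This is a small JavaScript sketch, assuming the pre-processing runs in JavaScript; the variable names are illustrative.

```javascript
// The two attempts from the question, applied to the sample element.
const sample = '<example if="x<5" />';

// ="(<)" requires the whole attribute value to be exactly "<",
// so it finds nothing in if="x<5".
const exactMatch = sample.match(/="(<)"/);
console.log(exactMatch); // null

// ="(.*?)" matches the whole quoted value; the "<" is somewhere inside
// the captured group, so it still has to be found and replaced separately.
const valueMatch = sample.match(/="(.*?)"/);
console.log(valueMatch[1]); // x<5
```

In other words, one regex can locate the attribute values, and a second replacement is needed inside each value.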

user3195878
  • http://stackoverflow.com/a/1732454/576139 – Chris Eberle Feb 05 '14 at 16:13
  • Maybe you should use a more advanced XML editor, one that doesn't require you to edit at the text level. – BiAiB Feb 05 '14 at 16:20
  • I don't know the entire structure of your text, but [this](http://jsfiddle.net/ht8xP/) works for your provided example. Warning: this might not work in every case, since regex is not suitable for parsing XML. – Jerry Feb 05 '14 at 16:28
  • @Chris I don't want to parse arbitrary HTML. – user3195878 Feb 05 '14 at 16:29
  • @user3195878 HTML ~= XML. Same thing in this case. You're trying to parse a non-regular language with a regular expression. – Chris Eberle Feb 05 '14 at 16:36
  • @Chris Aren't regexes almost always used to parse non-regular languages, like human language? Basically I just need to replace a < when it's not starting a tag. I've limited it to a < between =" and ". That should rule out tags. If I catch a couple of < anywhere else, fine, they must be escaped anyway. – user3195878 Feb 05 '14 at 16:44
  • @user3195878 No, you're confusing matching with parsing. Matching, yes. Parsing, absolutely not. And your criterion of "not a starting tag" requires parsing. – Chris Eberle Feb 05 '14 at 17:05
  • @Chris It's a matter of pragmatism to me. The XML subset is tightly controlled, the XML files are under my control, and there are just a bunch of 'em. If every once in a while the regex matches a tag-opening < and the parser produces an error, I can live with it. – user3195878 Feb 05 '14 at 17:55
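Chris's distinction between matching and parsing can be made concrete with a tiny hand-written scanner that tracks whether it is inside a tag and inside a quoted value, instead of guessing from =" and ". This is only a sketch: escapeLtInAttributes is a hypothetical name, and it assumes the files contain no comments, CDATA sections or processing instructions.

```javascript
// Hypothetical helper: escapes "<" only inside quoted attribute values by
// tracking lexical state rather than pattern matching.
// Assumes no comments, CDATA sections, or processing instructions.
function escapeLtInAttributes(xml) {
  let out = "";
  let inTag = false; // between a tag-opening "<" and its closing ">"
  let quote = null;  // '"' or "'" while inside an attribute value

  for (const ch of xml) {
    if (quote !== null) {
      // Inside an attribute value: escape "<", close on the matching quote.
      if (ch === quote) quote = null;
      out += ch === "<" ? "&lt;" : ch;
    } else if (inTag) {
      // Inside a tag but outside a value: a quote opens a value, ">" ends the tag.
      if (ch === '"' || ch === "'") quote = ch;
      else if (ch === ">") inTag = false;
      out += ch;
    } else {
      // Text content: a "<" here starts a tag and is left alone.
      if (ch === "<") inTag = true;
      out += ch;
    }
  }
  return out;
}

console.log(escapeLtInAttributes('<example if="x<5" />')); // <example if="x&lt;5" />
console.log(escapeLtInAttributes('<p>x="<b/>"</p>'));     // <p>x="<b/>"</p>
```

Unlike a ="(.*?)" regex, it leaves <p>x="<b/>"</p> untouched, because the quotes there sit in text content rather than inside a tag.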

1 Answer


So first, the obligatory link telling you not to do this:
https://stackoverflow.com/a/1732454/505154

With that out of the way, here is how you can do what you are trying to do.

You can pass a function as the replacement argument to the replace() method of String objects, so I would start by matching attributes with your current regex and then replacing every < within each match. For example:

function escapeLtGt(str) {
    // Use global regexes: a plain string pattern replaces only the first occurrence.
    return str.replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

function escapeLtGtInXmlAttrs(str) {
    return str.replace(/="(.*?)"/g, escapeLtGt);
}

// example, logs '<example if="x&lt;5" />'
console.log(escapeLtGtInXmlAttrs('<example if="x<5" />'));
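A slightly more tolerant variant, sketched under the assumption that values may use either quote style and that = may be surrounded by whitespace. It escapes only < and any & that does not already start an entity reference: > is legal inside XML attribute values, while a bare & (as in x<5 && y<3) makes the parser fail just like <. The names ATTR_VALUE, BARE_AMP, escapeAttrValue and escapeXmlAttrs are hypothetical.

```javascript
// Quoted attribute value after "=", with either quote style; [^"]* also
// spans newlines, unlike .*?
const ATTR_VALUE = /=\s*("[^"]*"|'[^']*')/g;

// "&" that does not already start a reference such as &lt; or &#60;
// (the name part is an ASCII approximation of XML names).
const BARE_AMP = /&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)/g;

// Escape bare "&" first, then "<"; the lookahead keeps the new &lt; intact anyway.
function escapeAttrValue(match) {
  return match.replace(BARE_AMP, "&amp;").replace(/</g, "&lt;");
}

function escapeXmlAttrs(str) {
  return str.replace(ATTR_VALUE, escapeAttrValue);
}

// logs <example if='x&lt;5 &amp;&amp; y&lt;=3' />
console.log(escapeXmlAttrs("<example if='x<5 && y<=3' />"));
```

Like the answer's version, this is still pattern matching rather than parsing: an = followed by a quoted string in text content would be rewritten too.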
Andrew Clark