I have a large number (>1500) of JSP files that I am trying to convert to JSPX. I am using a tool that parses well-formed JSPs and converts them to JSPX; however, my JSPs are not all well-formed :)
My solution is to pre-process the JSPs and convert untidy code so the tool will parse them correctly. The main problem I am trying to resolve is that of unquoted attribute values. Examples:
<INPUT id="foo" size=1>
<input id=body size="2">
My current regex for finding these is (in Java string format):
"(\\w+)=([^\"' >]+)"
And my replacement string is (in Java string format):
"$1=\"$2\""
This works well, EXCEPT for a couple of patterns, both of which involve inline scriptlets. For example:
<INPUT id=foo value="<%= someBean.method("a=b") %>">
In this case, my pattern matches the a=b inside the string literal "a=b", which I don't want. I'd like the regex to IGNORE anything between <% and %>. Is there a regular expression that will do this?
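To make the goal concrete, here is a rough sketch of one direction that might work: put a scriptlet alternative in front of the attribute pattern, so that each <% ... %> block is consumed whole and written back unchanged. The class name ScriptletAwareQuoter and the method quoteOutsideScriptlets are hypothetical, and the sketch assumes that %> never appears inside a string literal within a scriptlet. It also does not handle an unquoted attribute whose value is itself a scriptlet.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ScriptletAwareQuoter {
    // First alternative: a whole <% ... %> block, matched lazily up to the first "%>".
    // Second alternative: the original unquoted-attribute pattern (groups 1 and 2).
    static final Pattern TOKEN = Pattern.compile(
            "<%.*?%>|(\\w+)=([^\"' >]+)", Pattern.DOTALL);

    static String quoteOutsideScriptlets(String jsp) {
        Matcher m = TOKEN.matcher(jsp);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // group(1) is null when the scriptlet alternative matched:
            // in that case the matched text is copied back untouched.
            String replacement = (m.group(1) == null)
                    ? m.group()
                    : m.group(1) + "=\"" + m.group(2) + "\"";
            // quoteReplacement keeps any '$' or '\' in the JSP text literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(quoteOutsideScriptlets(
                "<INPUT id=foo value=\"<%= someBean.method(\"a=b\") %>\">"));
        System.out.println(quoteOutsideScriptlets("<input id=body size=\"2\">"));
    }
}
```

With this approach the scriptlet is never offered to the attribute pattern, because the regex engine has already consumed it as a single match by the time it reaches a=b.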
EDIT: Changed the title to clarify that I am NOT trying to parse HTML / JSP with regexes... I am doing a simple syntactic transformation to prepare the input for parsing.