Using grouping to find HTML elements via regular expression

Question

How does one go about using grouping in a regular expression to find HTML elements? The issue is that when searching a file, the < and > from JavaScript are found as well, so I want a match to always start with a < and end with either a > or />, or start with </, and then I want to look for stuff inside of that. The key thing I am looking for is capital letters in element names or attribute names.

The reason I am asking is that I have been tasked with going through all the JSP pages and changing the uppercase element and tag names to lowercase. I am simply trying to find a regular expression that is more accurate than the one I am currently using. The main issue seems to be that this regular expression does not know the difference between a < or > in JavaScript and one in HTML.

<[^!>%]*(([A-Z]{1,})|([^A-Za-z"'>=&\?][A-Z]{1,}[a-z]*))[\s=>]
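
To make the false positive concrete, here is a minimal Python sketch that runs the expression above against three sample lines. The sample strings and the helper name `is_flagged` are made up for illustration and do not come from any real JSP page.

```python
import re

# The expression from the question, copied unchanged.
PATTERN = re.compile(
    r"""<[^!>%]*(([A-Z]{1,})|([^A-Za-z"'>=&\?][A-Z]{1,}[a-z]*))[\s=>]"""
)

# Hypothetical sample lines, invented for this illustration.
SAMPLES = {
    "uppercase tag": '<TABLE BORDER="1">',
    "lowercase tag": '<table border="1">',
    "javascript comparison": "if (i < MAX && j > 0) { run(); }",
}


def is_flagged(text):
    """Return True if the question's expression matches anywhere in text."""
    return PATTERN.search(text) is not None


if __name__ == "__main__":
    for label, text in SAMPLES.items():
        print(label, is_flagged(text))
```

The uppercase tag is flagged and the lowercase tag is not, which is the intended behaviour. The JavaScript comparison is flagged as well, because `< MAX ` looks to the pattern like the start of a tag with an uppercase name.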

Use an XML parser instead. Regex cannot reliably parse HTML. See this famous SO answer: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags — Ennui, Oct 02 '13 at 14:03
Unless someone comes along and proves me wrong, I'm going to say that it's not possible, for the exact reason you mentioned. You, as a human, know that `
` and the other to parse html outside of script tags. — slebetman, Oct 02 '13 at 15:05
Normally, I would be the one to say, "With a small enough problem set, regular expressions can be successfully used to extract information from (X(HT)?|HT)ML." This is not one of those cases. You really need a parser, because you require an understanding of the language, and (X(HT)?|HT)ML is not a regular language, which means regular expressions cannot successfully describe it. — FrankieTheKneeMan, Oct 02 '13 at 15:14
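
As a rough illustration of the parser route the comments recommend, the sketch below uses `html.parser.HTMLParser` from the Python standard library. This is a swapped-in tool, not something from the question, which concerns JSP files. The parser treats the body of a `<script>` element as plain text, so a `<` inside JavaScript is never seen as a tag. The class name, the function name, the two helper expressions, and the sample markup are all hypothetical. The sketch assumes well-formed start tags, reports only start tags (closing tags such as `</TABLE>` would need separate handling), and does nothing special for JSP scriptlets or directives.

```python
import re
from html.parser import HTMLParser

# Quoted attribute values; blanked out so their contents are never
# mistaken for names.
QUOTED_VALUE = re.compile(r'"[^"]*"|\'[^\']*\'')
# A tag or attribute name, optionally followed by "=" and its value.
NAME = re.compile(r"""([^\s"'<>/=]+)(?:\s*=\s*[^\s>]*)?""")


class UppercaseNameFinder(HTMLParser):
    """Collects tag and attribute names that contain uppercase letters."""

    def __init__(self):
        super().__init__()
        self.findings = []  # list of (line number, name as written)

    def handle_starttag(self, tag, attrs):
        # `tag` and `attrs` arrive lowercased, so inspect the raw text.
        raw = self.get_starttag_text()
        line = self.getpos()[0]  # line on which the tag starts
        stripped = QUOTED_VALUE.sub('""', raw)
        for match in NAME.finditer(stripped):
            name = match.group(1)
            if name != name.lower():
                self.findings.append((line, name))


def find_uppercase_names(source):
    """Return (line, name) pairs for start tags in `source`."""
    finder = UppercaseNameFinder()
    finder.feed(source)
    finder.close()
    return finder.findings


# Hypothetical sample page, invented for this illustration.
SAMPLE = """<HTML>
<body onLoad="init()">
<script>
if (i < MAX && j > 0) { run(); }
</script>
<TABLE Border="1" width=100>
<tr><td>Some TEXT</td></tr>
</TABLE>
</body>
</HTML>
"""

if __name__ == "__main__":
    for line, name in find_uppercase_names(SAMPLE):
        print(line, name)
```

On the sample, this reports `HTML`, `onLoad`, `TABLE`, and `Border`, and skips both `MAX` inside the script and `TEXT` in the cell content. Attributes on a tag that spans several lines are reported with the line on which the tag starts.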
