How does one use grouping in a regular expression to find HTML elements? The issue is that when searching a file, the < and > characters in JavaScript are found too, so I want to always start with a < and end with either a > or />, or start with , and then look at what is inside that. The key thing I am looking for is capital letters in element names or attribute names.
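A minimal sketch of the grouping idea, in Python (an assumption: the question names no language, and `TAG_RE`, `ATTR_RE` and `uppercase_names` are hypothetical names). One pattern anchors on `<` and on `>` or `/>` and captures the element name and the text after it in separate groups; a second pattern pulls attribute names out of that captured text. It inherits exactly the weakness described above, because it cannot tell a JavaScript `<` or `>` from a tag.

```python
import re

# Group 1: "/" for a closing tag, group 2: element name,
# group 3: everything between the name and the final ">" or "/>".
TAG_RE = re.compile(r'<(/?)([A-Za-z][A-Za-z0-9]*)([^<>]*?)/?>')
# Simplification: an attribute name is a word followed by "=", so boolean
# attributes such as CHECKED are not seen, and "x=" inside a value is misread.
ATTR_RE = re.compile(r'([A-Za-z_:][\w:.-]*)\s*=')

def uppercase_names(markup):
    """Return (element, [attributes]) for every tag whose names contain capitals."""
    hits = []
    for m in TAG_RE.finditer(markup):
        element, inside = m.group(2), m.group(3)
        attrs = [a for a in ATTR_RE.findall(inside) if a != a.lower()]
        if element != element.lower() or attrs:
            hits.append((element, attrs))
    return hits

sample = '<TABLE Border="0"><tr><td onClick="go()">x</td></tr></TABLE>'
print(uppercase_names(sample))
```

On the sample it reports `TABLE` (with `Border`), `td` (with `onClick`) and the closing `</TABLE>`. Run on the JavaScript fragment `if (count<MAX && total>0)`, it also reports `MAX` as an element, which is the false positive this question is about.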

I am asking because I have been tasked with going through all the JSP pages and changing uppercase element and attribute names to lowercase. I am simply trying to find a regular expression that is more accurate than the one I am currently using. The main issue seems to be that this regular expression cannot tell the difference between a < or > in JavaScript and one in HTML.

<[^!>%]*(([A-Z]{1,})|([^A-Za-z"'>=&\?][A-Z]{1,}[a-z]*))[\s=>]
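To see the failure concretely, the pattern above can be run unchanged, here through Python's `re` module (an assumption, since the question does not say which tool runs it; the sample strings are made up). It flags the uppercase table tag and ignores the lowercase one, but it also matches `<MAX ` inside an ordinary JavaScript comparison, because nothing in the pattern knows whether a `<` opens a tag.

```python
import re

# The pattern from the question, unchanged (raw string so the backslashes survive).
QUESTION_RE = re.compile(r"""<[^!>%]*(([A-Z]{1,})|([^A-Za-z"'>=&\?][A-Z]{1,}[a-z]*))[\s=>]""")

samples = {
    "html_upper": '<TABLE border="0">',
    "html_lower": '<table border="0">',
    "javascript": 'if (count<MAX && total>0) { go(); }',
}

for label, text in samples.items():
    m = QUESTION_RE.search(text)
    # Show the matched text, or None when the pattern finds nothing.
    print(label, "->", repr(m.group(0)) if m else None)
```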
Sam Carleton
  • Use an XML parser instead. Regex cannot reliably parse HTML. See this famous SO answer: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Ennui Oct 02 '13 at 14:03
  • Unless someone comes along and proves me wrong, I'm going to say that it's not possible, for the exact reason you mentioned. You, as a human, know that `
    ` and the other to parse html outside of script tags.
    – slebetman Oct 02 '13 at 15:05
  • Normally, I would be the one to say "With a small enough problem set, Regular Expressions can be successfully used to extract information from (X(HT)?|HT)ML" This is not one of those cases. You really need a parser, because you require an understanding of the language, and (X(HT)?|HT)ML is not a regular language, which means Regular expressions cannot successfully describe it. – FrankieTheKneeMan Oct 02 '13 at 15:14
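slebetman's comment suggests splitting the job in two: one step for script blocks and another for the HTML outside them. A rough Python sketch of that idea, assuming scripts are plain, non-nested `<script>...</script>` blocks (`strip_script_bodies` and `uppercase_tags` are hypothetical helpers): the first pass empties each script body, the second scans the tags that remain. It still misreads a `>` inside an attribute value such as `onclick="if (a>b) go()"` and knows nothing about JSP scriptlets, which is the kind of gap the parser-based comments warn about.

```python
import re

# Pass 1 pattern (assumption: no nested or unterminated script blocks).
SCRIPT_RE = re.compile(r'(<script\b[^>]*>)(.*?)(</script\s*>)', re.IGNORECASE | re.DOTALL)
# Pass 2 pattern: opening or closing tag, element name in group 1.
TAG_RE = re.compile(r'</?([A-Za-z][A-Za-z0-9]*)[^<>]*>')

def strip_script_bodies(markup):
    # Keep the <script> tags themselves so they are still checked; drop their contents.
    return SCRIPT_RE.sub(lambda m: m.group(1) + m.group(3), markup)

def uppercase_tags(markup):
    # Scan only the markup left outside script bodies.
    return [m.group(1) for m in TAG_RE.finditer(strip_script_bodies(markup))
            if m.group(1) != m.group(1).lower()]

doc = '<DIV><SCRIPT type="text/javascript">if (count<MAX && total>0) {}</SCRIPT><p>x</p></DIV>'
print(uppercase_tags(doc))
```

Without the first pass, the same tag pattern would also report `MAX` from the JavaScript comparison.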
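The comments recommend a real parser instead. As one stand-in (swapped in here, not something the commenters named), Python's standard-library `html.parser.HTMLParser` treats the contents of `<script>` and `<style>` as raw text, so a `<` in JavaScript is never reported as a tag. It lowercases the names it passes to `handle_starttag`, so the sketch below (the class `UppercaseFinder` and its two helper patterns are hypothetical) reads the original spelling back with `get_starttag_text()`. Only start tags are checked, on the assumption that end tags are spelled like their start tags.

```python
import re
from html.parser import HTMLParser

# Quoted attribute values, blanked out so text like href="p?A=1" is not read as a name.
QUOTED_RE = re.compile(r'"[^"]*"' r"|'[^']*'")
# A name followed by "=" (assumption: boolean attributes without "=" are ignored).
ATTR_NAME_RE = re.compile(r'([A-Za-z_:][\w:.-]*)\s*=')

class UppercaseFinder(HTMLParser):
    """Collects start tags whose element or attribute names contain capitals."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.hits = []  # (line, column, raw start-tag text)

    def handle_starttag(self, tag, attrs):
        raw = self.get_starttag_text()  # original spelling; `tag` and `attrs` are lowercased
        name = raw[1:1 + len(tag)]
        unquoted = QUOTED_RE.sub('""', raw[1 + len(tag):])
        attr_names = ATTR_NAME_RE.findall(unquoted)
        if name != tag or any(a != a.lower() for a in attr_names):
            line, col = self.getpos()
            self.hits.append((line, col, raw))

    # Self-closing tags such as <BR/> reach handle_startendtag, whose default
    # implementation calls handle_starttag, so no extra override is needed.

page = ('<DIV Class="x">\n'
        '<script>if (count<MAX && total>0) { go(); }</script>\n'
        '<a href="p?A=1" onClick="go()">y</a>\n'
        '<br/>\n'
        '</DIV>')
finder = UppercaseFinder()
finder.feed(page)
finder.close()
for line, col, raw in finder.hits:
    print(line, col, raw)
```

Treat the hits as candidates for review rather than automatic renames: the parser knows nothing about JSP scriptlets (`<% ... %>`), so a comparison inside one can still be read as a tag, and prefixed tags such as JSP custom tags are reported like any other mixed-case name.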
