
I'm new to regular expressions and am testing them using this website, but I've run into a problem. The scenario: there are some elements in angle brackets, every element has at least one attribute, and after every element a period (.) is given, like this:

<a value = "GoodVal">.<b value = "BadVal" size = "10">.<c height = "auto">.<d size = "3">.<e strength = "200%">.<f a1 = "1" a2 = "2" a3 = "3"></f></e></d></c></b></a>

My expression is: <a.*?>\.<b.*?>\.<d.*?> Why is this considered a match? In the string, the b element is followed by the c element, not the d element.

1 Answer


First of all, please see here for why not to use regular expressions to parse XML/HTML.

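But to still answer your question: The . matches (almost) any character, including > (only line breaks are excluded by default, unless the appropriate modifier is set). A lazy .*? still expands as far as it has to for the rest of the pattern to match. That's why the .*? in <b.*?> matches everything from the b element's attributes up to the closing bracket right before .<d, including the entire c element.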
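To make this visible, here is a minimal JavaScript sketch using the string from the question. The capture group around the .*? after <b is added only for illustration (it is not in the original expression), and the [^>]* variant is one assumed way to keep each tag from running past its own closing bracket; it would still break on attribute values that contain a > character.

```javascript
// Input string from the question.
const input = '<a value = "GoodVal">.<b value = "BadVal" size = "10">.<c height = "auto">.<d size = "3">.<e strength = "200%">.<f a1 = "1" a2 = "2" a3 = "3"></f></e></d></c></b></a>';

// The question's pattern, with a capture group added (for illustration only)
// around the lazy .*? that follows <b.
const original = /<a.*?>\.<b(.*?)>\.<d.*?>/;
const m = input.match(original);

// What the lazy quantifier after <b actually consumed.
const swallowedByB = m ? m[1] : null;
console.log(swallowedByB);

// Assumed alternative: [^>]* cannot cross a ">", so <b ...> must end
// at its own closing bracket and be followed directly by ".<d".
const strict = /<a[^>]*>\.<b[^>]*>\.<d[^>]*>/;
const strictMatches = strict.test(input);
console.log(strictMatches);
```

The first log shows that the b part of the match stretches across the whole c tag, while the stricter pattern does not match at all because b is followed by c.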

It's always a good idea to use a page like http://www.regextester.com/?fam=96920 to visualize your expressions, especially if you're new to working with RegEx.

To only include the tags of a, b, and d, as you requested, you can use this regex:

/<[abd].*?\./g

See also on regextester. The g modifier is needed in JavaScript to capture all matches rather than only the first one. You could also match all tags and then, while iterating over them, decide which ones to keep and which ones to discard.
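Both approaches can be sketched in JavaScript roughly as follows. The second pattern, /<(\w+)[^>]*>\./g, and the names wanted, allNames and kept are hypothetical and only illustrate the "match everything, then filter" idea; they assume tag names consist of word characters and that attribute values contain no > character.

```javascript
// Input string from the question.
const input = '<a value = "GoodVal">.<b value = "BadVal" size = "10">.<c height = "auto">.<d size = "3">.<e strength = "200%">.<f a1 = "1" a2 = "2" a3 = "3"></f></e></d></c></b></a>';

// Approach 1: the regex from this answer. With the g flag,
// String.prototype.match returns every match instead of only the first.
const direct = input.match(/<[abd].*?\./g);
console.log(direct);

// Without g, only the first match is returned (at index 0 of the result).
const firstOnly = input.match(/<[abd].*?\./);
console.log(firstOnly[0]);

// Approach 2 (hypothetical pattern): match every opening tag that is
// followed by a period, capture its name, then filter while iterating.
const wanted = new Set(['a', 'b', 'd']);
const allNames = [];
const kept = [];
for (const match of input.matchAll(/<(\w+)[^>]*>\./g)) {
  allNames.push(match[1]);
  if (wanted.has(match[1])) {
    kept.push(match[0]);
  }
}
console.log(allNames);
console.log(kept);
```

Filtering after the match keeps the regex simple and moves the decision about which tags to keep into ordinary code, which tends to be easier to change later.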

Constantin Groß