I need some advice on regular expressions in JavaScript.

I have a string

var page = '<html attribute1="<test1>" test2 attribute2="test2"></html>';

I'm trying to extract this fragment:

<html attribute1="<test1>" test2 attribute2="test2">

But my code:

page.match(/<.*?>/);

returns only the characters up to the first occurrence of ">", that is:

<html attribute1="<test1>

How can I make the match ignore any ">" symbols that appear between quotes? Please help, and sorry for my English ;)

  • Parsing things like that requires a more powerful parsing mechanism than that afforded by regular expressions. You can concoct hacks to deal with restricted cases, but your example here is a good one to illustrate the difficulty. – Pointy Oct 11 '13 at 18:29
  • refer to this: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – staafl Oct 11 '13 at 18:30
  • Browser manufacturers spend months coding query selectors, DOM traversal, data attributes, and countless other features...and this is how people choose to design things, instead? *sigh* – Katana314 Oct 11 '13 at 18:33
  • What are you _really_ trying to solve here? Why do you need this `html` tag? What's the bigger issue being tackled? – Benjamin Gruenbaum Oct 11 '13 at 18:36
  • why are there angle brackets in the attrib? that ain't right... – dandavis Oct 11 '13 at 18:39

3 Answers

You might try this regex:

^<(?:"[^"]*"|[^>])+>

regex101 demo.

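Starting at the beginning of the string, it matches either a double-quoted value (with anything inside it) or any character other than >, repeated until the first > that is outside quotes.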

If attribute values may also be wrapped in single quotes, use this one instead:

^<(?:"[^"]*"|'[^']*'|[^>])+>
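
As a quick check, the second pattern can be run against the string from the question. This is only a minimal sketch: the names tagPattern, match and openingTag are made up for the illustration, and it has only been checked against this one input, not against HTML in general.

```javascript
// String from the question.
var page = '<html attribute1="<test1>" test2 attribute2="test2"></html>';

// Repeatedly consume either a whole quoted value (double or single quotes)
// or one character that is not ">", then require the closing ">".
var tagPattern = /^<(?:"[^"]*"|'[^']*'|[^>])+>/;

var match = page.match(tagPattern);
var openingTag = match ? match[0] : null; // null when nothing matched

console.log(openingTag);
// <html attribute1="<test1>" test2 attribute2="test2">
```

The quoted alternatives are listed before [^>], so a ">" inside quotes is swallowed as part of the quoted value and never ends the match.
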
Jerry

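You can try this: page.match(/<.*">/); It is greedy and stops at the last "> in the string, so it relies on the tag ending with a double-quoted attribute value.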

Mina

page.match(/\<.*\>(?=<\/)/); should do the trick.

Your pattern uses .*?, which is a lazy match, so it ends at the first > it finds. Removing the ? makes the match greedy: it continues to the final > and returns the full string.

I've also added a lookahead so that the match ends just before the closing html tag, and added backslashes to escape the angle brackets (in JavaScript these escapes are optional).
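
To make the difference concrete, here is a small sketch that runs the lazy, greedy and lookahead variants on the string from the question. The variable names lazy, greedy and withLookahead are just for illustration, and the backslashes before < and > are left out since they are not required.

```javascript
// String from the question.
var page = '<html attribute1="<test1>" test2 attribute2="test2"></html>';

// Lazy: .*? takes as little as possible, so it stops at the first ">".
var lazy = page.match(/<.*?>/)[0];

// Greedy: .* takes as much as possible, so it runs to the last ">".
var greedy = page.match(/<.*>/)[0];

// Greedy plus lookahead: backtracks to the last ">" followed by "</".
var withLookahead = page.match(/<.*>(?=<\/)/)[0];

console.log(lazy);          // <html attribute1="<test1>
console.log(greedy);        // <html attribute1="<test1>" test2 attribute2="test2"></html>
console.log(withLookahead); // <html attribute1="<test1>" test2 attribute2="test2">
```

Because .* is greedy, on a page with several closing tags the lookahead version would extend to the last > that is followed by </, so this only demonstrates the single-tag example above.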