
I've been hovering around some answers here, and I can't find a solution to my problem:

I have this regexp, which matches everything inside an HTML span tag, including its contents:

<span\b[^>]*>(.*?)</span>

and I want to find a way to search the whole text except for whatever that regexp matches. For example, if my text is:

var text = '...for there is a class of <span class="highlight">guinea</span> pigs which...';

... then the regexp would match:

<span class="highlight">guinea</span>

and I want a regexp such that, if I search for "class", it matches the word in "...for there is a class of..." but does not match inside the tag, as in

"... class="highlight"..."

The word to be matched ("class") might be anywhere within the text. I've tried

(?!<span\b[^>]*>(.*?)</span>)class

but it keeps matching inside tags as well. I want to find a solution using only regexp, without dealing with the DOM or jQuery. Thanks in advance :).

Luxedrina
  • http://blog.codinghorror.com/parsing-html-the-cthulhu-way/ – Etheryte Nov 28 '14 at 02:00
  • possible duplicate of [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) –  Nov 28 '14 at 02:24
  • You can't do that. Instead, do the search on individual text nodes in the DOM. –  Nov 28 '14 at 02:25
  • @Nit: I understand this is for fairly complex and arbitrary HTML. Mine is as simple and predictable as the one I posted. Thank you though. – Luxedrina Dec 01 '14 at 18:58
  • @torazaburo: Not the same case. I just need to avoid tags like the one I posted, not some tags yes and some not :). I don't know anything about the DOM, and I thought that searching in the complement of a well-defined set would do the trick, but apparently I'm touching sensitive chords here ;). There there. Thanks though. – Luxedrina Dec 01 '14 at 19:00

2 Answers


Although I wouldn't recommend this, I would do something like the following:

(class)(?:(?=.*<span\b[^>]*>))|(?:(?<=<\/span>).*)(class)

You can see this regex in action on Rubular.

You can capture your matches from the groups and work with them as needed. If you can, use an HTML parser and then find matches in the text elements.
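As a rough illustration of reading those groups from JavaScript, here is a minimal sketch. It assumes an engine that supports lookbehind (ES2018 or later), and it has only been checked against the sample text from the question; the variable names are made up for the example.

```javascript
// Assumption: the engine supports lookbehind assertions (ES2018 or later).
var pattern = /(class)(?:(?=.*<span\b[^>]*>))|(?:(?<=<\/span>).*)(class)/g;
var text = '...for there is a class of <span class="highlight">guinea</span> pigs which...';
var found = [];
var m;

while ((m = pattern.exec(text)) !== null) {
    // Group 1 is set when an opening span tag follows the word;
    // group 2 is set when the word comes somewhere after a closing </span>.
    found.push(m[1] !== undefined ? m[1] : m[2]);
}

console.log(found);
```

Only one of the two groups is filled in for any given match, so the loop picks whichever is defined.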

pogo

It's not pretty, but if I understand you correctly, this should do what you want. It's done with a single RegEx, but JavaScript can't (to my knowledge) extract the result without joining the matches in a loop.

The RegEx: /(?:<span\b[^>]*>.*?<\/span>)|(.)/g

Example js code:

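var str = '...for there is a class of <span class="highlight">guinea</span> pigs which...',
    pattern = /(?:<span\b[^>]*>.*?<\/span>)|(.)/g,
    match,
    res = '';

// Walk through every match; group 1 is set only for characters outside a span.
while ((match = pattern.exec(str)) !== null) {
    // When the span alternative matched, match[1] is undefined; skip it
    // so that the literal text "undefined" is not appended to the result.
    if (match[1] !== undefined) {
        res += match[1];
    }
}

document.writeln('Result:' + res);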

In English: at each position, either consume a whole span element without capturing it, or capture a single character. Do this globally to walk the entire string. The result is one captured character per match for everything outside the tag. As pointed out, this is ugly, since it can produce a very large number of matches, but it gets the job done.
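The same skip-or-capture alternation can also be pointed at the search word itself instead of at every character. This is a minimal sketch, assuming the markup is as simple as in the question (no nested spans, no line breaks inside a span); the helper name `findOutsideSpans` is hypothetical.

```javascript
// Hypothetical helper: returns the indices where `word` occurs outside <span>...</span>.
function findOutsideSpans(str, word) {
    // Escape regex metacharacters so the word is matched literally.
    var escaped = word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    // First alternative consumes a whole span element; second captures the word.
    var pattern = new RegExp('<span\\b[^>]*>.*?</span>|(' + escaped + ')', 'g');
    var positions = [];
    var match;

    while ((match = pattern.exec(str)) !== null) {
        // Group 1 is undefined whenever the span alternative matched.
        if (match[1] !== undefined) {
            positions.push(match.index);
        }
    }
    return positions;
}

var sample = '...for there is a class of <span class="highlight">guinea</span> pigs which...';
var hits = findOutsideSpans(sample, 'class');
console.log(hits);
```

Because the span alternative comes first and swallows the whole element, the `class` attribute inside the tag is never offered to the capturing alternative. The loop is still needed to tell the two kinds of match apart.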

If you need to send it in and retrieve the result in one call, I'd have to agree with previous contributors - It can't be done!

SamWhan