I am trying to search for an expression (var exp = "foo") in a large string, stringText. I want to match every occurrence of exp except those inside anything that starts with '<' and ends with '>'.

Right now I know how to write it like this:

    var regexp = new RegExp(exp, 'g');
    var match = regexp.exec(stringText);

How do I write the exclude condition?

I know it should be /<.+>/g but how do I combine it? I know this isn't right but how do I do it?

    var regexp = new RegExp(exp + /<.+>/, 'g');

Thanks, Alon

=== UPDATE ===

I want to search for 'a' inside this string:

"a dog <span class="something"> had a  </span> and a cat"

I want it to hit the first 'a', the 'a' inside 'had', the 'a' after that, the 'a' in 'and', the standalone 'a' that follows, and the 'a' in 'cat'.

I don't want to match the 'a' in 'span' or 'class', or anything else inside <>.

leninmon
Alon
  • It's not clear what you're actually looking for. You seem to be saying "I want to find A and exclude B", but how are A & B related? –  Feb 21 '12 at 23:22
  • Can you give an example of things you do and don't want to be matched? From the way I read it you want `foo` but not `` -- what about `< this is foo>`? – mathematical.coffee Feb 21 '12 at 23:22

3 Answers

1

I'd argue you should look to something other than regex if you want to parse html/xml. Better men than I have explained why.

If you're hell-bent on using regex, or your problem doesn't warrant a more robust solution, I'd suggest something like this, since JS doesn't have lookbehind (it was only added later, in ES2018):

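    var input = "a dog <span class='something'> had a  </span> and a cat";
    var exp = "a"; // the expression to search for

    // Remove anything tag-like
    var temp = input.replace(/<.+?>/g, "");

    // Perform the search; match() with the g flag returns every hit,
    // whereas a single exec() call would return only the first one
    var matches = temp.match(new RegExp(exp, "g"));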

Or the one-liner:

    var matches = input.replace(/<.+?>/g, "").match(new RegExp(exp, "g"));
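For anyone who wants to try the strip-then-search idea end to end, here is a minimal sketch. The helper names `escapeRegExp` and `findOutsideTags` are made up for illustration, and it assumes the search term should be matched literally and that no tag contains a stray '>' inside an attribute value. It uses `[^>]+` rather than `.+?` so that a tag spanning several lines is still removed.

```javascript
// Hypothetical helper: escape regex metacharacters so `exp` is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: remove tag-like spans, then collect every match.
// Assumes tags are well formed and contain no '>' inside attribute values.
function findOutsideTags(input, exp) {
  var stripped = input.replace(/<[^>]+>/g, "");
  // match() returns null when nothing is found, so fall back to an empty array
  return stripped.match(new RegExp(escapeRegExp(exp), "g")) || [];
}

var sample = "a dog <span class='something'> had a  </span> and a cat";
var hits = findOutsideTags(sample, "a");
console.log(hits.length); // 6
```

Keep in mind that any positions you get this way refer to the stripped string, not to the original one.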
Marcus Stade
0

What you want seems to be negative lookbehind and lookahead. Unfortunately, there is no lookbehind in JavaScript (before ES2018), only lookahead.

E.g.

    /foo(?!bar)/g

will match the foo in "the foo fighters", but it will not match the foo in "the foobar solution".
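A quick way to check that behaviour yourself (the variable names here are just for illustration):

```javascript
var pattern = /foo(?!bar)/g;

// "foo" is followed by " fighters", so the negative lookahead succeeds
var inFighters = "the foo fighters".match(pattern);

// "foo" is followed directly by "bar", so there is no match at all
var inFoobar = "the foobar solution".match(pattern);

console.log(inFighters); // [ 'foo' ]
console.log(inFoobar);   // null
```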

So you might want something like

    /foo(?!>)/g

You can read up on lookahead at http://www.regular-expressions.info/lookaround.html#lookahead and on JavaScript regexes at http://www.regular-expressions.info/javascript.html
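Note that a lookahead for a bare '>' only rejects a match that is immediately followed by '>'. One common variation, sketched here as an assumption rather than a complete answer, is to reject a match when a '>' comes up before the next '<', which is what being inside a tag looks like from the match's point of view. The function name `findIndicesOutsideTags` is hypothetical, and the trick assumes well-formed tags with no '<' or '>' in attribute values or in the surrounding text.

```javascript
// Hypothetical helper: return the index of every match of `exp` that is not
// inside a <...> span. `exp` is treated as regex source, as in the question.
function findIndicesOutsideTags(input, exp) {
  // (?![^<>]*>) fails when a '>' is reached before any '<', i.e. inside a tag
  var pattern = new RegExp("(?:" + exp + ")(?![^<>]*>)", "g");
  var indices = [];
  var m;
  while ((m = pattern.exec(input)) !== null) {
    indices.push(m.index);
    // Guard against an infinite loop if `exp` can match the empty string
    if (m[0].length === 0) pattern.lastIndex++;
  }
  return indices;
}

var sample = "a dog <span class='something'> had a  </span> and a cat";
var indices = findIndicesOutsideTags(sample, "a");
console.log(indices); // [ 0, 32, 35, 46, 50, 53 ]
```

Unlike stripping the tags first, this keeps the positions relative to the original string.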

Xyz
  • But if I look for "foo < foo span>", both of them will be excluded because they both come before '>' (and I want the first foo to be okay). Can't I say: "include only if there is no <>, or if there is a '>' that wasn't followed by '<'"? – Alon Feb 21 '12 at 23:37
  • No, lookahead works just like any other regular expression. /foo(?!bar)/ will match the first and the last foo in "foo bar foobar foo". /foo(?!.*bar)/, however, will only match the last foo. – Xyz Feb 21 '12 at 23:43
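To see the difference between the two lookaheads from the comment above, here is a small sketch that collects match positions (the helper name `matchIndices` is made up for illustration):

```javascript
// Hypothetical helper: collect the start index of every match of a global regex.
function matchIndices(pattern, text) {
  var found = [];
  var m;
  pattern.lastIndex = 0; // start from the beginning of the text
  while ((m = pattern.exec(text)) !== null) {
    found.push(m.index);
  }
  return found;
}

var text = "foo bar foobar foo";

// Only the character run right after "foo" is inspected
var adjacent = matchIndices(/foo(?!bar)/g, text);

// The ".*" lets the lookahead scan the rest of the string for "bar"
var anywhere = matchIndices(/foo(?!.*bar)/g, text);

console.log(adjacent); // [ 0, 15 ]
console.log(anywhere); // [ 15 ]
```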
0

Maybe what you want is most easily accomplished by first removing (replacing with an empty string) all occurrences of HTML tags from the text, with a regexp such as this:

    var tmp = stringText.replace(/<[^>]+>/g, '');

Then you can easily run your regexp against the new temporary, HTML-less string to look for the a's.

Xyz