
My regex is meant to find certain words in text, but not words that appear inside element text (between an opening tag and its closing tag).

REGEXP

RegExp('\\b([^<(.*?)>(.?+)<\/(.*?)>])(' + wregex.join('|') + ')\\b(?=\\W)

EXAMPLE

This is some text that should be looked through
though this text <code>Should not be looked at </code> and this text is ok to 
look at

Here is the part of my regular expression that I am having trouble with:

([^<(.*?)>(.?+)<\/(.*?)>]) is meant to say: do not match any text from an opening <element> through its closing </element>; nothing in between should be searched.

That is the most important part. I've tried multiple approaches and I'm not sure whether this is possible with a regex. I don't want to match anything from a basic HTML opening tag until its closing tag appears; after that, searching should resume.

EDIT: I know that regex shouldn't be used to parse HTML. This is searching through TEXT.

Testing Example HERE

EasyBB
  • I have to post this, due to its relevance (once again): see [here](http://stackoverflow.com/a/1732454/2030691) for a discussion of why using regex to parse HTML is bad. – Xynariz Mar 07 '14 at 23:48
  • They were each different variable names: tregex, wregex, iregex, or something like that. And Xynariz, I know this; I'm using this on text. – EasyBB Mar 08 '14 at 00:12

2 Answers


Why cram everything into a single regex? It can be as simple as this. Notice that I'm using [^] instead of . so that newlines are matched too.

string.replace(/<[^]+?<\/[^]+?>/g, '').match(/what i really want to find/gi)

And yes, this is prone to breakage, as any regex solution would be.
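A minimal sketch of the two-step approach above, under a few assumptions: the word list is an array (like the asker's wregex) whose entries contain no regex metacharacters, tags are not nested, and the helper name findWordsOutsideTags is hypothetical. It strips every <tag>...</tag> span first, then searches what is left, using the example text from the question.

```javascript
// Hypothetical helper: strip simple <tag>...</tag> spans, then match whole words.
// Assumes tags are not nested and the words contain no regex metacharacters.
function findWordsOutsideTags(text, words) {
  // [^] matches any character, including newlines; the g flag removes every span.
  const stripped = text.replace(/<[^]+?<\/[^]+?>/g, '');
  // Build one alternation from the word list, bounded by word boundaries.
  const pattern = new RegExp('\\b(' + words.join('|') + ')\\b', 'gi');
  // match() returns null when nothing is found, so fall back to an empty array.
  return stripped.match(pattern) || [];
}

const sample = 'This is some text that should be looked through\n' +
  'though this text <code>Should not be looked at </code> and this text is ok to\n' +
  'look at';

// Only the occurrences outside <code>...</code> are reported.
console.log(findWordsOutsideTags(sample, ['should', 'looked']));
```

Stripping first keeps the word-matching regex simple; the cost is that match positions refer to the stripped string, not the original one.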

sabof

Assuming that the text you are searching over is correctly formed (as in, no tag mismatches), the following regex should work:

^([^<]*<([^>]*)>[^<]*</\2>)*[^<]*Your Text

This ensures that your text is outside of an open and closed set of tags by matching all open and closed sets before getting to your text.

It won't work for nested tags. Regex is incapable of parsing arbitrarily nested tags.
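As a rough illustration of this anchored pattern in JavaScript, here is a sketch under stated assumptions: the helper name isOutsideTags is hypothetical, tags carry no attributes (the closing tag must repeat the full opening tag content via \2), and the tail is written as [^<]* so that any amount of tag-free text may precede the phrase.

```javascript
// Hypothetical helper built on the anchored pattern from this answer.
function isOutsideTags(text, phrase) {
  // Escape regex metacharacters so the phrase is matched literally.
  const literal = phrase.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // From the start of the string, consume complete <tag>...</tag> pairs
  // (\2 must repeat the opening tag's content), then tag-free text, then the phrase.
  const pattern = new RegExp('^([^<]*<([^>]*)>[^<]*<\\/\\2>)*[^<]*' + literal);
  return pattern.test(text);
}

const text = 'This is some text that should be looked through\n' +
  'though this text <code>Should not be looked at </code> and this text is ok to\n' +
  'look at';

console.log(isOutsideTags(text, 'look at'));    // phrase appears after the closed tag pair
console.log(isOutsideTags(text, 'Should not')); // phrase appears only inside <code>...</code>
```

The check is case-sensitive as written; adding the i flag would also make the lowercase "should" in the first line count as a match.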

However, please remember: you should not parse HTML with regex.

Damien Black
  • I'm aware that HTML shouldn't be parsed with regex; that's why I said text. It's all in text format, which is what I was getting at, lol. – EasyBB Mar 08 '14 at 00:07