-1

I have rich text like this:

Sample text for testing:<a href="http://www.baidu.com" title="leoshi">leoshi</a>leoshi for details balala...
Welcome to RegExr v2.1 by gskinner.com, proudly hosted by Media Temple!

I want to match the word leoshi, but not when it appears inside <a> elements. In this example, that is only the leoshi in "leoshi for details...".

Solutions and explanations are welcome!

LeoShi

3 Answers

1

A trick for handling such "find a word, but not in a specific context" cases is described here: http://www.rexegg.com/regex-best-trick.html

In essence: match the word in the undesired context or (using alternation) the bare word in a capture group. Then inspect the captures. The group is set only when the word was matched outside the undesired context.

The regex in your case would be:

<a.*?>.*leoshi.*<\/a>|(leoshi)

Demo: https://regex101.com/r/zO0tV2/1

Then you need to check captures:

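var input = "...";
// The g flag lets exec() step through every match, not only the first one.
var pattern = /<a.*?>.*leoshi.*<\/a>|(leoshi)/g;
var match, inputMatches = false;
while ((match = pattern.exec(input)) !== null) {
  // A group that did not participate is undefined (not null) in JavaScript.
  if (match[1] !== undefined) { inputMatches = true; break; }
}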

Demo: https://ideone.com/KkAl2I
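The same alternation idea can also report where each occurrence outside a link starts. The following is only a sketch: the helper name `findOutsideAnchors` is hypothetical, and it swaps in a tightened variant of the pattern (`<a\b[^>]*>` with a lazy body) rather than the exact regex above. It assumes the word contains no regex metacharacters and that `<a>` elements are not nested.

```javascript
// Hypothetical helper (not from the answer): returns the start index of every
// occurrence of `word` that lies outside an <a>...</a> element.
// Assumes `word` has no regex metacharacters and anchors are not nested.
function findOutsideAnchors(input, word) {
  // Left branch consumes a whole anchor element; right branch captures the word.
  const pattern = new RegExp('<a\\b[^>]*>[\\s\\S]*?<\\/a>|(' + word + ')', 'g');
  const hits = [];
  for (const m of input.matchAll(pattern)) {
    // Group 1 is undefined whenever the anchor branch matched.
    if (m[1] !== undefined) hits.push(m.index);
  }
  return hits;
}

const sampleText = 'Sample text for testing:<a href="http://www.baidu.com" title="leoshi">leoshi</a>leoshi for details balala...';
// Logs an array with one index: the position right after the closing </a>.
console.log(findOutsideAnchors(sampleText, 'leoshi'));
```

Because the anchor branch comes first in the alternation, any leoshi inside a link is consumed as part of the link and never reaches the capture group.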

Dmitry Egorov
0

I used a positive lookbehind to start the match after the closing tag </a>, and then captured leoshi in parentheses when it appears as a separate word.

Regex: (?<=<\/a>).*?\b(leoshi)
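As a rough illustration of how this pattern behaves in JavaScript, here is a small sketch. It assumes an engine with lookbehind support (ES2018 or later); the variable names and the second input string are made up for the example. It also shows one limitation worth keeping in mind: an occurrence that appears only before a link is not found.

```javascript
// The lookbehind pattern from the answer, unchanged.
const lookbehindPattern = /(?<=<\/a>).*?\b(leoshi)/;

const afterLink = 'testing:<a href="http://www.baidu.com" title="leoshi">leoshi</a>leoshi for details';
const found = lookbehindPattern.exec(afterLink);
console.log(found && found[1]);    // the captured word
console.log(found && found.index); // the match starts right after </a>

// Hypothetical counter-example: the word appears only before the link.
const beforeLink = 'leoshi wrote <a href="#">this page</a>';
console.log(lookbehindPattern.exec(beforeLink)); // null
```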


Anshul Rai
-1

The best approach (using regex) would be to first remove all of the <a> elements, then detect the word in the remaining string. For example:

// The g flag removes every link, not just the first one.
var str_without_links = str.replace(/<a\b.*?<\/a>/g, '')
str_without_links.match(/leoshi/)

If you need to preserve the string length (for correspondence with the original string), consider using placeholder characters in place of the original tag.

var str_without_links = str.replace(/<a\b.*?<\/a>/g, function(s) { return s.replace(/./g, ' ') })
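To make the index correspondence concrete, here is a minimal sketch of the placeholder idea. The helper name `findOutsideLinks` is hypothetical, and the pattern uses `[\s\S]` instead of `.` so that links spanning several lines are masked too, which is an assumption beyond the snippet above. Each link is replaced by the same number of spaces, so every index found in the masked copy is also valid in the original string.

```javascript
// Hypothetical helper: masks every <a>...</a> element with spaces of equal
// length, then collects the indices of `word` in the masked copy.
function findOutsideLinks(text, word) {
  // Same-length replacement keeps all offsets aligned with the original.
  const masked = text.replace(/<a\b[\s\S]*?<\/a>/g, (link) => ' '.repeat(link.length));
  const indices = [];
  let from = 0;
  let i;
  while ((i = masked.indexOf(word, from)) !== -1) {
    indices.push(i);
    from = i + word.length;
  }
  return indices;
}

const original = 'Sample text for testing:<a href="http://www.baidu.com" title="leoshi">leoshi</a>leoshi for details balala...';
const positions = findOutsideLinks(original, 'leoshi');
// Each index points at the word in the untouched original string.
console.log(positions.map((p) => original.substr(p, 6)));
```

The original string is never modified; only the masked copy is searched.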
Owen
  • No man, we can't do things like this. This approach will destroy the original text. – LeoShi Aug 01 '16 at 11:03
  • `replace` returns a copy; the original is intact. If you need to keep the original text (e.g. to determine the correct index of the matched text), then you would need to replace each matched character with a placeholder instead. I will edit my answer to include an example. – Owen Aug 01 '16 at 11:05