
I'm trying to find some words (or expressions, such as two-word phrases) in a string, but only where they are not inside the anchor text of a link (the string contains HTML and is usually UTF-8 encoded). The plan is to replace those words with links afterwards.

I'm not very good with regex. I've searched the web and Stack Overflow and found two regex patterns that help, but each of them has an issue. I'm hoping someone can help me combine the two into one that works.

First pattern: `/('.$tag.')(?![^<]*<\/a>)/is`

This pattern finds the words, but it also matches inside longer words. For example, if I search for "express" in the string:

In computing, a regular expression provides a concise and flexible means...

...I don't expect a match, but one is found inside the word "expression".
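A minimal reproduction of that behaviour, as a sketch: the sample values for `$tag` and `$text` are assumptions, and `preg_quote()` is added here as a precaution that is not in the pattern as posted. Wrapping the term in `\b` is one common way to stop substring matches for plain ASCII words.

```php
<?php
// Hypothetical sample values, chosen to mirror the example above.
$tag  = 'express';
$text = 'In computing, a regular expression provides a concise and flexible means...';

// First pattern as posted (plus preg_quote): no word boundaries,
// so the term also matches inside longer words.
$loose = '/(' . preg_quote($tag, '/') . ')(?![^<]*<\/a>)/is';

// Same pattern with \b on both sides of the term.
$strict = '/\b(' . preg_quote($tag, '/') . ')\b(?![^<]*<\/a>)/is';

$looseHit  = preg_match($loose, $text);   // matches inside "expression"
$strictHit = preg_match($strict, $text);  // no whole-word "express" in $text
```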

Second pattern: `\'(?!((<.*?)|(<a.*?)))(\b'.$tag.'\b)(?!(([^<>]*?)>)|([^>]*?</a>))\'is`

This pattern doesn't have the previous issue, but if the last character of the word or expression I'm searching for is a non-ASCII UTF-8 character, I don't get a match.

Example word: apă

Example string: ...care transformă umiditatea din aer în apă potabilă. Dacă iniţial a fost creată pentru situaţia ţărilor...

Gabriel
    possible duplicate of [Ignore html tags in preg_replace](http://stackoverflow.com/questions/8193327/ignore-html-tags-in-preg-replace) - And if it is UTF-8, use the `u` modifier as well: `...'isu`. – hakre May 13 '12 at 16:04

1 Answer


Assuming the second regular expression works for you (I haven't tested it, and I really don't think you should use regexes for this kind of thing), all you need to do is add the `u` modifier, as @hakre said:

`\'(?!((<.*?)|(<a.*?)))(\b'.$tag.'\b)(?!(([^<>]*?)>)|([^>]*?</a>))\'isu`
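If `\b` still misbehaves next to non-ASCII letters in a given setup, one alternative is to spell the word boundary out with Unicode property lookarounds. The following is only a sketch under assumptions: `buildTagPattern()` and the `/tag/apa` URL are hypothetical, the input is assumed to be valid UTF-8, and the anchor check is the same heuristic as in the first pattern, so it assumes anchors contain no nested tags.

```php
<?php
// Hypothetical helper; the name and signature are not from the thread.
function buildTagPattern(string $tag): string
{
    $quoted = preg_quote($tag, '~');
    // (?<![\p{L}\p{N}_]) and (?![\p{L}\p{N}_]) act as Unicode-aware word boundaries.
    // (?![^<>]*>)   rejects matches inside a tag, e.g. in an attribute value.
    // (?![^<]*</a>) rejects matches followed by </a> with no tag in between.
    return '~(?<![\p{L}\p{N}_])(' . $quoted . ')(?![\p{L}\p{N}_])'
         . '(?![^<>]*>)(?![^<]*</a>)~iu';
}

$text = '...care transformă umiditatea din aer în apă potabilă. '
      . '<a href="#">apă</a> Dacă iniţial a fost creată...';

// Wrap each free-standing occurrence in a link; $count receives the number of replacements.
$linked = preg_replace(
    buildTagPattern('apă'),
    '<a href="/tag/apa">$1</a>', // hypothetical target URL
    $text,
    -1,
    $count
);
```

Only the occurrence outside the existing link is wrapped; the one already inside `<a href="#">` is left alone.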

Personally, I'd use DOMDocument for this task.
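A rough sketch of what the DOMDocument route could look like, under stated assumptions: `linkTermOutsideAnchors()` is a hypothetical helper, the input is a UTF-8 HTML fragment, and the XML prolog and wrapper `<div>` are common workarounds rather than anything from this thread.

```php
<?php
// Hypothetical helper; name, signature, and behaviour are assumptions for illustration.
function linkTermOutsideAnchors(string $html, string $term, string $url): string
{
    $doc  = new DOMDocument();
    $prev = libxml_use_internal_errors(true); // silence warnings on sloppy markup
    // The XML prolog hints UTF-8 to loadHTML(); the wrapper div keeps bare
    // text from being wrapped in an extra <p> by the parser.
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xpath   = new DOMXPath($doc);
    $pattern = '~(?<![\p{L}\p{N}_])(' . preg_quote($term, '~') . ')(?![\p{L}\p{N}_])~iu';

    // Text nodes with no <a>, <script>, or <style> ancestor.
    $query = '//text()[not(ancestor::a or ancestor::script or ancestor::style)]';
    foreach (iterator_to_array($xpath->query($query)) as $node) {
        // With DELIM_CAPTURE, odd indexes hold the matched term, even ones the text between.
        $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if ($parts === false || count($parts) < 2) {
            continue; // no match in this text node
        }
        $fragment = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) {
                $a = $doc->createElement('a');
                $a->setAttribute('href', $url);
                $a->appendChild($doc->createTextNode($part));
                $fragment->appendChild($a);
            } elseif ($part !== '') {
                $fragment->appendChild($doc->createTextNode($part));
            }
        }
        $node->parentNode->replaceChild($fragment, $node);
    }

    // Serialize only the children of the wrapper div (the first div in the document).
    $out = '';
    foreach ($doc->getElementsByTagName('div')->item(0)->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Because the parser decides what counts as anchor text, the regex only has to handle word boundaries inside plain text nodes. Note that the markup is re-serialized, so the output may differ cosmetically from the input.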

Alix Axel
  • It works, because it's what I currently use. The `u` modifier didn't solve my issue. As for the other option, I'm aware of it and will eventually go that way, but I was hoping to do this with regex to avoid the related changes I'd otherwise need to make. – Gabriel May 13 '12 at 20:16