
I'm using preg_replace to wrap keywords in text in an <a href> tag. My regex works well; right now my code is:

$newstring2 = preg_replace("/\p{L}*?".preg_quote($match[$i], '/')."\p{L}*/ui", "<a href='".$url."' class='link'>$0</a>", $newstring);

The only problem is that I need to exclude keywords that are already inside a link, such as <a href='https://keyword.cz' title="keyword">keyword</a>.

I found this answer: https://stackoverflow.com/a/22821650/4928816

Can someone help me merge these two regexes?

Example:

$text = 'this is sample text about something what is text.';
$keyword = 'text';

With my regex I currently get:

$text = "this is sample <a href='somelink.php'>text</a> about something what is <a href='somelink.php'>text</a>.";

But if the text is:

$text = "this is sample <a href='text.php'>text</a> about something what is <a href='somelink.php'>text</a>.";

then this is what I get:

$text = "this is sample <a href='<a href='somelink.php'>text.php</a>'><a href='somelink.php'>text</a></a> about something what is <a href='somelink.php'><a href='somelink.php'>text</a></a>.";

Update: why I need this. I'm working on a function that replaces every keyword in a given blog post, which is full of HTML tags, with a link to a specific URL. For example, if

$keyword = 'key';

I need to find the whole word that contains the keyword and wrap it in an <a href> tag, for example: Key, Keyword, keyword, keylock, mykey, keys, and also KeY, matched case-insensitively and with Unicode support.
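A quick check, as a minimal sketch, of what the pattern from the question matches on plain text (no HTML yet): \p{L}*? lazily takes any letters before the keyword, \p{L}* takes the letters after it, u switches PCRE to UTF-8 mode and i makes it case-insensitive. The sample string, including the non-ASCII word klíčkey, is made up for illustration.

```php
<?php
// Sample values for illustration only.
$keyword = 'key';
$text = 'Key, Keyword, keylock, mykey, keys, KeY and klíčkey.';

// \p{L}*? : letters before the keyword ("my", "klíč"), taken lazily
// \p{L}*  : letters after the keyword ("word", "lock", "s")
// u = treat pattern and subject as UTF-8, i = case-insensitive
$pattern = '/\p{L}*?' . preg_quote($keyword, '/') . '\p{L}*/ui';

preg_match_all($pattern, $text, $m);
echo implode(', ', $m[0]), "\n";
// prints: Key, Keyword, keylock, mykey, keys, KeY, klíčkey
```

Note that this pattern knows nothing about HTML, which is exactly why it also rewrites the href value and the existing link text in the example above.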

Zdenek Slavik
  • Can you show us the expected outputs for some given inputs? – Cid Nov 20 '18 at 15:22
  • Is there a [reason](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) you're not using a DOM parser? – CD001 Nov 20 '18 at 15:22
  • Parsing arbitrary HTML is not possible with regular expressions, since it requires matching opening and closing tags, which regexes cannot do in general. For any regular expression, it should be possible to construct an HTML file that it matches wrongly. – Bogdan N. Nov 20 '18 at 15:30
  • See also: [PHP Regular expression to match keyword outside HTML tag](http://stackoverflow.com/q/7798829), [Regex ignore URL already in HTML tags](http://stackoverflow.com/q/9567836) and [php regex to match outside of html tags](http://stackoverflow.com/q/7891771) – mario Nov 20 '18 at 15:31
  • OK @Cid, I added how I'm using this. – Zdenek Slavik Nov 20 '18 at 16:41
  • Yes, I want to count occurrences, but if you have a tip on how to solve this with DOM, you're very welcome. @CD001 I added an example of how I'm doing this. – Zdenek Slavik Nov 20 '18 at 16:44

2 Answers


If it must be done with a regex, I think PCRE verbs are your best option: skip over all links, then search for the term with word boundaries.

<a[\S\s]+?<\/a>(*SKIP)(*FAIL)|\bTERM\b

Demo: https://regex101.com/r/KlE1kc/1/
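A sketch of this idea applied to the whole-word pattern from the question, in PHP. Assumptions: linkKeyword() is a hypothetical helper; the link-skipping branch is written as <a\b[^>]*>.*?</a> with the s flag rather than the answer's <a[\S\s]+?<\/a>; and an extra branch <[^>]*>(*SKIP)(*FAIL), not part of the answer, skips every other tag so keywords inside attributes such as alt or title are left alone. preg_replace_callback is used so characters like $ in the URL are never read as backreferences.

```php
<?php
/**
 * Hypothetical helper: wrap every whole word containing $keyword in a link,
 * except inside existing <a>...</a> elements or inside any tag's attributes.
 */
function linkKeyword(string $html, string $keyword, string $url): string
{
    $pattern = '~<a\b[^>]*>.*?</a>(*SKIP)(*FAIL)'                 // existing links: match, then skip past them
             . '|<[^>]*>(*SKIP)(*FAIL)'                           // any other tag: skip it with its attributes
             . '|\p{L}*?' . preg_quote($keyword, '~') . '\p{L}*'  // the whole word around the keyword
             . '~uis';                                            // UTF-8, case-insensitive, . matches newlines

    $href = htmlspecialchars($url, ENT_QUOTES);
    $result = preg_replace_callback(
        $pattern,
        function (array $m) use ($href): string {
            return '<a href="' . $href . '" class="link">' . $m[0] . '</a>';
        },
        $html
    );

    // preg_replace_callback() returns null on error (e.g. invalid UTF-8 input).
    return $result ?? $html;
}

$html = "this is sample text about <a href='text.php'>text</a> and <img alt='text'> text.";
echo linkKeyword($html, 'text', 'somelink.php'), "\n";
```

It still shares the weakness this answer describes: a literal </a> inside an attribute of a link would end the skipped span too early.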

One flaw with this, though: it breaks if the <a> element ever contains a literal </a>, e.g. onclick='write("</a>")'. A parser is really the best approach; there are a lot of gotchas with HTML and regexes.
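Since a parser is the more robust route (CD001 also suggested DOM in the comments), here is a minimal sketch with PHP's DOMDocument and DOMXPath. Assumptions: linkKeywordDom() and the wrapper id "root" are hypothetical; the <?xml encoding="UTF-8"> prefix is a common workaround to make loadHTML() read the input as UTF-8; and libxml re-serializes the markup, so attribute quoting may change (href='x' comes back as href="x"). Only text nodes without an <a> ancestor are touched, so attributes and existing links are never rewritten.

```php
<?php
// Hypothetical helper: link whole words containing $keyword, using the DOM.
function linkKeywordDom(string $html, string $keyword, string $url): string
{
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true);  // silence warnings about fragment markup
    // Wrap the fragment in a known container; the encoding hint makes libxml read UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="root">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xpath = new DOMXPath($doc);
    $root = $xpath->query('//div[@id="root"]')->item(0);
    // One capturing group, so preg_split() puts the matched words at odd indices.
    $pattern = '/(\p{L}*?' . preg_quote($keyword, '/') . '\p{L}*)/ui';

    // Text nodes outside any <a>; copied to an array before the tree is modified.
    $textNodes = iterator_to_array($xpath->query('.//text()[not(ancestor::a)]', $root));
    foreach ($textNodes as $node) {
        $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) < 2) {
            continue; // keyword not present in this text node
        }
        $parent = $node->parentNode;
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) {
                // Matched word: wrap it in a new <a> element.
                $a = $doc->createElement('a');
                $a->setAttribute('href', $url);
                $a->setAttribute('class', 'link');
                $a->appendChild($doc->createTextNode($part));
                $parent->insertBefore($a, $node);
            } elseif ($part !== '') {
                // Surrounding text stays a plain text node.
                $parent->insertBefore($doc->createTextNode($part), $node);
            }
        }
        $parent->removeChild($node);
    }

    // Serialize only the children of the wrapper element.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo linkKeywordDom("this is sample text about <a href='text.php'>text</a>.", 'text', 'somelink.php'), "\n";
```

Text inside <script> or <style> elements would also be linked by this sketch; add those to the XPath filter (e.g. not(ancestor::script)) if that matters.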

user3783243

How about this, using a negative lookahead?

Explanation: capture every occurrence of the keyword text and wrap it in a link, but skip occurrences that are immediately followed by </a>.

$re = '/(text)(?!<\/a>)/m';
$str = 'this is sample text about something what is text.

this is sample <a href=\'somelink.php\'>text</a> about something what is <a href=\'somelink.php\'>text</a>.';
$subst = '<a href=\'somelink.php\'>$1</a>';

$result = preg_replace($re, $subst, $str);

echo $result;

Output:

this is sample <a href='somelink.php'>text</a> about something what is <a href='somelink.php'>text</a>. 

this is sample <a href='somelink.php'>text</a> about something what is <a href='somelink.php'>text</a>.

DEMO: https://3v4l.org/DVTB1

A l w a y s S u n n y
  • This has the same problem, though, if an attribute contains the value: https://regex101.com/r/oRVBvi/2 – user3783243 Nov 20 '18 at 16:09
  • That may work, but this preg_replace must skip links and match any FULL word containing the keyword, without distinguishing between Keyword, Keywords, keyword and mykeyword. So do you think what you posted can be combined with the regex I gave above? I need something like this: $newstring1 = preg_replace("/\p{L}*?(text)(?!<\/a>)\p{L}*/ui", "$0", $newstring); – Zdenek Slavik Nov 20 '18 at 16:13
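The first comment's point can be reproduced in PHP with a made-up input that has the keyword inside an href: the lookahead only inspects the characters right after the match, so the attribute value is rewritten and the markup breaks.

```php
<?php
// Made-up input: the keyword also appears in the href attribute.
$re = '/(text)(?!<\/a>)/';
$subst = '<a href=\'somelink.php\'>$1</a>';
$html = "see <a href='text.php'>text</a>";

// The lookahead only checks what follows "text", so the occurrence inside
// href='text.php' (followed by ".php") is replaced as well.
$out = preg_replace($re, $subst, $html);
echo $out, "\n"; // see <a href='<a href='somelink.php'>text</a>.php'>text</a>
```

Skipping whole tags before looking for the keyword, as the (*SKIP)(*FAIL) answer does, avoids this particular failure.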