I use a preg_replace call that is supposed to strip all rel and target attributes from the links in $body and replace them with other values. Here it is:

    $patterns = array();
    $patterns[] = '/<a(.*) rel="([^"]*)"(.*)>/';
    $patterns[] = '/<a(.*) target="([^"]*)"(.*)>/';
    $patterns[] = '/<a(.*)>/';

    $replacements = array();
    $replacements[] = '<a$1$3>';
    $replacements[] = '<a$1$3>';
    $replacements[] = '<a rel="nofollow" target="_blank"$1>';

    $body = preg_replace($patterns,$replacements,$body);

The problem is that it does not match single-quoted or unquoted attribute values. Also, if there is a better approach for clearing the rel and target attributes from the links and setting them to other values, please advise.

Thanks

EDIT: $body:

    $body = '<a href="TEST">Link1</a>
      <a href="TEST" rel=\'lqlqlq\'>Link2</a>
      <a href="TEST" target="_blank" rel="lqlqlq">Link3</a>
              <a href="TEST" target=_blank rel=lqlqlq>Link4</a>';

It should also handle every other variation of a working link, because the text is user-supplied and some users will probably try to cheat. My goal is for every link in $body to carry the defined rel and target attributes, no matter what the user has entered.

Constantin.FF
  • Can you provide a fragment of the source text? – Maks3w Sep 13 '12 at 07:54
  • In regex you can write ["']? which matches an optional double or single quote – Maks3w Sep 13 '12 at 07:55
  • You can use SimpleXML or another XML library to scan your source and get the elements more cleanly – Maks3w Sep 13 '12 at 07:56
  • Looks like you're attempting to parse HTML using regex, [read this](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) and realize you're doing _EVIL_ things – Elias Van Ootegem Sep 13 '12 at 08:08
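
The XML-library suggestion in the comments can be sketched with PHP's bundled DOM extension rather than SimpleXML. This is a minimal sketch, not a definitive implementation: it assumes PHP 7+ with the DOM extension enabled, normalizeLinks() is a hypothetical helper name that does not appear in the question, and the wrapper div is only there so the fragment can be extracted again after parsing.

```php
// Sketch: force rel/target on every <a> with the DOM extension instead of regex.
// normalizeLinks() is a hypothetical helper name, not part of the original post.
function normalizeLinks(string $html, string $rel = 'nofollow', string $target = '_blank'): string
{
    $doc = new DOMDocument();

    // User-supplied HTML is often sloppy; collect parser warnings instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    // The leading XML declaration is a common trick to make libxml read the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8" ?><div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Overwrite (or add) both attributes on every link, whatever the user entered.
    foreach ($doc->getElementsByTagName('a') as $a) {
        $a->setAttribute('rel', $rel);
        $a->setAttribute('target', $target);
    }

    // The first <div> in document order is the wrapper added above;
    // serialize only its children to get the fragment back.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Calling `$body = normalizeLinks($body);` would then replace the three-pattern preg_replace call. Because the parser decides what counts as an attribute, quoting style no longer matters, at the cost of the output being re-serialized (quotes and entities may be normalized).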

3 Answers

If you don't need to match the rest of the tag at the same time, I suggest you don't: match only the attributes themselves and save system resources.

    $patterns[] = "/rel=[\"\']?([\w]+)[\"\']?/";
    $patterns[] = "/target=[\"\']?([_a-zA-Z]+)[\"\']?/";

I'm not the best at REGEX but as far as I know this will save you some time.
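
Run against the whole $body, attribute-only patterns like the ones above would also touch plain text that happens to contain rel=... outside a tag. One way to keep the idea but limit it to links is to scope the replacement to each opening <a> tag with preg_replace_callback. The following is a rough sketch under that assumption; forceLinkAttributes() is a hypothetical name, and the regex does not cope with a > inside an attribute value.

```php
// Sketch: strip rel/target only inside opening <a ...> tags, then add fixed values.
// forceLinkAttributes() is a hypothetical helper name, not part of the original answer.
function forceLinkAttributes(string $body): string
{
    return preg_replace_callback(
        '/<a\b([^>]*)>/i', // \b keeps tags such as <abbr> from matching
        function (array $m): string {
            // Remove any existing rel/target attribute whose value is
            // double-quoted, single-quoted, or unquoted.
            $attrs = preg_replace(
                '/\s+(?:rel|target)\s*=\s*(?:"[^"]*"|\'[^\']*\'|[^\s>]+)/i',
                '',
                $m[1]
            );
            // Put the enforced attributes first, followed by whatever is left.
            return '<a rel="nofollow" target="_blank"' . $attrs . '>';
        },
        $body
    );
}
```

Usage would be `$body = forceLinkAttributes($body);`. The inner pattern runs only on the attribute text of each link, so the work per match stays small.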

Daniel

Most likely the first two patterns do match, but because <a(.*)> always matches no matter what the first two did, you won't see any result from them.

This could do what you're looking for:

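    $patterns = array();
    // Match a rel/target attribute whose value is double-quoted, single-quoted, or unquoted.
    // [^>]* keeps each match inside one opening tag, even when the attribute comes last.
    $patterns[] = '/<a\b([^>]*?)\s+rel=(?:"[^"]*"|\'[^\']*\'|[^\s>]+)([^>]*)>/i';
    $patterns[] = '/<a\b([^>]*?)\s+target=(?:"[^"]*"|\'[^\']*\'|[^\s>]+)([^>]*)>/i';

    $replacements = array();
    // Each pattern has only two capture groups, so the rest of the tag is $2, not $3.
    $replacements[] = '<a$1$2>';
    $replacements[] = '<a$1$2>';

    $body = preg_replace($patterns, $replacements, $body);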

Cheers.

pagid

This expression handles three cases:

  1. no quotes
  2. double quotes
  3. single quotes

    '/href=["\']?([^"\'>]+)["\']?/'
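
To see the three cases from the list in action, the pattern can be run against one sample tag per quoting style. This is only an illustrative sketch; the sample tags and the $values array are made up for the demonstration.

```php
// Sketch: run the pattern against each of the three quoting styles.
$pattern = '/href=["\']?([^"\'>]+)["\']?/';

$samples = array(
    '<a href=TEST>',   // no quotes
    '<a href="TEST">', // double quotes
    "<a href='TEST'>", // single quotes
);

$values = array();
foreach ($samples as $tag) {
    if (preg_match($pattern, $tag, $m)) {
        $values[] = $m[1]; // group 1 holds the attribute value without quotes
    }
}
// $values now holds the captured value for every sample that matched.
```

One caveat: the character class [^"\'>] still allows spaces, so an unquoted value followed by another attribute (as in Link4 from the question) is captured together with that attribute. Excluding whitespace for the unquoted case, for example with [^\s>]+, avoids this.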

ishubin