OK, so I have a string (it's the contents of an email), and I need to append a variable to any URL present in that string. We can assume that all URLs are inside the href attributes of anchor tags. So I want to search for any occurrence of href="whatever" and replace it with href="whatever?myvar". Ideally, I would also like to check whether the link already has a query variable in it, so that I append "&myvar" instead of "?myvar".

I have something like this, but I get lost with regular expressions...

$pattern = '"\b(http?://\S+)"';
$html_links = preg_replace($pattern, '$1&myvar', $text);

This isn't working because it appends my var AFTER the closing double quote of the href attribute...

Sorry, I'm so bad with regex. Any help will be highly appreciated!

germi
  • Maybe there is no need to use regex for URL replacement; see http://php.net/manual/en/function.parse-url.php. Also see http://php.net/manual/en/function.parse-str.php for query string parsing, and http://php.net/manual/en/function.http-build-query.php to convert the parsed query array back to a string. – shtrih Oct 16 '18 at 14:53
  • Use `preg_match_all` with one of the regex from [this page](https://stackoverflow.com/questions/910912/extract-urls-from-text-in-php). Once you have the urls, you can change them and use `str_replace` to replace them in the string. – Michel Oct 16 '18 at 15:07
  • Sorry guys, that doesn't work for me. Again, I'm very, very bad with regex. There is definitely a need to use regex, but I can't see what you mean. @Michel, what would the final result look like? – germi Oct 16 '18 at 15:50
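
For readers wondering what the parse_url / parse_str / http_build_query suggestion from the comments might look like in practice, here is a minimal sketch. The function name add_query_var and its parameters are made up for illustration and are not from the thread. It assumes PHP 7 or later and that it is fine for parse_str to normalize parameter names (it turns dots and spaces in names into underscores).

```php
<?php
// Hypothetical helper (not from the thread): add one query variable to a URL.
function add_query_var(string $url, string $name, string $value): string
{
    // Existing query string and fragment, if any (null when absent).
    $query    = (string) parse_url($url, PHP_URL_QUERY);
    $fragment = parse_url($url, PHP_URL_FRAGMENT);

    // Turn "a=1&b=2" into an array, then add or overwrite our variable.
    parse_str($query, $params);
    $params[$name] = $value;

    // Everything before the first "?" or "#" is kept untouched.
    $base = preg_split('/[?#]/', $url, 2)[0];

    // Rebuild: base, new query string, then the original fragment.
    $result = $base . '?' . http_build_query($params, '', '&');
    if (is_string($fragment)) {
        $result .= '#' . $fragment;
    }
    return $result;
}

echo add_query_var('http://whatever', 'myvar', '1'), "\n";
echo add_query_var('http://whatever?somevar=1', 'myvar', '1'), "\n";
```

The result uses a plain & as separator, so it would still need htmlspecialchars() (or an equivalent) before being written back into an href attribute.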

2 Answers

You could try this, which works for the test data I gave it.

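<?php
$text = '<a href="http://whatever">whatever</a> <a href=\'http://whatever?somevar=1\'>something</a>';
// URLs without a query string
$pattern1 = '/([\'"])(https?:\/\/[^\'"?]+)\1/';
// URLs that already have a query string
$pattern2 = '/([\'"])(https?:\/\/[^\'"?]+\?[^\'"]+)\1/';
// Run $pattern2 first, so that URLs which already contain a ? get &amp;myvar
$html_links = preg_replace($pattern2, '$1$2&amp;myvar$1', $text);
$html_links = preg_replace($pattern1, '$1$2?myvar$1', $html_links);

var_dump($html_links);

Explanation:

$pattern1 = '/([\'"])(https?:\/\/[^\'"?]+)\1/';

([\'"]) Opening quote mark (single or double)

(https?:\/\/[^\'"?]+) http, followed by an optional s and ://, followed by as many characters as possible up to the next quote mark or ?. The quotes are listed explicitly because a backreference such as \1 is not available inside a character class.

\1 Closing quote (backreference to the opening one)

$pattern2 = '/([\'"])(https?:\/\/[^\'"?]+\?[^\'"]+)\1/';

(https?:\/\/[^\'"?]+\?[^\'"]+) As above, but requiring a ? followed by a query string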
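
The same idea can also be done in one pass with preg_replace_callback, which picks the separator per match. This is only a sketch, not part of the answer above: the function name append_var_to_hrefs is hypothetical, and it assumes the URLs sit in quoted href attributes and contain no quote characters themselves.

```php
<?php
// Hypothetical helper: append $var to every http(s) URL found in an href attribute.
function append_var_to_hrefs(string $html, string $var): string
{
    // Group 1: 'href=' plus the opening quote, group 2: the quote, group 3: the URL.
    $pattern = '/(href\s*=\s*([\'"]))(https?:\/\/[^\'"]*)\2/i';

    $result = preg_replace_callback($pattern, function (array $m) use ($var): string {
        // "&amp;" when the URL already has a query string, "?" otherwise.
        $separator = (strpos($m[3], '?') === false) ? '?' : '&amp;';
        return $m[1] . $m[3] . $separator . $var . $m[2];
    }, $html);

    // preg_replace_callback returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}

$text = '<a href="http://whatever">whatever</a> <a href=\'http://whatever?somevar=1\'>something</a>';
echo append_var_to_hrefs($text, 'myvar'), "\n";
```

Anchoring on href= means that quoted URLs elsewhere in the text (for example in plain body copy) are left alone.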

Chris Lear

Generally it is a bad idea to use regex to parse any kind of HTML (here is why). It's better to use PHP's built-in DOM parser. Here's how you could do it:

// Set your variable (URL-encode the value so the link stays valid)
$myvar = 'MYVAR=' . urlencode('I WANT BEER');

// Load the HTML into a DOM, collecting libxml parse warnings internally
$dom = new DOMDocument('1.0', 'UTF-8');
$iEr = libxml_use_internal_errors(true);
$dom->loadHTML($text);
libxml_use_internal_errors($iEr);

// Look for <a href="...">
foreach ($dom->getElementsByTagName('a') as $node) {
    if ($node->hasAttribute('href')) {
        $href = $node->getAttribute('href');

        // Look for an existing query part
        $query = parse_url($href, PHP_URL_QUERY);

        // No query part: add "?"; otherwise add "&"
        $new_link = $href . ($query === null ? '?' : '&');

        // Add your own variable
        $new_link .= $myvar;

        // Replace the old link with the new one
        $node->setAttribute('href', $new_link);
    }
}

// Save the new DOM
$new_text = $dom->saveHTML();

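As for why libxml_use_internal_errors is used: it keeps loadHTML from emitting PHP warnings when the markup is malformed or contains tags the parser does not recognize.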
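
Emails often contain mailto: links and "#anchor" links as well, which the loop above would also modify. A possible variation, as a rough sketch only: the function name tag_links is hypothetical, it requires the DOM extension, and it assumes the input is ASCII or declares its own charset (loadHTML does not assume UTF-8 by default).

```php
<?php
// Hypothetical helper: append "name=value" to http(s) links only, using DOMDocument.
function tag_links(string $html, string $name, string $value): string
{
    $dom = new DOMDocument('1.0', 'UTF-8');
    $previous = libxml_use_internal_errors(true);   // collect parser warnings silently
    $dom->loadHTML($html);
    libxml_clear_errors();                          // discard the collected warnings
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('a') as $node) {
        $href = $node->getAttribute('href');        // '' when the attribute is missing

        // Skip mailto:, tel:, "#anchor" and empty links.
        if (!preg_match('/^https?:\/\//i', $href)) {
            continue;
        }

        // Keep any "#fragment" at the very end of the URL.
        $parts = explode('#', $href, 2);
        $separator = (strpos($parts[0], '?') === false) ? '?' : '&';
        $parts[0] .= $separator . rawurlencode($name) . '=' . rawurlencode($value);

        $node->setAttribute('href', implode('#', $parts));
    }

    // saveHTML() returns the whole document, including the html/body wrappers.
    return (string) $dom->saveHTML();
}

$text = '<a href="http://whatever">whatever</a> <a href="mailto:someone@example.com">mail</a>';
echo tag_links($text, 'myvar', '1');
```

Because saveHTML() serializes a complete document, a fragment such as an email body comes back wrapped in html and body tags.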

Michel