
I'm currently working on something where I need to add UTM tags to all links, and I have one or two minor issues I can't figure out.

This is the code I am using. The issue is that if a link already has a parameter, like ?test=test, it refuses to add the UTM tags.

The other issue is minor, and I'm not sure it makes sense to change: instead of me having to specify a URL, it would be neat if it added UTM tags to ALL a hrefs by default, without knowing the domain name.

Hope someone can help me out and push me in the right direction.

$url_modifier_domain = preg_quote('add-link.com');

$html_text = preg_replace_callback(
    '#((?:https?:)?//'.$url_modifier_domain.'(/[^\'"\#]*)?)(?=[\'"\#])#i',
    function($matches){
        $url_modifier = 'utm=some&medium=stuff';
        if (!isset($matches[2])) return $matches[1]."/?$url_modifier";
        $q = strpos($matches[2],'?');
        if ($q===false) return $matches[1]."?$url_modifier";
        if ($q==strlen($matches[2])-1) return $matches[1].$url_modifier;
        return $matches[1]."&$url_modifier";
    },
    $html);
  • You may need to go with DOM instead of RegExp: https://stackoverflow.com/a/11235611/2359679 – hassan Oct 23 '17 at 07:21
  • What does your code do, and what doesn't it do? – madalinivascu Oct 23 '17 at 07:21
  • The code adds ?utm=some&medium=stuff to URLs on add-link.com. The issue is that it doesn't add the UTM tags if the URL is, for example, add-link.com?test=test, where I need it to produce add-link.com?test=test&utm=some&medium=stuff –  Oct 23 '17 at 08:22

1 Answer

Once you have detected the URLs, you can use parse_url() and parse_str() to break each URL apart, add utm and medium, and rebuild it without caring too much about the content of the GET parameters or the hash:

$url_modifier_domain = preg_quote('add-link.com');

$html_text = preg_replace_callback(
    '#((?:https?:)?//'.$url_modifier_domain.'(/[^\'"\#]*)?)(?=[\'"\#])#i',
    function ($matches) {
        $link = $matches[0];
        if (strpos($link, '#') !== false) {
            list($link, $hash) = explode('#', $link);
        }
        $res = parse_url($link);

        $result = '';
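        if (isset($res['scheme'])) {
            $result .= $res['scheme'].'://';
        } elseif (isset($res['host'])) {
            // protocol-relative URL such as //add-link.com/: keep the leading slashes
            $result .= '//';
        }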
        if (isset($res['host'])) {
            $result .= $res['host'];
        }
        if (isset($res['path'])) {
            $result .= $res['path'];
        }
        if (isset($res['query'])) {
            parse_str($res['query'], $res['query']);
        } else {
            $res['query'] = [];
        }

        $res['query']['utm'] = 'some';
        $res['query']['medium'] = 'stuff';

        if (count($res['query']) > 0) {
            $result .= '?'.http_build_query($res['query']);
        }
        if (isset($hash)) {
            $result .= '#'.$hash;
        }

        return $result;
    },
    $html
);

As you can see, the code is longer but simpler
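
The parse-and-rebuild idea can also be pulled out of the callback into a small helper, which makes it easier to try on single URLs. This is only a sketch: add_utm_params() is a hypothetical name, it assumes PHP 7+ and http(s) or protocol-relative URLs, and it drops any user:password part. It also adds the missing / when a URL has a host but no path.

```php
<?php
// Hypothetical helper (not part of the answer's code): append query
// parameters to one URL, keeping its existing query string and fragment.
function add_utm_params(string $url, array $params): string
{
    $parts = parse_url($url);
    if ($parts === false) {
        return $url; // seriously malformed URL: leave it untouched
    }

    // Existing parameters first, then the new ones (new values win on clashes).
    $query = [];
    if (isset($parts['query'])) {
        parse_str($parts['query'], $query);
    }
    $query = array_replace($query, $params);

    $result = '';
    if (isset($parts['scheme'])) {
        $result .= $parts['scheme'] . '://';
    } elseif (isset($parts['host'])) {
        $result .= '//'; // protocol-relative URL
    }
    if (isset($parts['host'])) {
        $result .= $parts['host'];
    }
    if (isset($parts['port'])) {
        $result .= ':' . $parts['port'];
    }
    if (isset($parts['path'])) {
        $result .= $parts['path'];
    } elseif (isset($parts['host'])) {
        $result .= '/'; // "http://add-link.com?x=1" has no path: add the slash
    }

    // Pass the separator explicitly so the result does not depend on php.ini.
    $result .= '?' . http_build_query($query, '', '&');

    if (isset($parts['fragment'])) {
        $result .= '#' . $parts['fragment'];
    }
    return $result;
}

$utm = ['utm' => 'some', 'medium' => 'stuff'];
echo add_utm_params('http://add-link.com?test=test', $utm), "\n";
// http://add-link.com/?test=test&utm=some&medium=stuff
```

Inside a preg_replace_callback() the callback would then only need to call this helper on the matched URL.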

Edit: I made some changes so that the script now searches for every href="xxx" inside the text. If the link is not from add-link.com the script skips it; otherwise it tries to rebuild it in the best way possible.

$html = 'blabla <a href="http://add-link.com/">a</a>
<a href="http://add-link.com/">a</a>
<a href="http://add-link.com/#hashed">a</a>
<a href="http://abcd.com/#hashed">a</a>
<a href="http://add-link.com/?test=1">a</a>
<a href="http://add-link.com/try.php">a</a>
<a href="http://add-link.com/try.php?test=1">a</a>
<a href="http://add-link.com/try.php#hashed">a</a>
<a href="http://add-link.com/try.php?test=1#hashed">a</a>
<a href="http://add-link.com/try.php?test=1#hashed">a</a>
<a href="//add-link.com?test=test" style="color: rgb(198, 156, 109);">a</a>
';

$url_modifier_domain = preg_quote('add-link.com');

$html_text = preg_replace_callback(
    '/href="([^"]+)"/i',
    function ($matches) {
        $link = $matches[1];

        // ignore external links
        if (strpos($link, 'add-link.com') === false) {
            return 'href="'.$link.'"';
        }

        if (strpos($link, '#') !== false) {
            list($link, $hash) = explode('#', $link);
        }
        $res = parse_url($link);

        $result = '';
        if (isset($res['scheme'])) {
            $result .= $res['scheme'].'://';
        } elseif (isset($res['host'])) {
            // protocol-relative URL such as //add-link.com
            $result .= '//';
        }

        if (isset($res['host'])) {
            $result .= $res['host'];
        }
        if (isset($res['path'])) {
            $result .= $res['path'];
        } else {
            // no path (e.g. http://add-link.com?test=test): add the missing slash
            $result .= '/';
        }

        if (isset($res['query'])) {
            parse_str($res['query'], $res['query']);
        } else {
            $res['query'] = [];
        }

        $res['query']['utm'] = 'some';
        $res['query']['medium'] = 'stuff';

        if (count($res['query']) > 0) {
            $result .= '?'.http_build_query($res['query']);
        }
        if (isset($hash)) {
            $result .= '#'.$hash;
        }

        return 'href="'.$result.'"';
    },
    $html
);

var_dump($html_text);
Roberto Bisello
  • Hi Roberto, what about if the URL has a parameter? For example, I have a URL add-link.com?test=test, and it ignores that completely and doesn't add anything to it. –  Oct 23 '17 at 11:46
  • $res['query'] contains an array of all the query params, so none of them will be lost. I've tried the script without any issue with these variants: http://add-link.com/, http://add-link.com/#hashed, http://add-link.com/try.php, http://add-link.com/try.php?test=1, http://add-link.com/try.php#hashed, http://add-link.com/try.php?test=1#hashed, http://add-link.com/try.php?test=test&try=onother#hashed – Roberto Bisello Oct 23 '17 at 11:53
  • It ignores this one –  Oct 23 '17 at 12:09
  • Figured out what triggers it: the missing / at the end of the link. Could a simple method be added to make sure / is added after the main link if it's a direct link? –  Oct 23 '17 at 12:14
  • It depends on how you create your links. If they are hand-written, you can write another regexp to fix them, or change the previous regexp to be less restrictive – Roberto Bisello Oct 23 '17 at 12:20
  • I'm not good with regexp, I have to say, hehe :D But is there a simple way to do it with regexp that adds / to all links that are missing it, after the .xx? –  Oct 23 '17 at 12:24
  • The links are normally copy-pasted, and it isn't every time that a / gets added to the end of a "clean link" –  Oct 23 '17 at 12:42
  • I've made some changes, but you have to consider that href="add-link.com" is also wrong, as it points to a file called "add-link.com" inside your domain. Mistakes like this can cause tons of errors which are hard to find and fix, so pay attention while pasting ;) – Roberto Bisello Oct 23 '17 at 12:52
  • I'll try and test it out. Oh, the domain always includes http or https :) it's just the ending that's more variable. –  Oct 24 '17 at 11:53
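
For the second part of the question (tagging every a href without knowing the domain), and following hassan's suggestion to use DOM rather than RegExp, a rough sketch could look like the one below. It is not the code from the answer: tag_all_links() is a hypothetical name, it assumes the dom extension is available, and it skips anything without a host (relative paths, #anchors, mailto:), so a hand-written href="add-link.com" would be left alone. Any user:password part of a URL is dropped.

```php
<?php
// Sketch only: append tracking parameters to every http(s) or
// protocol-relative <a href> in a DOMDocument, whatever the domain.
function tag_all_links(DOMDocument $doc, array $params): void
{
    foreach ($doc->getElementsByTagName('a') as $a) {
        $parts = parse_url($a->getAttribute('href'));

        // No host: relative path, "#top", mailto:, or unparsable. Leave as is.
        if ($parts === false || !isset($parts['host'])) {
            continue;
        }
        // Skip non-web schemes such as ftp://
        if (isset($parts['scheme'])
            && !in_array(strtolower($parts['scheme']), ['http', 'https'], true)) {
            continue;
        }

        // Keep existing parameters and add the new ones.
        $query = [];
        if (isset($parts['query'])) {
            parse_str($parts['query'], $query);
        }
        $query = array_replace($query, $params);

        $url = (isset($parts['scheme']) ? $parts['scheme'] . ':' : '')
             . '//' . $parts['host'];
        if (isset($parts['port'])) {
            $url .= ':' . $parts['port'];
        }
        $url .= $parts['path'] ?? '/';   // add the slash when the path is missing
        $url .= '?' . http_build_query($query, '', '&');
        if (isset($parts['fragment'])) {
            $url .= '#' . $parts['fragment'];
        }

        $a->setAttribute('href', $url);
    }
}

$html = '<p><a href="http://add-link.com?test=test">a</a> '
      . '<a href="https://abcd.com/#hashed">b</a> '
      . '<a href="#top">c</a></p>';

libxml_use_internal_errors(true);   // tolerate imperfect markup
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

tag_all_links($doc, ['utm' => 'some', 'medium' => 'stuff']);
echo $doc->saveHTML();
```

Note that loadHTML() wraps a fragment in html and body elements, so if the input is only a snippet you would still have to extract the body's children from the saved output.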