Currently I have the following code:

    //loop here 
    foreach ($doc['a'] as $link) {
        $href = pq($link)->attr('href');                
        if (preg_match($url,$href))
        {
            //delete matched string and append custom url to href attr
        }       
        else
        {
            //prepend custom url to href attr
        }
    }
    //end loop

Basically, I've fetched an external page via cURL. I need to add my own custom URL to each href link in the DOM. I need to check via regex whether each href attribute already has a base URL, e.g. www.domain.com/MainPage.html/SubPage.html

If yes, then replace the www.domain.com part with my custom url.

If not, then simply prepend my custom URL to the relative URL.

My question is, what regex syntax should I use and which php function? Is preg_replace() the proper function for this?

Cheers

jc.yin

1 Answer

You should use built-in functions rather than regular expressions whenever possible, because the authors of those functions have often already considered the edge cases (or read the REALLY long RFC for URLs that details all of them). For your case, I would use parse_url() and then http_build_url() (note that the latter function requires the PECL HTTP extension, which can be installed by following the docs page for the http package):

    $href = 'http://www.domain.com/MainPage.html/SubPage.html';
    $parts = parse_url($href);

    // parse_url() omits 'host' for relative URLs, so check that it exists first
    if (isset($parts['host']) && $parts['host'] === 'www.domain.com') {
        $parts['host'] = 'www.yoursite.com';

        $href = http_build_url($parts);
    }

    echo $href; // 'http://www.yoursite.com/MainPage.html/SubPage.html'

Example using your code:

    foreach ($doc['a'] as $link) {
        $urlParts = parse_url(pq($link)->attr('href'));

        if ($urlParts === false) {
            continue; // skip hrefs that parse_url() cannot parse
        }

        // Replaces the domain if there is one; otherwise prepends your domain
        $urlParts['host'] = 'www.yoursite.com';

        $newURL = http_build_url($urlParts);

        pq($link)->attr('href', $newURL);
    }
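
If installing PECL HTTP is not an option, the same idea can be approximated in plain PHP by reassembling the array that parse_url() returns. The following is only a rough sketch under a few assumptions: `rebuild_url()` and `rewrite_host()` are hypothetical helper names (not part of PHP or of any extension), `http` is assumed as the default scheme for relative links, and the syntax requires PHP 7 or later.

```php
<?php
// Hypothetical helper: reassemble a URL from the array returned by parse_url().
function rebuild_url(array $parts): string
{
    $url = '';

    if (isset($parts['host'])) {
        // Assumption: fall back to "http" when the original link had no scheme.
        $url .= ($parts['scheme'] ?? 'http') . '://';

        if (isset($parts['user'])) {
            $url .= $parts['user'];
            if (isset($parts['pass'])) {
                $url .= ':' . $parts['pass'];
            }
            $url .= '@';
        }

        $url .= $parts['host'];

        if (isset($parts['port'])) {
            $url .= ':' . $parts['port'];
        }
    }

    $path = $parts['path'] ?? '';
    // A relative path needs a separating slash once a host is in front of it.
    if ($url !== '' && $path !== '' && $path[0] !== '/') {
        $path = '/' . $path;
    }
    $url .= $path;

    if (isset($parts['query'])) {
        $url .= '?' . $parts['query'];
    }
    if (isset($parts['fragment'])) {
        $url .= '#' . $parts['fragment'];
    }

    return $url;
}

// Hypothetical helper: replace the host if there is one, otherwise prepend it.
function rewrite_host(string $href, string $newHost): string
{
    $parts = parse_url($href);
    if ($parts === false) {
        return $href; // seriously malformed URL: leave it untouched
    }

    $parts['host'] = $newHost;

    return rebuild_url($parts);
}

// The custom host can come from a variable, e.g. user input.
$customHost = 'www.yoursite.com';

echo rewrite_host('http://www.domain.com/MainPage.html/SubPage.html', $customHost), "\n";
// http://www.yoursite.com/MainPage.html/SubPage.html

echo rewrite_host('SubPage.html?x=1#top', $customHost), "\n";
// http://www.yoursite.com/SubPage.html?x=1#top
```

One caveat: parse_url() treats a scheme-less link such as `www.domain.com/MainPage.html` as a plain path, so the old domain would end up inside the path rather than being replaced. Links like that would need a scheme prepended before parsing.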
Bailey Parker
  • Actually, I just thought of something. My custom URL isn't static, i.e. it will depend on user input and be stored in a variable. Will preg_replace be able to take a URL stored in a variable, compare it with another URL, and replace the matching URL with my own? – jc.yin May 05 '13 at 03:12
  • It doesn't need to be static to work with this. You can use this with your `foreach` loop. Let me reiterate that I would recommend *against* using `preg_replace()`. – Bailey Parker May 05 '13 at 03:14
  • I just carefully re-read your answer and wow that is really what I need! haha sorry my bad, I must be too tired from too much coding. I'll try out the method asap now and get back as soon as I can :) – jc.yin May 05 '13 at 03:19
  • I'm trying to get the PECL HTTP extension and on the php manual site it only explains how to install it for windows. I'm using a mac and I read here http://stackoverflow.com/questions/5536195/install-pecl-on-mac-os-x-10-6 that I should download and install PEAR? I've never installed any php extensions before, would you have any suggestions how I could get the PECL HTTP on mac? – jc.yin May 05 '13 at 03:25
  • Have you checked out: http://pear.php.net/manual/en/installation.getting.php#installation.getting.osx ? – Bailey Parker May 05 '13 at 03:27
  • Okay I just checked http://pear.php.net/manual/hu/installation.checking.php Yea PEAR is installed, but when I run the above full code nothing happens. Blank screen. But when I comment out the `$newURL = http_build_url($urlParts);` it does work. So I'm assuming PECL is still not included with PEAR? – jc.yin May 05 '13 at 03:53
  • Then you can use `pecl install pecl_http` in the terminal. – Bailey Parker May 05 '13 at 03:55
  • I get `Cannot install, php_dir for channel "pecl.php.net" is not writeable by the current user` – jc.yin May 05 '13 at 03:59
  • You probably need to use `sudo` – Bailey Parker May 05 '13 at 04:01
  • It gets to `Zend Extension Api No: 220090625` Then `Cannot find autoconf. Please check your autoconf installation and then $PHP_AUTOCONF environment variable. Then, rerun this script. ERROR: phpize failed` – jc.yin May 05 '13 at 04:04
  • http://recensus.com/blog/technical/installing-autoconf-and-fixing-phpize-on-osx-10-8/ (Always google error messages -- chances are someone else has experienced the problem too) – Bailey Parker May 05 '13 at 04:07
  • Thanks for the link, I'm looking at it now. I was just googling the error message and what I came across was people suggesting installing homebrew or macports to deal with this issue. However I'm kinda wary of installing too many third party tools which I'm unfamiliar and might corrupt something later on. Do you use any of those tools? – jc.yin May 05 '13 at 04:13
  • Okay I've installed homebrew, managed to get `sudo pecl install pecl_http` installed. At the end I get this message `install ok: channel://pecl.php.net/pecl_http-1.7.5 configuration option "php_ini" is not set to php.ini location You should add "extension=http.so" to php.ini` I've added `"extension=http.so"` to the top of my php.ini file in MAMP and now I get this error which I haven't found a solution `Fatal error: Call to undefined function http_build_url()` – jc.yin May 05 '13 at 04:35
  • You probably need to restart apache. The PHP.ini isn't loaded on demand by default. – Bailey Parker May 05 '13 at 04:37
  • I've stopped and restarted Apache in MAMP and get the same error. I've never modified php.ini before, but when adding `"extension=http.so"` to php.ini, should I include the double quotes? And do I need to add the `;` before the line or not? Maybe this is the reason it's not working – jc.yin May 05 '13 at 04:41