For a project, I need to fetch a website's content and alter its HTML. Every link on that website has to be replaced with my own as well. I used str_replace until I realized that links sometimes have classes assigned to them, so the href attribute is not always the first attribute in the tag.

I've tried preg_replace to prepend my own website's URL to every href value inside an <a> tag. It shouldn't matter whether the fetched page in $content uses href="" or href=''.

$content = preg_replace('~(<a\b[^>]*\shref=")([^"]*)(")~igs', '\1http://website.com/fetch.php?url=\2\3', $content);

This does not work and I can't find the error. It should behave as follows:

<a class="link" href="http://google.com">Google</a>

should turn into

<a class="link" href="http://website.com/fetch.php?url=http://google.com">Google</a>

Can someone help me find the error? Thank you in advance.

Mikusch

2 Answers

Don't half-arse a regex that will miss plenty of cases. Just read each document into a DOM tree (give this html5 DOM parser a go), and use XPath to get all links with href attributes, and update them, then save the result.
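
A minimal sketch of that approach, assuming PHP's built-in DOMDocument and DOMXPath rather than the HTML5 parser mentioned above. The function name rewrite_links is hypothetical, the proxy URL is copied from the question, and urlencode() is an added choice so the original URL survives as a query value:

```php
<?php

// Hypothetical helper: prefix every <a href> in $html with a proxy URL.
function rewrite_links(string $html, string $prefix = 'http://website.com/fetch.php?url='): string
{
    $doc = new DOMDocument();

    // Real-world markup is rarely valid, so collect parser warnings
    // internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Select every <a> element that actually has an href attribute.
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//a[@href]') as $link) {
        $original = $link->getAttribute('href');
        $link->setAttribute('href', $prefix . urlencode($original));
    }

    // saveHTML() returns a full document (doctype, <html>, <body> included).
    return $doc->saveHTML();
}

echo rewrite_links('<a class="link" href=\'http://google.com\'>Google</a>');
```

Because the parser reads attributes itself, it does not matter whether the page uses single or double quotes, or where href sits among the other attributes.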

Walf

Just use SimpleXML to read the href and preg_replace to swap it out:

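        <?php

            // Sample markup containing a single link.
            $string = '<a class="link" href="http://google.com">Google</a>';

            // Parse the anchor so the href attribute can be read reliably.
            $a = new SimpleXMLElement($string);

            // Build the proxied URL; urlencode() keeps the original URL intact as a query value.
            $newurl = "http://website.com/fetch.php?url=" . urlencode((string) $a['href']);

            // Match only the value between the quotes of href="..." or href='...'.
            $pattern = "/(?<=href=(\"|'))[^\"']+(?=(\"|'))/";

            // Note: every href found in $string is replaced with this one $newurl.
            $body = preg_replace($pattern, $newurl, $string);

            echo $body;

         ?>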

OUTPUT:

<a class="link" href="http://website.com/fetch.php?url=http%3A%2F%2Fgoogle.com">Google</a>
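
For a page with more than one link, a rough sketch using preg_replace_callback may be closer to what the question asks for. The function name rewrite_hrefs is hypothetical and the proxy URL is taken from the question. Note that PHP's preg functions have no g modifier (preg_replace already replaces every match), so the ~igs in the question's pattern is rejected as an unknown modifier, and that pattern also only allows double quotes:

```php
<?php

// Hypothetical helper: rewrite each href inside an <a> tag individually.
function rewrite_hrefs(string $html, string $prefix = 'http://website.com/fetch.php?url='): string
{
    // Group 1: the tag up to and including "href="
    // Group 2: the opening quote, " or '; \2 demands the same closing quote
    // Group 3: the original URL
    $pattern = '~(<a\b[^>]*?\shref\s*=\s*)(["\'])(.*?)\2~is';

    $result = preg_replace_callback(
        $pattern,
        static function (array $m) use ($prefix): string {
            return $m[1] . $m[2] . $prefix . urlencode($m[3]) . $m[2];
        },
        $html
    );

    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $result ?? $html;
}

echo rewrite_hrefs('<a class="link" href="http://google.com">Google</a>');
// <a class="link" href="http://website.com/fetch.php?url=http%3A%2F%2Fgoogle.com">Google</a>
```

This still skips unquoted href values and does not decode HTML entities inside the URL, so it is only a sketch for reasonably tidy markup.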
JYoThI