I'm building a kind of proxy. My PHP script downloads a webpage and then outputs the downloaded content. The output doesn't look like the original page because some URLs need to be corrected (CSS, links, images, etc.). So I'm looking for a library that finds all HTML elements with src and href attributes so that I can change their values. For example:

<link href="/images/favicon.ico">

needs to be changed to

<link href="http://example.com/images/favicon.ico">

What is the best way to do this?

Jochem Gruter
  • Regular expressions will be your best bet. – Achrome May 19 '13 at 13:15
  • @AshwinMukhija — No, they [really, really wouldn't](http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html). – Quentin May 19 '13 at 13:17
  • That would certainly explain green slimy idols that keep turning up in my house. – Achrome May 19 '13 at 13:23
  • simple_html_dom will do definitely whatever you want to do. – Manish Jangir May 19 '13 at 13:27
  • Solve the problem the quickest and easiest way possible... I have written regex patterns to process html in 15 minutes that have been in place working perfectly fine for 10 years. 10 lines of code instead of an entire external library. There isn't always the need to wrap your head around a library you haven't used if you don't need to. Frankly, parsing the HTML seems like the easiest part here, you still will need to fix the URL's, so I'm not sure why the focus immediately went to the HTML parsing. – Mattt May 19 '13 at 13:35

1 Answer

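<?php
// Load the Simple HTML DOM library (adjust the path to your project layout).
require_once('controller/simple_html_dom.php');

// Sample markup containing one stylesheet link and one script tag.
$str = '<link rel="stylesheet" type="text/css" href="/css/normalize.css?StillNoSpring"/>
        <script type="text/javascript" src="/js/heyoffline.js?StillNoSpring"></script>';

$html = str_get_html($str);

// Print the href of every stylesheet link.
foreach ($html->find('link[rel=stylesheet]') as $styleSheet) {
    echo $styleSheet->getAttribute('href') . "<br/>";
}

// Print the src of every JavaScript script tag.
foreach ($html->find('script[type=text/javascript]') as $script) {
    echo $script->getAttribute('src') . "<br/>";
}
?>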

This prints the following links:

/css/normalize.css?StillNoSpring
/js/heyoffline.js?StillNoSpring
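
The snippet above only reads the attributes; the question also asks to change them. A minimal sketch of that step follows. It swaps in PHP's built-in DOMDocument and DOMXPath instead of simple_html_dom, and the helper names `absolutize` and `rewriteUrls` are hypothetical, introduced only for illustration. It assumes the base URL has a scheme and host, and it does not normalize `../` segments or honor a `<base>` tag, so treat it as a starting point rather than a complete URL resolver.

```php
<?php
// Hypothetical helper: turn a root-relative or path-relative URL into an absolute one.
function absolutize(string $url, string $base): string
{
    // Leave absolute URLs (http:, mailto:, data:, ...), protocol-relative URLs,
    // fragments, and empty values untouched.
    if ($url === '' || preg_match('~^(?:[a-z][a-z0-9+.-]*:|//|#)~i', $url)) {
        return $url;
    }
    $parts  = parse_url($base);
    $origin = $parts['scheme'] . '://' . $parts['host']
            . (isset($parts['port']) ? ':' . $parts['port'] : '');
    if ($url[0] === '/') {
        return $origin . $url; // root-relative, e.g. /images/favicon.ico
    }
    // Path-relative: resolve against the directory part of the base path.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $url;
}

// Hypothetical helper: rewrite every href and src attribute in an HTML string.
function rewriteUrls(string $html, string $base): string
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect libxml warnings instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//*[@href or @src]') as $node) {
        foreach (['href', 'src'] as $attr) {
            if ($node->hasAttribute($attr)) {
                $node->setAttribute($attr, absolutize($node->getAttribute($attr), $base));
            }
        }
    }
    // Returns the full document; loadHTML adds html/body wrappers around fragments.
    return $doc->saveHTML();
}

echo rewriteUrls('<link href="/images/favicon.ico">', 'http://example.com/');
```

With the base `http://example.com/`, the `href` from the question becomes `http://example.com/images/favicon.ico`. URLs inside CSS files or inline `style` attributes are not touched by this approach and would need separate handling.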
Manish Jangir