
In PHP, I've written a proxy function that accepts a URL, user agent, and other settings. The function makes a cURL request for the website and prints the output, with proper HTML content-type headers, into an iframe (this is necessary only because I need to change some headers).

The proxied output often references lots of assets with relative URLs, and those URLs inherit the hostname of my site, not the proxied site:

Example: http://MYSITE.com/proxy?url=http://somesite.com would return the HTML of http://somesite.com

In the response HTML, there is markup like this:

<link rel="apple-touch-icon-precomposed" sizes="144x144" href="assets/ico/apple-touch-icon-144-precomposed.png">

The problem:

Instead of requesting that asset from http://somesite.com/assets/ico/apple-touch-icon-144-precomposed.png, the browser tries to find it at http://MYSITE.com/assets/ico/apple-touch-icon-144-precomposed.png, which is wrong.

The Question:

What do I need to do to get their relative-path assets to load properly via the proxy?

Kristian
  • You'd need to search the usual suspects for relative URLs. I would start with the `href` and `src` attributes. Other than those two, I can't think of any more obvious places where they may be found. You might have issues with CSS files that use `url()` with a relative path, as this would be more difficult to search for. – Bailey Parker Oct 21 '12 at 03:23
  • You're absolutely right. I've already done my due diligence and replaced all of the `src`s, `href`s, and similar. However, once the CSS files are loaded, their contents use relative paths too. Hence the question :/ – Kristian Oct 21 '12 at 03:52

1 Answer


How about the <base> tag? You can place it in the head and it will inform the browser what to use as the base path for all relative URLs on the page:

<head>
    <base href="http://somesite.com/">
</head>

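You could add it to each page that you serve with DOMDocument. The version below uses item(0) rather than array dereferencing, so it does not depend on PHP 5.4 syntax, and it inserts the tag at the top of the head so that it precedes any stylesheet or script tags:

if ($contentType == 'text/html') {
    $doc = new DOMDocument();
    // Suppress the warnings libxml raises on malformed real-world markup
    @$doc->loadHTML($html);
    $head = $doc->getElementsByTagName('head')->item(0);

    // Only add a <base> if the page has a <head> that doesn't already declare one
    if ($head !== null && $head->getElementsByTagName('base')->length == 0) {
        $base = $doc->createElement('base');
        $base->setAttribute('href', $urlOfPageDir);
        // Insert as the first child so it comes before other URL-bearing tags
        $head->insertBefore($base, $head->firstChild);
    }

    echo $doc->saveHTML();
}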

Take note that $urlOfPageDir must be the absolute URL of the directory in which the page resides. See this SO question for more on the base tag: Is it recommended to use the <base> html tag?
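One way to obtain that directory URL is to derive it from the URL being proxied. The following is only a minimal sketch: baseHrefFor is a hypothetical helper name, not part of the original answer, and it assumes the proxied URL is absolute and that query strings and fragments can be discarded.

```php
<?php
// Hypothetical helper (an assumption for illustration): derive the URL of
// the directory containing a page, for use as the <base> href.
function baseHrefFor(string $url): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        throw new InvalidArgumentException("Not an absolute URL: $url");
    }

    // A URL with no path at all refers to the site root.
    $path = $parts['path'] ?? '/';

    // Keep everything up to and including the last slash of the path;
    // a path that already ends in "/" is left unchanged.
    $dir = substr($path, 0, strrpos($path, '/') + 1);

    $port = isset($parts['port']) ? ':' . $parts['port'] : '';

    return $parts['scheme'] . '://' . $parts['host'] . $port . $dir;
}

// baseHrefFor('http://somesite.com')                     returns 'http://somesite.com/'
// baseHrefFor('http://somesite.com/blog/post.html?x=1')  returns 'http://somesite.com/blog/'
```

With this, $urlOfPageDir = baseHrefFor($url); would feed the snippet above. If the cURL request follows redirects, it is probably safer to pass the final URL (available from curl_getinfo with CURLINFO_EFFECTIVE_URL) rather than the one originally requested.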

Bailey Parker
  • I was looking into that, but according to what I've read about it, it's only for link URLs, right? – Kristian Oct 21 '12 at 03:40
  • @Kristian From what I understand it also works on image `src`s, link `href`s, and script `src`s too. – Bailey Parker Oct 21 '12 at 03:52
  • @Kristian I'm not sure about relative URLs in CSS and JavaScript. Have you tested it out? – Bailey Parker Oct 21 '12 at 03:59
  • EDIT: I was wrong. I didn't realize my base tag was injected after the CSS assets but before the JS assets. http://stackoverflow.com/questions/2161377/is-the-html-base-tag-also-honored-by-scripting-and-css – Kristian Oct 21 '12 at 04:41
  • @Kristian Yes of course! That makes sense. Good find! Glad you could get everything working! – Bailey Parker Oct 21 '12 at 14:48