
If you visit http://www.x-rates.com/d/TRY/table.html in a browser, the page displays fine, but when I try to load it with

$doc = new DOMDocument();
$doc->loadHTMLFile('http://www.x-rates.com/d/TRY/table.html');

it fails with a 404. I have also tried fetching the HTML with file_get_contents() and passing it to DOMDocument that way, but no luck. Any help gratefully received.

Liam Bailey

2 Answers


404 appears to be the status code the server actually returns for that URL:

$ curl -I http://www.x-rates.com/d/TRY/table.html
HTTP/1.1 404 Not Found
Date: Mon, 01 Aug 2011 12:23:49 GMT
Server: Apache/2.2.19
Content-Type: text/html

You can still retrieve the HTTP response body and load it into DOMDocument as a string.

This can be done with file_get_contents by setting the ignore_errors HTTP context option. Example code:

$url = 'http://www.x-rates.com/d/TRY/table.html';

// Create a stream context that keeps the response body even on HTTP error statuses
$opts = array(
  'http' => array(
    'ignore_errors' => true,
  )
);

$context = stream_context_create($opts);

// Fetch the URL using the context options set above
$file = file_get_contents($url, false, $context);

$doc = new DOMDocument();
$doc->loadHTML($file);
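
With ignore_errors enabled, file_get_contents no longer fails on a 404, so it can be useful to check which status the server actually sent. PHP's HTTP wrapper fills the local variable $http_response_header with the raw response header lines after the call. The following is a minimal sketch, not part of the original answer: the helper name lastHttpStatus is hypothetical, and it assumes the status line has the usual "HTTP/x.y NNN ..." form. It keeps the last status line because redirects can produce more than one.

```php
<?php
// Hypothetical helper (name is illustrative): return the status code of the
// last HTTP status line found in an array of raw header lines, or null if
// no status line is present.
function lastHttpStatus(array $headers): ?int
{
    $status = null;
    foreach ($headers as $line) {
        // Matches lines such as "HTTP/1.1 404 Not Found" or "HTTP/2 200"
        if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

// Intended use right after the file_get_contents() call above:
//   $status = lastHttpStatus($http_response_header ?? array());
//   if ($status !== 200) { /* body was returned despite an error status */ }
```

Parsing the headers in a separate function keeps the check independent of the network call, so it can be exercised with sample header arrays.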
hakre

The page is returning a 404, and I believe it is doing this deliberately to make it harder to scrape it. I found this on their site:

Fetching data with tools such as PHP, LWP, Java and Microsoft controls for example are not permitted

You might want to double-check that you are actually allowed to do this; I'm concerned you may be infringing copyright.

ZoFreX
  • Where did you see this statement? I found nothing like that on this particular site when I looked for it beforehand. – Liam Bailey Aug 01 '11 at 14:02
  • This limitation is implied by the terms of use and copyright page http://www.x-rates.com/copyright.html but I found that exact sentence on the developers page http://www.x-rates.com/developers.html – ZoFreX Aug 01 '11 at 16:20