403 error when using file_get_contents()

Question

I'm trying to scrape some product details from a website using the following code:

$list_url = "http://www.topshop.com/en/tsuk/category/sale-offers-436/sale-799";
$html = file_get_contents($list_url);
echo $html;

However, I'm getting this error:

Warning: file_get_contents(http://www.topshop.com/en/tsuk/category/sale-offers-436/sale-799) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.0 403 Forbidden in /homepages/19/d361310357/htdocs/shopaholic/rss/topshop_f_uk.php on line 123

I gather that this is some sort of block by the website to prevent scraping. Is there a way around this - perhaps using cURL and setting a user agent?

If not, is there another way of getting basic product data like item name and price?

EDIT

The context of my code is that I'd eventually still want to be able to achieve the following:

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
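
To make the goal concrete, here is a rough sketch of the parsing step I have in mind. The sample markup, the class names ("product", "name", "price") and the helper extract_products() are placeholders made up for illustration, not Topshop's real HTML:

```php
<?php
// Made-up markup standing in for the fetched page.
$sample_html = '<ul>
  <li class="product"><span class="name">Floral Dress</span><span class="price">20.00</span></li>
  <li class="product"><span class="name">Denim Jacket</span><span class="price">35.00</span></li>
</ul>';

// Hypothetical helper: returns a list of ['name' => ..., 'price' => ...] rows.
function extract_products(string $html): array
{
    libxml_use_internal_errors(true);   // real-world HTML rarely parses without warnings
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $products = [];
    foreach ($xpath->query('//li[@class="product"]') as $item) {
        // string(...) evaluated relative to $item yields the node's text content
        $products[] = [
            'name'  => trim($xpath->evaluate('string(.//span[@class="name"])', $item)),
            'price' => trim($xpath->evaluate('string(.//span[@class="price"])', $item)),
        ];
    }
    return $products;
}
```

With the real page, the XPath expressions would need to match whatever markup the site actually uses.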

score 1 · Accepted Answer · edited May 23 '17 at 11:49


I've managed to fix it by adding the following code...

ini_set('user_agent','Mozilla/4.0 (compatible; MSIE 6.0)');

...as per this answer.
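
The ini_set() call changes the user agent for every later HTTP stream request in the script. If that is too broad, a per-request alternative is to pass a stream context to file_get_contents(). This is a minimal sketch, not what I originally used; build_context() and fetch_html() are hypothetical helper names and the 10-second timeout is an arbitrary choice:

```php
<?php
// Hypothetical helper: an HTTP stream context carrying a custom User-Agent.
function build_context(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'user_agent' => $userAgent,
            'timeout'    => 10, // seconds; arbitrary
        ],
    ]);
}

// Hypothetical helper: returns the response body, or false on failure.
// A 403 response still emits a warning and returns false.
function fetch_html(string $url, string $userAgent)
{
    return file_get_contents($url, false, build_context($userAgent));
}
```

Usage would then be $html = fetch_html($list_url, 'Mozilla/4.0 (compatible; MSIE 6.0)'); followed by the DOMDocument code from the question. Whether a given site accepts that user agent is up to the site.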

edited May 23 '17 at 11:49

Community


answered Mar 29 '14 at 17:28

Sebastian


score -1 · Answer 2 · answered Mar 29 '14 at 17:21


You should use cURL rather than plain file_get_contents().
With cURL you can set proper HTTP headers (such as User-Agent) so that the request resembles one sent by a real browser.

P.S.: configure cURL to follow redirects (CURLOPT_FOLLOWLOCATION). Here is the link to cURL
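
A minimal sketch of what this could look like, assuming the curl extension is enabled. The helper names build_curl_options() and fetch_with_curl(), the Accept header, the redirect cap and the timeout are all illustrative choices, not requirements:

```php
<?php
// Hypothetical helper: options for a browser-like GET request.
function build_curl_options(string $userAgent): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,   // follow redirects
        CURLOPT_MAXREDIRS      => 5,      // arbitrary cap on redirect hops
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_HTTPHEADER     => ['Accept: text/html'],
        CURLOPT_TIMEOUT        => 10,     // seconds; arbitrary
    ];
}

// Hypothetical helper: returns the body on a 2xx response, false otherwise.
function fetch_with_curl(string $url, string $userAgent)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, build_curl_options($userAgent));
    $body   = curl_exec($ch);                        // false on transport error
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE); // e.g. 200 or 403
    return ($body !== false && $status >= 200 && $status < 300) ? $body : false;
}
```

The returned string can be passed to DOMDocument::loadHTML() exactly as in the question.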

answered Mar 29 '14 at 17:21

Geo C.

