
I'm completely lost. Here's the example URL:

file_get_contents('http://adam-wennick.squarespace.com/actor-bro-show?format=rss');

This works fine with any other URL. This one loads fine in the browser, but it returns 400 for both file_get_contents and simplexml_load_file. With cURL it returns 200, but the object is NULL. Has anyone encountered anything like this before?

curl code:

$rss = 'http://adam-wennick.squarespace.com/actor-bro-show?format=rss'; 
$ch = curl_init(); 
curl_setopt($ch,CURLOPT_URL, $rss); 
curl_setopt($ch,CURLOPT_RETURNTRANSFER, 1); 
curl_setopt($ch,CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13'); 
$output = curl_exec($ch);
Nigel Ren
  • So it probably needs a stream context to make it look more like a browser access. Show us the cURL code that works; that should identify what you need to add to the stream. Or just use cURL. – RiggsFolly Mar 14 '19 at 14:49
  • It probably has some scraper protection on it. One of the easiest things to try is adding a user_agent to the curl headers. file_get_contents and simplexml_load_file will not work in this context. – aynber Mar 14 '19 at 14:49
  • That's exactly the case: a 200 is returned when I add a user agent, but the output is still NULL, so I'm a bit confused. Here's the cURL code: `$rss = 'http://adam-wennick.squarespace.com/actor-bro-show?format=rss'; $ch = curl_init(); curl_setopt($ch,CURLOPT_URL, $rss); curl_setopt($ch,CURLOPT_RETURNTRANSFER, 1); curl_setopt($ch,CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13'); $output = curl_exec($ch);` – Tomasz Mieczkowski Mar 14 '19 at 14:51
  • You can set the user agent if you use the aforementioned [stream context](http://php.net/manual/en/function.stream-context-create.php) with `file_get_contents()`. However, I would just stick to cURL for this. – M. Eriksson Mar 14 '19 at 14:53
  • Thanks a lot, everyone. @MagnusEriksson's reply helped. I suppose cURL would be best, but I just can't get it to work. However, adding a context to file_get_contents did return the string. I'll post a full answer below as I don't know how to style comments here yet. :) – Tomasz Mieczkowski Mar 14 '19 at 15:03
  • Since the question about passing request headers with file_get_contents() has already been asked and answered here, my opinion is that this should be marked as a duplicate instead. – M. Eriksson Mar 14 '19 at 15:09

2 Answers

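<?php

$ch = curl_init("http://adam-wennick.squarespace.com/actor-bro-show?format=rss");

// The server answers 400 unless a browser-like User-Agent is sent.
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.2; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0');

// Return the response body as a string instead of printing it directly.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// Has no effect for a plain http:// URL; avoid disabling verification for https:// URLs.
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
$result = curl_exec($ch);

print_r($result);

curl_close($ch);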

The output is the content of the URL.
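Because the question hinges on telling a 400 apart from a 200, it can help to read the status code alongside the body. The following is a minimal sketch under that assumption; `fetchWithUserAgent()` is a hypothetical helper name and is not part of the answer above.

```php
<?php
/**
 * Fetch a URL with cURL and return [HTTP status, body].
 * fetchWithUserAgent() is a hypothetical helper introduced for illustration.
 */
function fetchWithUserAgent(string $url, string $userAgent): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    $body = curl_exec($ch);
    if ($body === false) {
        // Transport-level failure (DNS, timeout, ...), not an HTTP error status.
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }

    // HTTP status of the last response, e.g. 200 or 400.
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    return [$status, $body];
}
```

Calling `[$status, $body] = fetchWithUserAgent($url, $userAgent);` and printing `$status` before touching `$body` should make it easier to see whether the server rejected the request or whether a later processing step discarded the data.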

pr1nc3
  • Okay, I'm officially a dum-dum. The reason I was getting **200** AND **NULL** is that I missed the line that said `$output = json_decode($output);`, and since the output isn't JSON, `json_decode` returned NULL. Yes. So the solution was pretty much adding a user agent to either cURL or file_get_contents. Thanks again! – Tomasz Mieczkowski Mar 14 '19 at 15:20
  • Don't forget to mark the answer as accepted if it helped you solve your question. Glad I could help. – pr1nc3 Mar 14 '19 at 15:21
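The 200-with-NULL symptom described in the comment above can be reproduced offline. This is a small sketch; the XML string is a made-up stand-in for the real feed body, not the actual response.

```php
<?php
// Hypothetical stand-in for the feed body; the real response is RSS (XML).
$output = '<?xml version="1.0"?><rss version="2.0"><channel><title>Example</title></channel></rss>';

// json_decode() returns NULL when the input is not valid JSON.
$decoded = json_decode($output);
var_dump($decoded);                               // NULL
var_dump(json_last_error() !== JSON_ERROR_NONE);  // bool(true)

// Parsing the same string as XML works.
$xml = simplexml_load_string($output);
echo $xml->channel->title, PHP_EOL;               // Example
```

Checking `json_last_error()` (or passing the `JSON_THROW_ON_ERROR` flag, available since PHP 7.3) is one way to notice this kind of mismatch instead of silently getting NULL.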

In case others stumble upon this: as @aynber mentioned, this URL uses some sort of scrape protection, even though it's an RSS feed and is supposed to be scraped. :) Come on, Squarespace!

As @MagnusEriksson suggested, I used file_get_contents with a stream context and then replaced simplexml_load_file with simplexml_load_string:

$rss = 'http://adam-wennick.squarespace.com/actor-bro-show?format=rss';

// Send a browser-like User-Agent; without it the server returns 400.
$opts = array(
    'http' => array(
        'method'     => "GET",
        'user_agent' => 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13'
    )
);

$context = stream_context_create($opts);
// The second argument is use_include_path; pass false rather than NULL.
$result = file_get_contents($rss, false, $context);
$output = simplexml_load_string($result);

That did the trick, and $output now holds the XML object. Thanks again to everyone who replied so quickly.
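For anyone who wants failures to surface explicitly rather than as warnings, here is a hedged sketch of the same approach split into two small helpers. The names `buildFeedContext()` and `parseFeed()` are hypothetical and were not part of the original solution.

```php
<?php
/**
 * Build a stream context that sends a browser-like User-Agent.
 * buildFeedContext() and parseFeed() are hypothetical helper names.
 */
function buildFeedContext(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'method'     => 'GET',
            'user_agent' => $userAgent,
        ],
    ]);
}

/**
 * Parse a response body as XML, throwing if it is not well-formed.
 */
function parseFeed(string $body): SimpleXMLElement
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($body);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($xml === false) {
        throw new RuntimeException('Response body is not well-formed XML.');
    }

    return $xml;
}
```

With these in place, the fetch becomes `$result = file_get_contents($rss, false, buildFeedContext($userAgent));` followed by a check for `$result === false` before calling `parseFeed($result)`, since `file_get_contents` returns `false` when the request fails.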