2

I want to fetch dynamic content from a particular URL.

I used this code:

echo $content=file_get_contents('http://www.punoftheday.com/cgi-bin/arandompun.pl');

I get the following result:

document.write('"Bakers have a great knead to make bread."

') document.write('© 1996-2007 Pun of the Day.com
')

How can I extract just the string "Bakers have a great knead to make bread."? Only the string inside the first document.write will change; the rest of the code stays constant.

Regards,

Pankaj

Felix Kling
Pankaj Khurana
  • possible duplicate of [get url content PHP](http://stackoverflow.com/questions/11363022/get-url-content-php) – T.Todua Apr 24 '15 at 23:17

3 Answers

7

You are fetching a JavaScript snippet that is meant to be embedded directly in an HTML document, not queried by a script. What you get back is JavaScript code, not plain text.

You could pull out the text using a regular expression, but I would advise against it. First, it's probably not legal. Second, the format of the data they serve can change at any time, breaking your script.

I think you should take a look at their RSS feed. You can parse that programmatically far more easily than the JavaScript.

Check out this question on how to do that: Best way to parse RSS/Atom feeds with PHP

Pekka
  • Hi, thanks for your reply, but if I use their RSS feed, how can I pick the pun dynamically (it should not repeat when I click refresh)? – Pankaj Khurana Feb 01 '10 at 11:21
  • I think reading the RSS feed, and displaying only the first item should do the trick! – Pekka Feb 01 '10 at 11:23
  • But I want to display a different pun each time the user clicks refresh. – Pankaj Khurana Feb 01 '10 at 11:30
  • That is also easy to achieve by picking a random pun from the RSS feed. It's not possible with the JS solution because it shows only the pun of the day. – Pekka Feb 01 '10 at 11:34
  • I am using this code: $xml = simplexml_load_file("http://feeds.feedburner.com/PunOfTheDay"); $items = count($xml->channel->item); echo $rand=rand(0,$items-1); $description = $xml->channel->item[$rand]->description; echo str_replace('[Click to Vote!]','',$description); Is this the correct approach? – Pankaj Khurana Feb 01 '10 at 11:46
  • Yes, but since the number of puns in the feed is small, the probability of repetition is very high. Thanks for your suggestion anyway. – Pankaj Khurana Feb 01 '10 at 12:07
  • Right, I see now! I didn't realize you are fetching a random pun from the Perl script. Sorry! Try asking them whether there is any way to include more puns in the RSS stream, or to serve a random pun through RSS. Otherwise, you may indeed be best off with @Luca Matteis' suggestion. – Pekka Feb 01 '10 at 12:09
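The random-pick approach from the comments above could be sketched roughly as follows. This is a minimal sketch, not the feed's documented format: the inlined sample feed and its item texts are made up so that the code runs offline, and `random_item_description` is a hypothetical helper name. It assumes a standard RSS 2.0 layout (`channel` containing `item` elements with a `description`).

```php
<?php
// Sample feed inlined so the sketch runs offline; the item texts are made up.
$rss = <<<XML
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item><description>First sample pun [Click to Vote!]</description></item>
    <item><description>Second sample pun [Click to Vote!]</description></item>
    <item><description>Third sample pun [Click to Vote!]</description></item>
  </channel>
</rss>
XML;

// Hypothetical helper: return the description of one randomly chosen item,
// or null if the XML cannot be parsed or the feed has no items.
function random_item_description($xmlString)
{
    $xml = simplexml_load_string($xmlString);
    if ($xml === false) {
        return null; // not well-formed XML
    }
    $count = count($xml->channel->item);
    if ($count === 0) {
        return null; // feed has no items
    }
    $index = mt_rand(0, $count - 1);
    $description = (string) $xml->channel->item[$index]->description;
    // Strip the voting link text mentioned in the comments above.
    return trim(str_replace('[Click to Vote!]', '', $description));
}

echo random_item_description($rss), "\n";
```

With a live feed, the string passed in would come from fetching the feed URL instead of the inlined sample. As noted in the comments, a feed with only a few items will still repeat often.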
4

1) several local methods

<?php
readfile("http://example.com/");                 // prints the response itself (returns the byte count); needs allow_url_fopen enabled
echo file_get_contents("http://example.com/");   // needs allow_url_fopen enabled
echo stream_get_contents(fopen('http://example.com/', "rb")); // "r" also works instead of "rb"; needs allow_url_fopen enabled
include("http://example.com/");                  // needs allow_url_include enabled; runs the response as PHP code, so avoid it for untrusted URLs
?>

2) A better way is cURL:

echo get_remote_data('http://example.com');                                // GET request 
echo get_remote_data('http://example.com', "var2=something&var3=blabla" ); // POST request


//============= https://github.com/tazotodua/useful-php-scripts/ ===========
function get_remote_data($url, $post_paramtrs=false)
{
    $c = curl_init();
    curl_setopt($c, CURLOPT_URL, $url);
    curl_setopt($c, CURLOPT_RETURNTRANSFER, 1);
    // Passing a second argument turns the request into a POST
    if ($post_paramtrs) {
        curl_setopt($c, CURLOPT_POST, TRUE);
        curl_setopt($c, CURLOPT_POSTFIELDS, "var1=bla&".$post_paramtrs);
    }
    // NOTE: certificate verification is disabled here, which is insecure outside of testing
    curl_setopt($c, CURLOPT_SSL_VERIFYHOST, false);
    curl_setopt($c, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($c, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; rv:33.0) Gecko/20100101 Firefox/33.0");
    curl_setopt($c, CURLOPT_COOKIE, 'CookieName1=Value;');
    curl_setopt($c, CURLOPT_MAXREDIRS, 10);
    // FOLLOWLOCATION cannot be used when open_basedir or safe_mode is set
    $follow_allowed = (ini_get('open_basedir') || ini_get('safe_mode')) ? false : true;
    if ($follow_allowed) {
        curl_setopt($c, CURLOPT_FOLLOWLOCATION, 1);
    }
    curl_setopt($c, CURLOPT_CONNECTTIMEOUT, 9);
    curl_setopt($c, CURLOPT_REFERER, $url);
    curl_setopt($c, CURLOPT_TIMEOUT, 60);
    curl_setopt($c, CURLOPT_AUTOREFERER, true);
    curl_setopt($c, CURLOPT_ENCODING, 'gzip,deflate');
    $data = curl_exec($c);
    $status = curl_getinfo($c);
    curl_close($c);

    // Rewrite relative src/href/action attributes to absolute URLs
    preg_match('/(http(|s)):\/\/(.*?)\/(.*\/|)/si',  $status['url'], $link);
    $data = preg_replace('/(src|href|action)=(\'|\")((?!(http|https|javascript:|\/\/|\/)).*?)(\'|\")/si', '$1=$2'.$link[0].'$3$4$5', $data);
    $data = preg_replace('/(src|href|action)=(\'|\")((?!(http|https|javascript:|\/\/)).*?)(\'|\")/si', '$1=$2'.$link[1].'://'.$link[3].'$3$4$5', $data);

    if ($status['http_code'] == 200) {
        return $data;
    } elseif ($status['http_code'] == 301 || $status['http_code'] == 302) {
        // Follow the redirect manually when FOLLOWLOCATION is not allowed
        if (!$follow_allowed) {
            if (empty($redirURL)) {
                if (!empty($status['redirect_url'])) { $redirURL = $status['redirect_url']; }
            }
            if (empty($redirURL)) {
                preg_match('/(Location:|URI:)(.*?)(\r|\n)/si', $data, $m);
                if (!empty($m[2])) { $redirURL = $m[2]; }
            }
            if (empty($redirURL)) {
                preg_match('/href\=\"(.*?)\"(.*?)here\<\/a\>/si', $data, $m);
                if (!empty($m[1])) { $redirURL = $m[1]; }
            }
            if (!empty($redirURL)) {
                $t = debug_backtrace();
                return call_user_func($t[0]["function"], trim($redirURL), $post_paramtrs);
            }
        }
    }
    return "ERRORCODE22 with $url!!<br/>Last status codes<b/>:".json_encode($status)."<br/><br/>Last data got<br/>:$data";
}

NOTICE: It automatically handles the FOLLOWLOCATION problem, and relative URLs in the fetched page are automatically rewritten to absolute ones (src="./imageblabla.png" becomes src="http://example.com/path/imageblabla.png").

P.S. On GNU/Linux servers, you might need to install the php5-curl package to use it.
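For comparison, a much smaller cURL fetch might look like the sketch below. It is not a replacement for the function above: `fetch_url` and its `$timeout` parameter are hypothetical names, it does no URL rewriting or manual redirect handling, and it assumes the cURL extension is installed. Unlike the function above, it leaves TLS certificate verification at its defaults.

```php
<?php
// Hypothetical helper (not part of the function above): fetch a URL with cURL,
// leaving TLS certificate verification at its secure defaults.
// Returns the response body on HTTP 200, or null on any failure.
function fetch_url($url, $timeout = 10)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);   // follow redirects
    curl_setopt($ch, CURLOPT_MAXREDIRS, 5);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);

    $body = curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    if ($body === false || $code !== 200) {
        return null; // transport error or non-200 status
    }
    return $body;
}
```

Calling fetch_url('http://example.com/') would return the page body as a string, or null if the request fails for any reason.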

T.Todua
2

Pekka's answer is probably the best way of doing this. But anyway here's the regex you might want to use in case you find yourself doing something like this, and can't rely on RSS feeds etc.

document\.write\('      // start tag
([^)]*)                 // the data to match
'\)                     // end tag

EDIT for example:

<?php
$subject = "document.write('&quot;Paying for college is often a matter of in-tuition.&quot;<br />')\ndocument.write('<i>&copy; 1996-2007 <a target=\"_blank\" href=\"http://www.punoftheday.com\">Pun of the Day.com</a></i><br />')";
$pattern = "/document\.write\('([^)]*)'\)/";
preg_match($pattern, $subject, $matches);
print_r($matches);
?>
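To go from the raw match to the bare pun text, the example above could be extended along these lines. This is a minimal sketch: `extract_first_write` is a hypothetical helper name, and it assumes the response keeps the format shown in the sample string (it will miss text that itself contains a closing parenthesis, since the pattern stops at the first one).

```php
<?php
// Sample response copied from the example above.
$subject = "document.write('&quot;Paying for college is often a matter of in-tuition.&quot;<br />')\n"
         . "document.write('<i>&copy; 1996-2007 <a target=\"_blank\" href=\"http://www.punoftheday.com\">Pun of the Day.com</a></i><br />')";

// Hypothetical helper: return the cleaned text of the first document.write() call,
// or null if the input does not contain one.
function extract_first_write($js)
{
    if (!preg_match("/document\\.write\\('([^)]*)'\\)/", $js, $matches)) {
        return null;
    }
    $text = strip_tags($matches[1]);                        // drop <br /> and other markup
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8'); // turn &quot; into a literal quote
    return trim($text, " \t\n\r\"");                        // remove surrounding whitespace and quotes
}

echo extract_first_write($subject), "\n";
// prints: Paying for college is often a matter of in-tuition.
```

Returning null on a failed match lets the caller tell "format changed" apart from an empty pun.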
Luca Matteis