I'm trying to get text from another website and publish it on mine, so that when the other website updates the text inside a "div" or other element, my website updates as well.

Can this be done in PHP? And if so, how?

user3259244
  • You can, but you need to run a script from crontab. – ImadOS Feb 12 '14 at 22:48
  • Do you have permission from the other site to do this? –  Feb 12 '14 at 22:51
  • You will need something like curl in PHP to make HTTP requests to other sites. See this SO question for more info on curl: http://stackoverflow.com/questions/3062324/what-is-curl-in-php You can check the other site every time your page is loaded or, as ImadOS suggests, run a cron job (or a scheduled task on Windows). – Etzeitet Feb 12 '14 at 22:51

2 Answers

PHP has a built-in function, file_get_contents, that can do this:

$html = file_get_contents("http://www.website.com");

However, this gives you little control over the request (a timeout, for example, has to be set through a stream context or the default_socket_timeout ini setting), so here's a quick function using curl:

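function getHTML($url, $timeout)
{
    $gs = curl_init($url); // initialize curl with the given URL
    // forward the visitor's user agent; fall back to a fixed string when there is none (e.g. cron/CLI)
    $agent = isset($_SERVER["HTTP_USER_AGENT"]) ? $_SERVER["HTTP_USER_AGENT"] : "Mozilla/5.0";
    curl_setopt($gs, CURLOPT_USERAGENT, $agent); // set user agent
    curl_setopt($gs, CURLOPT_RETURNTRANSFER, true); // return the response instead of printing it
    curl_setopt($gs, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($gs, CURLOPT_CONNECTTIMEOUT, $timeout); // max. seconds to wait for the connection
    curl_setopt($gs, CURLOPT_TIMEOUT, $timeout); // max. seconds for the whole request
    curl_setopt($gs, CURLOPT_FAILONERROR, true); // treat HTTP status codes >= 400 as failures
    return curl_exec($gs); // the response body, or false on failure
}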

Then you can use a regular expression to get the data you want, e.g.

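$html = getHTML("http://www.website.com", 10);
if ($html !== false && preg_match("/<title>(.*?)<\/title>/is", $html, $match)) {
    $pagetitle = $match[1]; // non-greedy match, so only the first <title> is captured
}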

EDIT:

In response to the comment below regarding regex, I suggest you check out the following Stack Overflow question and answer, as the PHP Document Object Model may well be what you're looking for:

This one!
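
To give a rough idea of the DOM approach, here is a minimal sketch using DOMDocument and DOMXPath. The helper name extractText, the sample markup, and the "status" id are made up for illustration; you would adapt the XPath query to the actual page.

```php
<?php
// Hypothetical helper: return the trimmed text of the first node matching an XPath query,
// or null if the HTML cannot be parsed or nothing matches.
function extractText(string $html, string $xpathQuery): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return null;
    }
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query($xpathQuery);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

// Made-up markup standing in for the remote page.
$sample = '<html><body><div id="status"> Server is online </div></body></html>';
echo extractText($sample, '//div[@id="status"]'), "\n"; // prints: Server is online
```

In practice you would pass the string returned by getHTML() instead of $sample, and check for null before publishing the result.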

Compy
  • Don't parse HTML with regular expressions; it leads to madness. –  Feb 12 '14 at 22:57
  • See updated answer. Were you thinking of the PHP Document Object Model or do you have another suggestion? – Compy Feb 12 '14 at 22:59
  • Yup, DOM. It's not always the best option, but it's generally better than a regular expression. –  Feb 12 '14 at 23:00
  • Fair enough. I guess it depends on the size of the page you're parsing. Do you know if PHP DOM is any faster than regex? – Compy Feb 12 '14 at 23:02
  • In good fun, he was referring to this one: http://stackoverflow.com/a/1732454/457836 Best answer on Stack Overflow ever. – Eric G Feb 12 '14 at 23:02
  • Only the very special people get it when I say "the pony he comes". –  Feb 12 '14 at 23:05
  • Okay, what if I need to take the "1" text inside that? – user3259244 Feb 12 '14 at 23:05
  • If you take a look at the other question I linked to, the answer there is pretty specific as to how you descend the DOM tree to get the item you want. Feel free to update your question with a precise description of the HTML and I'll update my answer appropriately. – Compy Feb 12 '14 at 23:09

What about this:

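 <?php
 function getHTMLData($url, $query) {
     // simplexml_load_file() only succeeds if the page is well-formed XML (e.g. XHTML)
     $data = @simplexml_load_file($url);
     if ($data === false) {
         return false; // the page could not be fetched or parsed
     }
     return $data->$query; // child element(s) of the root named $query, e.g. "body"
 }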

Bear in mind that HTML only looks like XML: this works when the page is well-formed XML (XHTML, for example), which most real-world HTML is not.
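
Since most pages in the wild are not well-formed XML, one possible workaround is to let DOMDocument parse the HTML leniently and then hand the result to SimpleXML with simplexml_import_dom. This is only a sketch: htmlToSimpleXml is a made-up name, it takes the HTML as a string rather than a URL, and the sample markup is invented.

```php
<?php
// Hypothetical helper: parse possibly messy HTML and return it as a SimpleXMLElement,
// or false if parsing fails.
function htmlToSimpleXml(string $html)
{
    if ($html === '') {
        return false; // loadHTML() rejects an empty string
    }
    $doc = new DOMDocument();
    // Keep parser warnings about invalid markup out of the output.
    $previous = libxml_use_internal_errors(true);
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return false;
    }
    $xml = simplexml_import_dom($doc); // wraps the root <html> element
    return ($xml instanceof SimpleXMLElement) ? $xml : false;
}

// Made-up markup with an unclosed <p> and a bare <br>, which a strict XML parser would reject.
$sample = '<html><head><title>Example page</title></head><body><p>Unclosed paragraph<br></body></html>';
$page = htmlToSimpleXml($sample);
if ($page !== false) {
    echo (string) $page->head->title, "\n"; // prints: Example page
}
```

From there the usual SimpleXML property access works as in getHTMLData() above.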

Uyghur Lives Matter