4

Possible Duplicate:
How to implement a web scraper in PHP?
How to parse and process HTML with PHP?

I need to crawl a page and get the contents of a particular div. PHP and JavaScript are my two main options. How can it be done?

Community
amit
    Have you considered Perl and [WWW-Mechanize](http://search.cpan.org/dist/WWW-Mechanize/)? – cctan Feb 01 '12 at 09:05

5 Answers

3

There are many ways to get the contents of a URL:

First Method:

Simple HTML DOM Parser: http://simplehtmldom.sourceforge.net/

Second Method :

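<?php

  // Fetch the page (requires allow_url_fopen to be enabled).
  $contents = file_get_contents("http://www.url.com");
  // Drop every tag except <div>, so only div markup and plain text remain.
  $contents = strip_tags($contents, "<div>");
  // Capture the text of each innermost <div>; the captured text ends up in $file_contents[1].
  preg_match_all('#<div[^>]*>([^<]*)</div>#is', $contents, $file_contents);

?>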

Third Method:

You can use jQuery-like selectors:

http://api.jquery.com/category/selectors/

Sabari
2

This is quite a basic way to do it in PHP, and it returns the content as plain text. You may need to revise the regex for your particular needs.

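<?php
  // Fetch the page (requires allow_url_fopen to be enabled).
  $link = file_get_contents("http://www.domain.com");
  // Keep only the <div> tags; all other markup is stripped.
  $file = strip_tags($link, "<div>");
  // $content[0] holds the full matches, $content[1] the text inside each innermost <div>.
  preg_match_all('#<div[^>]*>([^<]*)</div>#is', $file, $content);
  print_r($content);
?>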
Mike
2

You can use Simple HTML DOM Parser, documented at http://simplehtmldom.sourceforge.net/manual.htm. It requires PHP 5+, but the nice thing is that you can find tags on an HTML page with selectors, just like jQuery.

jerjer
1

Specifically with jQuery, if you have a div like the following:

<div id="cool_div">Some content here</div>

You could use jQuery to get the contents of the div like this:

$('#cool_div').text(); // will return text version of contents...
$('#cool_div').html(); // will return HTML version of contents...

If you're using PHP to generate the content of the page, then you should be able to get a decent handle on the content and manipulate it even before it's returned to the screen and displayed. Hope this helps!

Chris Kempen
0

Using PHP, you can try the DOMDocument class and its getElementById() or getElementsByTagName() methods.
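
As a rough illustration of the DOMDocument approach, here is a minimal sketch. It assumes the target div has an id (the `cool_div` id and the sample markup are hypothetical, chosen only for the example), and it parses an inline string so that it runs without network access. For a real page, the string would come from something like file_get_contents().

```php
<?php
// Hypothetical markup standing in for a fetched page.
$html = '<html><body><div id="cool_div">Some <b>content</b> here</div></body></html>';

$doc = new DOMDocument();
// Real pages are often not valid HTML, so collect parser warnings
// internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// Look up the div by its id attribute.
$div = $doc->getElementById('cool_div');

$text = '';
$innerHtml = '';
if ($div !== null) {
    // Plain-text version of the contents.
    $text = $div->textContent;
    // HTML version: serialize each child node of the div in turn.
    foreach ($div->childNodes as $child) {
        $innerHtml .= $doc->saveHTML($child);
    }
}

echo $text, PHP_EOL;
echo $innerHtml, PHP_EOL;
```

If the div has no id, getElementsByTagName('div') returns every div in the document, and you can loop over the result and filter by attribute instead.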

Franquis